Show HN: SimBricks – Modular Full-System Simulation for HW-SW Systems

11 days ago (simbricks.github.io)

Hi HN,

We are building SimBricks, an open-source simulation framework for heterogeneous systems, especially ones with custom hardware. SimBricks modularly combines existing simulators for machines, networks, and hardware, letting you build, test, and evaluate intricate complete systems in a virtual environment. Head over to the SimBricks website (https://simbricks.github.io/, which also has a quick demo video) to learn more. We have pre-built Docker images, and you can even play around immediately in GitHub Codespaces.

Concrete use-cases:

- Evaluate HW accelerators, from early design with simple behavioral models to simulating complete Verilog implementations, as part of complete systems with many instances of the accelerator and machines running a full OS and real applications (we did a university course on this with SimBricks).

- Test network protocols, topologies, and communication stacks with real workloads in potentially large systems (we have run up to 1000 hosts so far).

- Rapid RTL prototyping for FPGAs, with no waiting for synthesis or fiddling with timing initially (we simulate the complete unmodified RTL for the Corundum open-source NIC with its unmodified PCIe drivers).
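To give a rough idea of how experiments are put together, here is a simplified sketch in the style of our Python orchestration scripts. The module, class, and helper names below are illustrative placeholders rather than the exact API, so please check the documentation for the real interfaces:

    # Illustrative sketch (placeholder names): two hosts, each with a Corundum NIC
    # RTL simulation, attached to one behavioral Ethernet switch.
    from simbricks.orchestration import nodeconfig, simulators
    from simbricks.orchestration.experiments import Experiment

    e = Experiment('corundum_ping')

    net = simulators.SwitchNet()                  # behavioral Ethernet switch
    e.add_network(net)

    for _ in range(2):
        nic = simulators.CorundumVerilatorNIC()   # unmodified Corundum RTL in Verilator
        nic.set_network(net)

        node = nodeconfig.CorundumLinuxNode()     # Linux image with the stock PCIe driver
        node.app = nodeconfig.PingClient()        # placeholder workload on the host

        host = simulators.QemuHost(node)          # fast functional host simulator
        host.add_nic(nic)

        e.add_host(host)
        e.add_nic(nic)

    experiments = [e]                             # picked up by the SimBricks runner

Swapping individual components, e.g. gem5 instead of QEMU for a detailed host, or a behavioral NIC model instead of the RTL, then only changes the corresponding line; that is the main point of the modularity.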

SimBricks originally started out as an internal research tool to help us build and evaluate our research ideas on network protocol offload, but has since grown into a standalone open-source project.

Would be great if you give it a shot and let us know what you think!

I'm interested in this concept, but how does this work?

With FPGA or ASIC designs at scale, it can take hours to run gate-level simulations spanning just a few milliseconds of operation. How can this be integrated into a networked system-of-systems as shown in the demo? Or are the simulations shown running at a lower level of fidelity?

  • First off, there is no magic here. SimBricks simulations (when synchronization is enabled) only run as fast as the slowest piece, and scale up through parallelism (so if you have N instances of an FPGA/ASIC design, you need N times the compute). There are, of course, the usual tricks like fast-forwarding/checkpointing etc., so you don't have to spend forever just booting Linux because the HW design is sitting there going at a snail's pace.

    So nothing stops you from including a gate-level simulation; the other simulators will just slow down accordingly (with very slow simulators the synchronization overheads are generally negligible). That said, GLS might not be the most common use-case here. Would a full-system simulation with GLS actually get you additional insights relative to just simulating the RTL? (genuine question)

    For our internal use-cases so far, we have primarily done TLM and RTL simulations. Depending on the level of fidelity (e.g. fast functional with QEMU, or a slow detailed OoO CPU in gem5), we are talking seconds to hours of runtime for simulating a few seconds.

    One interesting bit that the modularity gets you is that you can mix fidelities across components. So if you do want to test a GLS component as part of a large system with multiple instances of your design, you could consider doing GLS for only one instance and using just the RTL simulation, or even a TLM, for the others. This does not speed up the simulation, but it drastically reduces the compute needed.
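    To put rough, made-up numbers on that (a back-of-envelope sketch, not measurements):

      # Back-of-envelope sketch with made-up slowdown factors (wall-clock seconds
      # per simulated second). The synchronized run proceeds at the pace of the
      # slowest component; total compute is roughly the sum over all components.
      def cost(sim_seconds, slowdowns):
          wall_clock = sim_seconds * max(slowdowns)    # slowest piece sets the pace
          cpu_seconds = sim_seconds * sum(slowdowns)   # one core per component instance
          return wall_clock, cpu_seconds

      all_gls = cost(0.01, [1e6] * 8)           # 10 ms simulated, 8 instances, all gate-level
      mixed   = cost(0.01, [1e6] + [1e3] * 7)   # 1 gate-level instance, 7 at RTL/TLM speed
      print(all_gls, mixed)
      # Both need ~10,000 s of wall clock (the one GLS instance sets the pace), but the
      # mixed run needs roughly 8x less compute (~10,070 vs ~80,000 CPU-seconds).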

    But I'm curious what your take is on when the full-system perspective would be useful.

Looks neat! I don't have much use for this right now but I did some work with gem5 years ago and this seems to solve some real pain points. Congrats on shipping!
