Pair Your Compilers at the ABI Café

14 days ago (faultlore.com)

I can definitely feel the pain of trying to work out ABI mismatch concerns. It doesn't help that the output often doesn't make clear what the underlying assumptions are: the expected stack alignment, for example, or whether structs are broken up into registers.

It would be nice if compilers could output some sort of metadata that basically says "ah yes, here's a struct, it requires this alignment, its fields are at these offsets, have these sizes, and are present in these cases" (the last part supporting discriminated unions), or "function call, parameters are here, here, and here." You'd think this is what DWARF provides, but if you play around with DWARF for a bit, you discover that it lacks a lot of the low-level ABI details you want to uncover. Instead, it's a format meant to be generic enough to convey the source program's AST to the debugger, along with some hint of how to map the binary code to that AST. You can't really write a language-agnostic DWARF-based debugger.
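For comparison, here is roughly what that struct metadata looks like when a tool computes it itself. This is a minimal sketch using Python's standard `ctypes` module; the struct `Example` and the helper `describe_layout` are hypothetical names for illustration. As far as I know, ctypes derives these numbers from its own model of the platform's C layout rules rather than by asking a C compiler, so it is one more independent implementation that has to agree with all the others.

```python
import ctypes


# Hypothetical struct for illustration; equivalent C declaration:
#   struct Example { char tag; double value; short count; };
class Example(ctypes.Structure):
    _fields_ = [
        ("tag", ctypes.c_char),
        ("value", ctypes.c_double),
        ("count", ctypes.c_short),
    ]


def describe_layout(struct_type):
    """Return per-field (name, offset, size, alignment) plus the struct's total size and alignment."""
    fields = []
    for name, field_type in struct_type._fields_:
        descriptor = getattr(struct_type, name)  # ctypes field descriptor exposes .offset and .size
        fields.append((name, descriptor.offset, descriptor.size, ctypes.alignment(field_type)))
    return fields, ctypes.sizeof(struct_type), ctypes.alignment(struct_type)


fields, total_size, total_align = describe_layout(Example)
for name, offset, size, align in fields:
    print(f"{name:<6} offset={offset:<3} size={size:<2} align={align}")
print(f"struct Example: size={total_size} align={total_align}")
```

On x86-64 Linux I would expect `value` at offset 8 and a total size of 24 (padding after `tag` and after `count`), but on other targets the numbers can differ, which is exactly the kind of detail you'd want a compiler to state explicitly.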

  • Some Common Lisp FFIs have opted to coax this information out of the compiler. https://github.com/rpav/c2ffi is a C++ tool that links to libclang-cpp and literally outputs JSON with sizes and alignments. (It is then used by https://github.com/rpav/cl-autowrap to autogenerate a Lisp wrapper.) The older CFFI Groveller [1] works by generating C code which is compiled by the system C compiler (e.g. GCC or Clang) and, when executed, prints Lisp code that contains resolved values of constants, sizes, alignments, etc.

    [1] https://cffi.common-lisp.dev/manual/html_node/The-Groveller....

  • What you are asking for sounds quite a bit like what rustc does.

    Rustc (outside `#[repr(C)]`) offers no guarantees on the ordering of the fields; however, it guarantees that every instance of struct A will have the same ordering within a particular compilation. This allows rustc to compile external crates (as long as no monomorphization is needed) consistently across all the crates that depend on them.

    • Most ABI issues arise when you start to mix and match shared libraries produced by different compilers, or even by different versions of the same compiler.

      Rust has none of that, nor does it have a stable ABI for dynamic linking, so I fail to understand what it is that rustc can offer in that solution space. There is none.

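The Groveller approach from the first reply above can be sketched in a few lines: generate a C program, build it with the system compiler, and let that compiler answer the layout questions. This is a minimal, hypothetical sketch in Python; `groveller_source` and its s-expression output format are made up for illustration and are not CFFI's actual implementation or output. It assumes a C11 compiler for `_Alignof`.

```python
def groveller_source(header, struct_name, field_names):
    """Generate C source that prints the size, alignment, and field offsets of
    `struct <struct_name>` as s-expressions (a made-up format for this sketch)."""
    lines = [
        "#include <stdio.h>",
        "#include <stddef.h>",  # for offsetof
        f"#include <{header}>",
        "",
        "int main(void) {",
        f'    printf("(sizeof {struct_name} %zu)\\n", sizeof(struct {struct_name}));',
        # _Alignof requires a C11 compiler.
        f'    printf("(alignof {struct_name} %zu)\\n", _Alignof(struct {struct_name}));',
    ]
    for field in field_names:
        lines.append(
            f'    printf("(offsetof {struct_name} {field} %zu)\\n", '
            f"offsetof(struct {struct_name}, {field}));"
        )
    lines += ["    return 0;", "}"]
    return "\n".join(lines) + "\n"


print(groveller_source("time.h", "tm", ["tm_year", "tm_mon", "tm_mday"]))
```

The generated program would then be compiled and run with the target's C compiler (e.g. `cc grovel.c -o grovel && ./grovel`), and the FFI would read the printed forms back in. Because the compiler that builds the program evaluates `sizeof`, `_Alignof`, and `offsetof`, the numbers match that compiler's layout by construction, rather than relying on a hand-written reimplementation of the rules.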

I think the right way to avoid this problem is to avoid depending on ABI compatibility at runtime or at build time.

At runtime, that means: don't use shared libraries. At build time, it means: build every library from source and don't use prebuilt artifacts.

This sounds controversial... but it allows you to change the compiler or compiler options at any time without having to worry about it. It also enables cross-compilation, reproducible builds, and portable binaries. You no longer have to ask developers to set up a complex build environment on a specific Linux distribution, because it works everywhere.

I use this approach for ClickHouse.

  • This only works if you own the entire codebase and have all the external dependencies you rely on statically linked (compiled) into your product.

    I also very much prefer this way of handling dependencies, but it's not a solution to all ABI problems, since it also implies that you need to statically link (compile) all the transitive dependencies. At the very minimum, these include libc++ or libstdc++. And with this requirement in place, it's already impossible for many of the codebases out there.

    It also brings another issue to the table: version X of libc++/libstdc++ depends on version Y of libc.

    Since you generally cannot statically link against libc, and you don't own it because it's part of the OS, this becomes a hairy problem. You really need to make sure that your code works across different versions of libc++/libstdc++/libc and combinations thereof.

    And then there's... a bunch of other platforms that aren't Linux.

    • Glibc is obstructive to static linking, but musl is not. Musl gives you a binary that relies on the Linux syscall interface and nothing else. I believe the BSD libcs link statically without problems as well.

      Libc++ is set up for static linking out of the box (if you manage to find or guess the many CMake flags).

      OS X and Windows insist on dynamically linking the system libc, IIRC, but they're closed systems anyway, so controlling your whole dependency graph isn't an option.


  • Even then, you still need ABI consistency between compilers if you want to link together codebases written in different languages (e.g. C and Rust).

    In practice this almost always 'just works' because most cross-language calls simply don't use the kinds of complicated types discussed in the blog post. They tend to stick to simple integer and pointer types, where ABI consistency is usually a given.

    Though you can still get into trouble when passing function pointers, especially when combined with some modern control-flow integrity systems.

    • >Even then, you still need ABI consistency between compilers if you want to link together codebases written in different languages (e.g. C and Rust).

      Let's talk over HTTP, a queue, or some other IPC-ish channel.


  • Okay, how do you propose to talk to your kernel then?

    • Who wants a kernel? Distribute a bootable unikernel image that can be talked to via gRPC or something.

      Obviously there are plenty of things you can't build that way (e.g. drivers), but for a server application that's meant to be accessed over the network anyway, like ClickHouse, I'm increasingly thinking that's the way to go.

  • Tell me you never worked in a big codebase without telling me you never worked in a big codebase.

There is a specified common C++ ABI that GCC, Clang, Intel's proprietary compiler, and others use. It was originally developed for the Itanium processor but is now used by GCC and Clang on most non-Windows targets. See

https://itanium-cxx-abi.github.io/cxx-abi/abi.html

Unfortunately this ABI didn't specify how __int128 (and other nonstandard types) are to be passed.

I struggled with this many times, and in the end I threw in the towel and just wrapped everything in plain C exports. That's the only way I know of to get ABI compatibility across different compilers, toolsets, and versions. COM-like constructs come a close second.
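On the consuming side, a plain C boundary means the only contract is the C calling convention for integers and pointers. A minimal sketch from Python via `ctypes`, assuming a POSIX system and using libc's `strlen` as a stand-in for a library's plain C export (`c_strlen` is a hypothetical wrapper name):

```python
import ctypes
import ctypes.util

# Assumes a POSIX system. If find_library returns None, CDLL(None) opens the
# running process itself, whose global symbols already include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Spell out the C signature: size_t strlen(const char *s);
# ctypes cannot check this against the library; it trusts whatever we declare.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Hypothetical wrapper around the plain C export."""
    return libc.strlen(data)


print(c_strlen(b"ABI Cafe"))
```

Nothing on either side verifies the declaration: if `restype` were left at its default of `c_int`, the `size_t` result would be read back as a C `int`, which silently goes wrong for lengths above `INT_MAX` on 64-bit targets. It's the same class of mismatch discussed here, just shrunk to a boundary small enough to check by eye.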

It's an unfortunate state.

Also function pointers, errors and exception handling, async/channels/thread-locals, Go stacks, Swift @objc, @cdecl, and C++ interop, FFI dialects...

It's not really pain anymore; it's a kind of hilarity.

If I understand correctly there's also an ABI problem for synchronization rules.

Within a compiler, it doesn't actually matter whether we think about a problem as "A mustn't happen after B" or as "B mustn't happen before A." But when expressing this across an ABI, we have to be careful to avoid buck-passing. Suppose Language #1 thinks of it the first way and Language #2 the second way: if Language #1 is responsible for B while Language #2 is responsible for A, each may believe the other has taken care of the ordering, and no synchronization is actually implemented.

Overall, "ABI" turns out to mean something like "every assumption you've made that can be detected by other software, including assumptions you didn't realise you had." Discovering all your assumptions is hard; accepting that other people assumed differently and aren't simply wrong is also surprisingly hard.

  • Yes, if you're brave enough to make fence placement part of the calling convention. I think you're safe if the ordering is expressed within a single function, or by a sequence of calls to functions that share the same idea of where the fences go.