
Comment by alexvitkov

3 months ago

If you want to interop well with Rust code, it feels to me like your language has to inherit so many Rust semantics that I question why I would use it over Rust.

If you're making a new language, just have good interop with C. Most libraries worth using are written in C. Calling into C is trivial* and imposes almost no limitations on what you can do language-design-wise.

* trivial, with the somewhat sizable asterisk that you have to rewrite the header files in your language.
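As a rough illustration of that asterisk, here is what "rewriting the header" looks like when the calling language happens to be Rust: a sketch that hand-declares the C standard library prototypes for abs and strlen and calls them through small safe wrappers (c_abs and c_strlen are hypothetical names). It assumes Rust 1.82 or newer, which accepts the unsafe extern block syntax, and a target where the platform C library is linked, as it is for std on common platforms.

```rust
use std::ffi::{c_char, c_int, CString};

// Hand-written equivalents of the C prototypes:
//   int abs(int);                  from <stdlib.h>
//   size_t strlen(const char *);   from <string.h>
// This is the "rewrite the header files in your language" step.
unsafe extern "C" {
    fn abs(x: c_int) -> c_int;
    fn strlen(s: *const c_char) -> usize;
}

// Hypothetical safe wrapper around the C abs.
fn c_abs(x: i32) -> i32 {
    // SAFETY: abs is defined for every int except INT_MIN.
    assert!(x != i32::MIN, "abs(INT_MIN) is undefined behavior in C");
    unsafe { abs(x) }
}

// Hypothetical safe wrapper around the C strlen.
fn c_strlen(s: &str) -> usize {
    // C strings must be NUL-terminated and must not contain interior NULs.
    let owned = CString::new(s).expect("no interior NUL bytes");
    // SAFETY: `owned` is a valid NUL-terminated string that outlives the call.
    unsafe { strlen(owned.as_ptr()) }
}

fn main() {
    println!("abs(-5) = {}", c_abs(-5));
    println!("strlen(\"hello\") = {}", c_strlen("hello"));
}
```

Every C function a language wants to use needs a declaration like this, which is why binding generators exist for most languages.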

I wish Rust would standardize their ABI already. I started a project to call Rust from Common Lisp, but haven't got very far. It's a lot of work, and they can break compatibility at any time.

If they really want to replace C and C++ then they really need to support being called from third party languages.
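Until something like a stable Rust ABI exists, the usual workaround is to export Rust functions through the C ABI instead. A minimal sketch, with hypothetical names (Greeter, greeter_new, greeter_name_len, greeter_free): the Rust type stays opaque behind a pointer, so the foreign side never depends on its layout, while extern "C" and #[unsafe(no_mangle)] give a fixed calling convention and symbol name that a C header or a Common Lisp CFFI binding could target. It assumes Rust 1.82 or newer for the #[unsafe(no_mangle)] spelling.

```rust
use std::ffi::{c_char, CStr, CString};

// Hypothetical Rust type; foreign code only ever holds a *mut Greeter.
pub struct Greeter {
    name: String,
}

/// # Safety
/// `name` must point to a valid NUL-terminated string.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn greeter_new(name: *const c_char) -> *mut Greeter {
    let name = unsafe { CStr::from_ptr(name) }.to_string_lossy().into_owned();
    // Hand ownership to the caller as a raw pointer.
    Box::into_raw(Box::new(Greeter { name }))
}

/// # Safety
/// `g` must come from `greeter_new` and must not have been freed yet.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn greeter_name_len(g: *const Greeter) -> usize {
    unsafe { (*g).name.len() }
}

/// # Safety
/// `g` must come from `greeter_new` (or be null) and must be freed at most once.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn greeter_free(g: *mut Greeter) {
    if !g.is_null() {
        // Take ownership back so Rust runs the destructor.
        drop(unsafe { Box::from_raw(g) });
    }
}

fn main() {
    // Simulate a foreign caller from the Rust side.
    let name = CString::new("Ferris").expect("no interior NUL bytes");
    unsafe {
        let g = greeter_new(name.as_ptr());
        println!("name length: {}", greeter_name_len(g));
        greeter_free(g);
    }
}
```

The cost is that every entry point has to be written by hand, and anything richer than a C type (generics, trait objects, borrows) has to be flattened into this shape.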

I've been looking into this, and I suspect that one actually needs surprisingly little to interoperate safely with Rust.

TL;DR: The lowest common denominator between Rust and any other memory-safe language is a borrow-less affine type.

The key insight is that Rust is actually several different mechanisms stacked on top of each other.

To illustrate, imagine a program in a Rust-like language.

Now, refactor it so you don't have any & references, only &mut. It actually works, if you're willing to refactor a bit: you'll be storing a lot of things in collections and referring to them by index, and cloning even more, but nothing too bad.
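A small sketch of that first step, assuming a toy game-like program (Unit, World, spawn, attack, and name_of are all hypothetical): units live in a Vec and point at each other by index, every function takes &mut World, and where a shared & borrow would have been used to read a value, the value is cloned instead.

```rust
// Hypothetical toy program written with &mut only: no & in any signature.
struct Unit {
    name: String,
    hp: i32,
    target: Option<usize>, // an index into World::units instead of a reference
}

struct World {
    units: Vec<Unit>,
}

fn spawn(world: &mut World, name: String, hp: i32) -> usize {
    world.units.push(Unit { name, hp, target: None });
    world.units.len() - 1
}

fn attack(world: &mut World, attacker: usize, damage: i32) {
    // Follow the index, then mutate the target through the same &mut World.
    if let Some(t) = world.units[attacker].target {
        world.units[t].hp -= damage;
    }
}

// Where a shared & borrow would have been used for reading, clone instead.
fn name_of(world: &mut World, id: usize) -> String {
    world.units[id].name.clone()
}

fn main() {
    let mut world = World { units: Vec::new() };
    let archer = spawn(&mut world, "archer".to_string(), 10);
    let goblin = spawn(&mut world, "goblin".to_string(), 7);
    world.units[archer].target = Some(goblin);
    attack(&mut world, archer, 3);
    println!("{} has {} hp", name_of(&mut world, goblin), world.units[goblin].hp);
}
```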

Now, go even further and refactor the program to not have any &mut either. This requires some acrobatics: you'll be temporarily removing things from those collections and moving things into and out of functions like in [2], but it's still possible.
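Continuing with a similarly hypothetical Unit type, the second step drops &mut from the program's own signatures: values are moved into a function and moved back out as the return value, and an element is temporarily taken out of its collection while it is worked on. This is only a sketch; std's Vec methods still borrow internally, and the point is that the program's own code never names a reference type.

```rust
// Hypothetical unit type, threaded through functions by value.
struct Unit {
    name: String,
    hp: i32,
}

// Ownership goes in and comes back out: no & or &mut in the signature.
fn damage(mut unit: Unit, amount: i32) -> Unit {
    unit.hp -= amount;
    unit
}

// The collection is also moved in and returned. The element at `index` is
// temporarily removed, moved through `damage`, and put back in place.
fn damage_at(mut units: Vec<Unit>, index: usize, amount: i32) -> Vec<Unit> {
    let unit = units.swap_remove(index); // the last element fills the hole
    let unit = damage(unit, amount);
    units.push(unit);
    let last = units.len() - 1;
    units.swap(index, last); // restore the original order
    units
}

fn main() {
    let units = vec![
        Unit { name: "archer".to_string(), hp: 10 },
        Unit { name: "goblin".to_string(), hp: 7 },
    ];
    let units = damage_at(units, 1, 3);
    println!("{} has {} hp", units[1].name, units[1].hp);
}
```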

You're left with something I refer to as "borrowless affine style" in [1] or "move-only programming" in [0].

I believe that's the bare minimum needed to interoperate with Rust in a memory safe way: unreference-able moveable types.

The big question then becomes: if our language has only these moveable types, and we want to call a Rust function that accepts a reference, what then?

I'd say: make the language move the type in as an argument, take a temporary reference just for Rust, and then move-return the type back to the caller. The rest of our language doesn't need to know about borrowing, it's just a private implementation detail of the FFI.
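A minimal sketch of that shim, assuming two hypothetical Rust library functions (rust_lib_push_greeting and rust_lib_len) that take references. The ffi_* wrappers, also hypothetical, are what the move-only language would actually call: each takes its arguments by value, borrows them only for the duration of the Rust call, and hands ownership back to the caller.

```rust
// Stand-ins for existing Rust library functions that take references.
fn rust_lib_push_greeting(buf: &mut String, who: &str) {
    buf.push_str("hello, ");
    buf.push_str(who);
}

fn rust_lib_len(s: &str) -> usize {
    s.len()
}

// Hypothetical shims exposed to the move-only language: arguments are moved
// in, borrowed only for the duration of the Rust call, then moved back out.
fn ffi_push_greeting(mut buf: String, who: String) -> (String, String) {
    rust_lib_push_greeting(&mut buf, &who);
    (buf, who)
}

fn ffi_len(s: String) -> (String, usize) {
    let n = rust_lib_len(&s);
    (s, n)
}

fn main() {
    let (buf, who) = ffi_push_greeting(String::new(), String::from("Rust"));
    let (buf, n) = ffi_len(buf);
    println!("{buf} ({n} bytes), who = {who}");
}
```

Because every borrow ends before the wrapper returns, the Rust compiler checks the shim itself, and the calling language only has to guarantee that a moved-in value is not used again until it is returned.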

These weird moveable types are, of course, extremely unergonomic, but they serve as a foundation. A language could use these only for Rust interop, or it could go further: it could add other mechanisms on top such as & (hard), or &mut (easy), or both (like Rust), or a lot of cloning (like [3]), or generational references (like Vale), or some sort of RefCell/Rc blend, or linear types + garbage collection (like Haskell) and so on.

(This is actually the topic of the next post, you can tell I've been thinking about it a lot, lol)

[0] "Move-only programming" in https://verdagon.dev/grimoire/grimoire#the-list

[1] "Borrowless affine style" in https://verdagon.dev/blog/vale-memory-safe-cpp

[2] https://verdagon.dev/blog/linear-types-borrowing

[3] https://web.archive.org/web/20230617045201/https://degaz.io/...

  • Have you taken a look at the paper "Foreign Function Typing: Semantic Type Soundness for FFIs" [0]?

    > We wish to establish type soundness in such a setting, where there are two languages making foreign calls to one another. In particular, we want a notion of convertibility, that a type τA from language A is convertible to a type τB from language B, which we will write τA ∼ τB , such that conversions between these types maintain type soundness (dynamically or statically) of the overall system

    > ...the languages will be translated to a common target. We do this using a realizability model, that is, by [setting] up a logical relation indexed by source types but inhabited by target terms that behave as dictated by source types. The conversions τA ∼ τB that should be allowed, are the ones implemented by target-level translations that convert terms that semantically behave like τA to terms that semantically behave like τB (and vice versa)

    I've toyed with this approach to formalize the FFI for TypeScript and Pyret and it seemed to work pretty well. It might get messier with Rust because you would probably need to integrate the Stacked/Tree Borrows model into the common target.

    But if you can restrict the exposed FFI as a Rust-sublanguage without borrows, maybe you wouldn't need to.

    [0] (PDF Warning): https://wgt20.irif.fr/wgt20-final23-acmpaginated.pdf

  • Thanks for the write-up. My biggest fear is not references, overloads or memory management, but rather just the layout of their structures.

    On a 64-bit target, we have this:

        std::mem::size_of::<String>() == 24
        std::mem::size_of::<Option<String>>() == 24
    

    Which is cool. But Option<T> is defined like this:

        enum Option<T> {
           Some(T),
           None,
        }
    

    I didn't find any "template specialization" tricks like the ones you would see in C++. As far as I can tell, the compiler figures out some trick to squeeze Option<String> into 24 bytes. Whatever those tricks are, unless rustc has an option to export the layout of a type, you will need to implement them yourself.

    • You don’t need to determine the internal representation as long as you’re dealing with opaque types and invoking Rust functions on them.

      As for the trick that makes both 24 bytes: String contains a NonNull pointer, which Option detects and knows it can represent transparently without any enum tag. For what it’s worth, you can do similar tricks in C++ using zero-sized types and tags to declare nullable state (in fact std::optional already knows to do this for pointer types, if I recall correctly).

    • Yeah, currently "niche optimization" is performed when the compiler can infer that some values of a type are invalid.

      Today a type can declare this by marking the valid range of an integer field as incomplete with the rustc_layout_scalar_valid_range_start or _end attribute (which requires #![feature(rustc_attrs)]).

      In your example it works for String because String contains a Vec<u8>, which in turn contains a capacity field of type struct Cap(usize), and that usize is constrained to values in 0..=isize::MAX.

      The only way for you to know that is to effectively be the rustc compiler, or to be able to consume its output.
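The layout claims in this thread can be checked with std::mem::size_of. A sketch, with one caveat: String and Option<String> having the same size is what current rustc does, not a documented guarantee, whereas Option's documentation does guarantee the niche for NonNull, the NonZero integer types, and #[repr(transparent)] wrappers around them (Handle below is a hypothetical example of such a wrapper). A plain u32 has no invalid bit patterns, so Option<u32> needs a separate tag.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;
use std::ptr::NonNull;

// Hypothetical handle type: a transparent wrapper around a never-null pointer.
#[allow(dead_code)]
#[repr(transparent)]
struct Handle(NonNull<u8>);

fn main() {
    // Observed (not guaranteed) with current rustc: String's non-null buffer
    // pointer and bounded capacity leave room to encode Option's None.
    assert_eq!(size_of::<Option<String>>(), size_of::<String>());

    // Guaranteed by Option's documentation: the forbidden value encodes None.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<NonNull<u8>>());
    assert_eq!(size_of::<Option<Handle>>(), size_of::<Handle>());
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());

    // In practice: bool only uses 0 and 1, so other byte values form a niche.
    assert_eq!(size_of::<Option<bool>>(), 1);

    // Every u32 bit pattern is valid, so Option<u32> needs a separate tag.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());

    println!("all layout checks passed");
}
```

This only confirms sizes; it does not tell a foreign language which bit pattern rustc picked for None, which is the part that still requires consuming the compiler's own layout decisions.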