Comment by wg0

9 hours ago

Otherwise is a decent language but what makes it difficult is the borrow semantics and lifetimes. Lifetimes are more complicated to get your head around.

But then there's this Arc, Ref, Pinning and what not - how deep is that rabbit hole?

If you’re writing C and don’t track ownership of values, you’re in a world of hurt. Rust makes you do from day one what you could do in C but unless you have years of experience you think it isn’t necessary.

  • Okay, I think it is is more like Typescript. You hate it but one day you just write small JS program and convert it to Typescript to discover that static analysis alone had so many code paths revealed that would have resulted in uncaught errors and then you always feel very uncomfortable writing plain Javascript.

    But what about tools like valgrind in context of C?

    • Valgrind can only tell you about issues that your testcases exercise. It doesn't provide the same guarantees as static checking of memory safety invariants. But, if you're really concerned (especially about unsafe code), belt-and-bracers is a good strategy, and valgrind will work with rust binaries as well. Rust also has a tool called MIRI which can similarly flag up issues in testcases (it's effectively an interpreter for the intermediate representation in the compiler, and it can detect undefined behaviour even if the compiled assembly would happen to look OK. Still has the same limitation of needing extensive testcases though)

    • You probably should run your rust programs through valgrind regardless. Rust is safer than C, but any unsafe code drops you to approximately C level of safety and any C FFI calls are obviously outside of rust's control or responsibility.

    • Valgrind is great, especially if you write extensive tests and you actually run them through it regularly. And even then, it does not prove the absence of any kind of bugs. Safe rust has strong guarantees.

  • It was true until LLMs arrive. Feature compilers + IDEs can be integrated with LLMs to help programmers.

    Rust was a great idea, before LLMs, but I don't see the motivation for Rust when LLMs can be the solution initial for C/C++ 'problems'.

    • Relying on LLMs to code for you in no way solves the safety problem of C/C++ and probably worsens it.

    • On the contrary LLMs make using safe but constraining languages easier - you can just ask it how to do what you want in Rust, perhaps even by asking it to translate C-ish pseudocode.

Context: I'm writing a novel kernel in Rust.

Lifetimes aren't bad, the learning curve is admittedly a bit high. Post-v1 rust significantly reduced the number of places you need them and a recent update allows you to elide them even more if memory serves.

Arc isn't any different than other languages, not sure what you're referring to by ref but a reference is just a pointer with added semantic guarantees, and Pin isn't necessary unless you're doing async (not a single Pin shows up in the kernel thus far and I can't imagine why I'd have one going forward).

Rust lifetime is just a label for a region of memory with various data, which is discarded at the end of its life time. When compiler enters a function, it creates a memory block to hold data of all variables in the function, and then discards this block at the exit from the function, so these variables are valid for life time of the function call only.

I don’t entirely agree, you can get used to the borrow checker relatively quickly and you mostly stop thinking about it.

What tends to make Rust complex is advanced use of traits, generics, iterators, closures, wrapper types, async, error types… You start getting these massive semi-autogenerated nested types, the syntax sugar starts generating complex logic for you in the background that you cannot see but have to keep in mind.

It’s tempting to use the advanced type system to encode and enforce complex API semantics, using Rust almost like a formal verifier / theorem prover. But things can easily become overwhelming down that rabbit hole.

I always feel Arc is the admission that the borrow checker with different/overlapping lifetimes is too difficult, despite what many Rust developers - who liberally use Arc - claim.

  • Lifetime tracking and ownership are very difficult. That's why languages like C and C++ don't do it. It's also why those languages needs tons of extra validation steps and analysis tools to prevent bugs.

    Arc is nothing more than reference counting. C++ can do that too, and I'm sure there are C libraries for it. That's not an admission of anything, it's actually solving the problem rather than ignoring it and hoping it doesn't crash your program in fun and unexpected ways.

    Using Arc also comes with a performance hit because validation needs to be done at runtime. You can go back to the faster C/C++ style data exchange by wrapping your code in unsafe {} blocks, though, but the risks of memory corruption, concurrent access, and using deallocated memory are on you if you do it, and those are generally the whole reason people pick Rust over C++ in the first place.

    • Looking at the code, it consists of long chains of get().unwrap().to_mut().unwrap().get() noise. Looks like coping with library design than ownership tacking. Also why Result<Option<T>>? Isn't Result already Option by itself? I guess that's why you need get().unwrap().to_mut() to get a value from Result<Option<T>> from an average function call?

  • It's not that the borrow checker is too difficult, it's that it's too limiting.

    The _static_ borrow checker can only check what is _statically_ verifiable, which is but a subset of valid programs. There are few things more frustrating than doing something you know is correct, but that you cannot express in your language.

    • For kernels (and I suspect database engines might be added to the list, since they seem to have similar requirements to be both scalable and deal with massive amounts of shared state, but I'm not overly familiar with them) is where it gets particularly difficult.

      Several kernels for example use type-stable memory, memory that is guaranteed to only hold objects of a particular type, though perhaps only providing that guarantee for as long as you hold an RCU read-lock (this is the case in Linux with SLAB_TYPESAFE_BY_RCU). It is possible in some cases to be able to safely deal with references to objects where the "lifetime" of the referent has ended, but where by dint of it being guaranteed to be the same type of object, you can still do what you want to do.

      This comes in handy when you have a problem that commonly appears in kernels where you need to invert a typical lock ordering (a classic case is that the page fault codepath might want to lock, say, VM object then page queue, but the page-replacement codepath will want to lock page-queue then VM object.)

      Unfortunately it's hard to think of how the preconditions for these tricks could be formally expressed.

  • If tracking lifetimes is simple 90% of the time and complex 10% of the time, maybe a tool that lets you have them automatically managed (with some runtime overhead) that 10% of the time is the right way forward.

  • It's not just difficult, sometimes it's impossible to statically know a lifetime of a value, so you must dynamically track it. Arc is one of such tools.