Ointers: A library for representing pointers where bits have been stolen (2021)

12 days ago (github.com)

109 comments

fanf2

Stealing bits from pointers makes sense sometimes, but in most cases an easier and less error-prone way to increase memory efficiency is to just use indices or relative pointers. Instead of shaving individual bits of a pointer take advantage of the fact that most pointers in a program are almost exactly alike.

Instead of storing a 64 bit prev and next pointer for an item in a linked list just store e.g. two 16 bit offsets. Then the next_ptr = (this_ptr & base_mask) + offset * alignment.

Most of the time wasting memory by storing full 64bit pointers is no big deal. But when it is, you can go a lot further than just shaving off a few bits.

If you want to be extra clever you can encode values in the base address of your pointers by taking advantage of how virtual memory is allocated. Every process has its own virtual memory pages and you can request memory pages at specific address ranges. For instance if you align all your memory at a 32gb boundary then you can use 32bit pointers + a base pointer without even having to mask. You can also use virtual memory allocation tricks to convey information about alignment, typing, whether a pointer is allowed to be shared between threads and so on. And again, without having to butcher the pointer itself.

jsheard 12 days ago
Considering how common it is to use indices in lieu of pointers in Rust, it would be nice if it had native NonMax integer types similar to the NonZero types it already has. NonZero gives you the niche-filling optimization (e.g. Option<NonZero> and NonZero are the same size) but with indices you want zero to be a valid value, so it would be natural to use the maximum as the illegal value instead. You can kind of do it by offsetting the index by 1 or inverting the bits then putting it in a NonZero but that's a small extra runtime cost on every use.
- joseluis 12 days ago
  
  You can do a handy struct wrapper over a private Nonzero that xors the given value with the prohibited value (max in this case) at the new constructor. And like so for the get method. Xoring is very cheap. That's my favorite way of storing indices/links for nodes, since you can wrap them in Option for free.
  
  4 replies →
dkjaudyeqooe 12 days ago
That's really not the application for this, it's tags, which are endlessly useful or characterizing the pointer. For instance you can use the MSBs to establish the type of whats pointed to, or for reference counting. I use the LSB to indicate if the value is a pointer or an unsigned integer, whereby you can store both an (integer) key to a record in a database, or a pointer to the record data in memory. You could use another bit in the pointer to indicate that the record is dirty.
Using unused bits in pointers can make you code more efficient and simpler.
- westurner 11 days ago
  
  Can unused bits be used for signed pointers or CRCs?
  
  2 replies →
zozbot234 12 days ago
This requires using arenas extensively in your program or perhaps playing tricks with virtual memory, because your system allocator will give you arbitrary pointers by default and you'll often want to point to an object from a different allocation.
- gizmo 12 days ago
  
  Right. Which is why I favor doing your own memory management entirely (which has some huge advantages) or not worrying about memory at all and trusting a garbage collector. I don't think there are many situations left where memory management is too important to leave to a gc but not important enough to do from scratch.
  
  1 reply →
gpderetta 12 days ago

Definitely indices are a great option, but then you need a base pointer and a way to allocate off of it. That can add significant complexity. So it is all a tradeoff.

pavlov 12 days ago

Technically it’s an ointer on PowerPC and SPARC, but a pointe everywhere else.

sojuz151 12 days ago

This look like something that debuggers would hate, because following pointers would be broken

jxf 12 days ago
I was just thinking this. You'd either need to special-case your debuggers or be able to use something that provided bespoke hooks for pointer-following.
- gpderetta 11 days ago
  
  Wrap it in a class and teach the debugger to interpret it. It might be possible with gdb and python pretty-printers for example.

malkia 11 days ago

I've posted about MGS PS1 port we did to PC and how some pointers to the C4 bomb planted posted meant it was on the wall or on the ground - here ->

https://news.ycombinator.com/item?id=9739731

to quote me:

- The game used a tricky pointer hack, basically on the PSX accessing a pointer with the highest-24-bit set means read it from the CPU cache, otherwise not (or maybe the other way around). This was used for example to indicate whether the C4 bomb was planted on the ground, or on the wall instead of keeping a booblean/bit flag for it. Possibly was used for some more things. Since it was 24-bit, that meant 16mb.

zozbot234 12 days ago

Are there ways on Linux to make sure that all user-visible pointers in a process are allocated within some subset of the overall address space? For instance, imagine writing a 64-bit program such that all userspace pointers are allocated within the first 32 bits. This would allow for representing them in memory using four bytes, as is commonly done on 32-bit architectures--without adding more extensive or ad-hoc changes such as a new "x32" binary format. Except that you could also limit to 24-bit (giving 16M of address space), 16-bit (for a "tiny" code model) or anything else. Of course, this would require some amount of support in the linker/loader, userspace libraries and the kernel itself. But the effort might be worthwhile if it can allow for replacing more ad-hoc approaches.

ksherlock 12 days ago

Linux mmap (since 2.6) has a MAP_32BIT flag, but only for x86-64:
Put the mapping into the first 2 Gigabytes of the process address space. This flag is supported only on x86-64, for 64-bit programs. It was added to allow thread stacks to be allocated somewhere in the first 2 GB of memory, so as to improve context-switch performance on some early 64-bit processors. Modern x86-64 processors no longer have this performance problem, so use of this flag is not required on those systems. The MAP_32BIT flag is ignored when MAP_FIXED is set.
fathyb 12 days ago

An allocator is free to only `mmap` addresses that are within a range. I think jemalloc could allow you to do that using custom arenas.
o11c 11 days ago

ASAN relies on kernel internal details, but this has broken a few times.
Realistically you should just mmap a huge aligned chunk in the first place, then allocate inside that.

willcipriano 12 days ago

I never had the problem of too many pointers, but if I did I'd try the "arena"[0] approach.

Preallocate a huge area of memory and use smaller indexes relative to the start instead. Provided a pointer is bigger than a int. You might be able to use this in that approach and keep partial pointers instead.

[0]https://www.rfleury.com/p/untangling-lifetimes-the-arena-all...

gumby 11 days ago

> What do you call a pointer with the high bits stolen? An ointer!

What is the name a pointer with the low bits stolen? Machines with strict word addressing will ignore the bottom two bits so you can safely store what you want there.

Not sure I've used, much less programmed a machine with that strict constraint in a while though. These days CPUs are all designed to implement the C abstract machine. At least the GPU folks are not subject to this constraint.

asb 12 days ago

I somehow hadn't come across this library, but have a whole blog post on the various ways people store data in pointers (and when+why it's safe) https://muxup.com/2023q4/storing-data-in-pointers

makz 11 days ago

Awesome, been looking for this information for ages already.

couchand 12 days ago

Am I misreading the bitmask code? It looks like (in addition to a few other ideas) it's using the old "stick a few extra bits in an aligned pointer", but it seems to be only manipulating high bits, whereas aligned pointer zeroes are low-order bits.

I'd suggest a heavier investment in testing infrastructure.

masklinn 12 days ago
64 bit architectures don't actually have 64 bit address spaces, both AMD64 and ARM64 have 48 bit address spaces by default (some CPUs have extensions you can enable to request larger address spaces e.g. LVA on ARM64 and 5LP / LA57 on AMD64 but that's opt-in on a per-process basis).
So while you have 3 bits available at the bottom of the pointer, there are 16 at the top. That's a lot more payload you can smuggle. There are even CPU extensions which tell the processor to ignore some of that (Linear Address Masking for Intel, Upper Address Ignore for AMD, and Top Byte Ignore for ARM).
- vlovich123 12 days ago
  
  Small correction. That's true for 4-level paging where virtual memory is capped at 256 TiB (LAM48). There's a 5-level extension that reduces the wasted space to 7 bits (LAM57) to allow for 128 PiB of virtual space.
  I have no idea what the purpose of the extension is that when I don't believe you can get that much secondary storage attached to a CPU unless you go to tape which is pointless for virtual mapping.
  
  5 replies →
Sharlin 12 days ago

The stored representation is packed such that all the stealable bits are contiguous. To get the original pointer value it’s unpacked first.
pixelesque 12 days ago
It looks like it can use both traditional "tagged pointer" alignment bits AND virtual address bits...
- couchand 12 days ago
  
  I see that in the readme but I don't see where that's handled in the bitmasking. It appears to be universally masking high-order bits here: https://github.com/irrustible/ointers/blob/1961f75bbb9818d72...
  
  6 replies →

zgs 12 days ago

Every time I've seen similar done in the past, it has come back to hurt the instigator.

masklinn 12 days ago
Apple's been using that for more than a decade: https://www.mikeash.com/pyblog/friday-qa-2012-07-27-lets-bui... http://www.sealiesoftware.com/blog/archive/2013/09/24/objc_e...
OCaml's also been using tagging to mark unboxed type pretty much from the start, nearly 30 years ago: https://dev.realworldocaml.org/runtime-memory-layout.html#:~....
- duskwuff 11 days ago
  
  And it's not like Apple isn't aware of the dangers, either; they used top-byte tagging on the 68000, and it took them a lot of time and effort to migrate away from that to support 16+ MB systems.
  
  1 reply →
jstimpfle 12 days ago
It's common for libraries to use the 2 lowest bits in tree structures e.g. RB tree. It's not a problem at all since the data structure is controlled by the library. Even if tree nodes are allocated/provided by the user, having a "4-byte aligned" requirement in the API contract isn't a problem at all -- in fact you need to work quite hard to allocate a structure with a pointer that isn't at least 4 byte aligned (i386), or 8 byte aligned (x64).
- a1369209993 12 days ago
  
  > in fact you need to work quite hard to allocate a structure with a pointer that isn't at least 4 byte aligned (i386), or 8 byte aligned (x64).
  Well, no, actually; it's:
  p = malloc(size+1)+1;
  It's just quite implausible that you'd do that by accident.
  
  6 replies →
db48x 12 days ago
Every lisp system that ever existed uses this technique, and it never hurt them any. Emacs or SBCL or Open Genera; they all work perfectly well.
- gpderetta 12 days ago
  
  And the modern take on this is NaN Boxing!
  
  2 replies →
- rjsw 12 days ago
  
  Maclisp on PDP-10 and Franz Lisp didn't, they used bibop type checking instead of tagged pointers.
  
  6 replies →
- lmm 12 days ago
  
  > Every lisp system that ever existed uses this technique, and it never hurt them any.
  Wasn't it part of the reason they ended up with poor "mechanical sympathy" on regular PCs, and got a bad performance reputation as a result?
  
  27 replies →
- _flux 12 days ago
  
  OCaml uses the lowest bit, which is nice because (aligned) pointers work as-is.
mbrubeck 12 days ago

This bitvec I wrote for Servo and Gecko uses this technique, and has been shipping in Firefox releases for over six years with no bugs found:
https://docs.rs/smallbitvec/
I'm pretty sure you can find several more examples in Firefox, as well as other major browsers.
junon 12 days ago

VM writers use this egregiously. I've not heard of it causing issues. It's not that complicated.
Lua, Lisps, many others.
titzer 12 days ago

All fast JavaScript VMs use some form of pointer tagging. V8 uses (compressed) tagged pointers and both JSC and Firefox used NaN-boxing.
WasmGC has a i31ref as part of its reference hierarchy, which is implemented with pointer tagging.
pixelesque 12 days ago

I've used tagged pointers with no issues at all over the years to "smuggle" / store values in...
coldtea 12 days ago
This "trick" has been used since the dawn of time in major platforms
- adql 12 days ago
  
  I dunno, pointers were kinda small on 8/16 bit platforms
  
  1 reply →
adql 12 days ago

Any example ?
speed_spread 12 days ago

If you build your whole app with it, yeah. If you use it tactically for a certain data type it can be very nice. Just don't try to dissimulate it's specialness.

queuebert 12 days ago

If the ointer points to a very large block of memory, is it an oinker?

gwbas1c 11 days ago

Could something like this be used to lower memory footprint?

menaerus 11 days ago

Besides that it makes some lock-free algorithms even feasible and some easier to implement, yes, it also can be used to downsize the memory pressure at the cost of extra CPU cycles if you know that the metadata you want to track fits in those 16+ bits.