Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's a rather contrived demo, seeing as LSB-tagging makes pointer and fixnum aritmetic work already, with fixnums requiring no heap allocation, and the implementation of the symbol table [0] is very much O(live values) when creating an object. (Do see "Hash-consing garbage collection" [1] for a non-strawman implementation of coalescing heap objects though.)

But putting metadata in the MSBs of pointers is a nice trick for concurrent garbage collection like Pauseless, ZGC, C4 and such [2]...though I'm not sure how much you win, as collections aren't _that_ frequent, so if you map the same physical memory multiple times (like ZGC, probably C4 too) your TLB should pick up whichever replica in virtual memory you're currently accessing quickly.

[0] https://gitlab.com/Linaro/tcwg/tbi_lisp/-/blob/main/SymbolTa...

[1] https://www.cs.princeton.edu/research/techreps/TR-412-93

[2] https://www.usenix.org/legacy/events/vee05/full_papers/p46-c... - the Azul hardware seemingly only stole one bit still.



It's a contrived demo because everything you could want for a Lisp-like object representation, using spare bits in a pointer, can be done without relying on those spare bits being ignored by the memory referencing hardware---only relying on those spare bits being available.

I recently converted a Lisp to NaN tagging. All it needs is that pointers are restricted to the 48 bit range, not that the top 16 bits are ignored.

All values which are pointers have a zero there; if there are nonzero bits, the value is something other than a pointer, and not equal to any pointer.

Other choices are possible within the scope of NaN tagging: it's possible to favor numbers so that values which are floating-point numbers do not require any treatment before use, but recovering pointers requires bit operations.

Either way, the run-time never relies on two different values both being pointers to the same object, let alone both being dereferenced to access that object.

That could turn out to be inconvenient in some ways. While memory access is cheap due to not having to mask the special bits (the hardware doing that for "free"), all comparisons which check "are these two values the same object" are going to require masking, because your comparison instructions (xor, subtract, compare, ...) will not ignore any of the bits. Those word comparisons are pretty frequent operations in Lisp: e.g. whether two values are the same symbol. You more frequently compare a symbol than you peek into it to geet its properties!

(On one platform, the top bits are not ignored: Android. Its libc uses tagging for added safety. E.g. malloc hands you a tagged pointer. Only the low 48 bits of it matter for addressing, but the tag has to match when you call free. This causes a problem, since I need those bits to be zero in order for a value to be of pointer type. Luckily, I allocate heaps in large blocks. On Android, each block remembers its tag, and then I clear the pointer's top 16 bits. If/when the garbage collector is able to return an entire heap block to the system, those top bits are restored in the pointer handed to free, so everything is cool. The overhead of this is absolutely negligible, therefore. One word of storage per tens of thousands of objects, and some rarely executed bit operations.)


> all comparisons which check "are these two values the same object" are going to require masking, because your comparison instructions (xor, subtract, compare, ...) will not ignore any of the bits.

Wouldn’t two pointers to the same object also have the same tag? I don’t understand why you would need to mask it.


The article shows a reference count field. Presumably, two pointers to the same object could differ in the reference count, which would require masking for them to compare equal.


The structure described is an element of a "symbol table" and not a Lisp pointer. Which makes it even weirder honestly.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: