# NaN-Packing NaN-packing (also called NaN-boxing) is a strategy involving the use of NaN bit patterns, that are otherwise unused, to store various values in them. In the implementation of a dynamically typed language, this can be used to ensure that all types in the language can be represented by a single 64-bit value, which is either a valid double, an actual NaN value, or one of the other NaN bit patterns that represent some other type, potentially in the form of a pointer to a heap object containing further data. This works because pointers only need 48 bits in practice, and the range of unused NaN bit patterns contains an astounding `2^53 - 4` different values. IMPORTANT NOTE: All illustrations of data structures and bit patterns use big-endian. When implementing the strategies described herein, it may be necessary to reorder the elements. For example, the elements of packed structs in Zig are ordered least to most significant. ## The double format The IEEE 754 double-precision binary floating-point aka binary64 format is: { sign: u1, exponent: u11, fraction: u52 } Possible types of values a double can encode include: { sign == any, exponent != 0x7ff, fraction == any } :: Real (finite) { sign == any, exponent == 0x7ff, fraction == 0x0 } :: Infinity { sign == any, exponent == 0x7ff, fraction != 0x0 } :: NaN Note: 0x7ff = u11 with all bits set (0b11111111111) In other words: all exponent bits set, fraction bits all zero :: Infinity all exponent bits set, fraction part non-zero :: NaN ## Details of NaN values There are two different NaN types: signaling and quiet. Quiet NaN may be returned by FP operations to denote invalid results, whereas signaling NaN are never returned by FP operations and serve other purposes. Modern hardware sets the MSB of the fraction to indicate that the NaN is a quiet one, so let's refine our definition for denoting NaN values: { sign: u1, exp: u11, quiet: u1, rest: u51 } Variants of NaN: { sign == any, exp == 0x7ff, quiet == 0, rest >= 0x1 } :: sNaN { sign == any, exp == 0x7ff, quiet == 1, rest == any } :: qNaN Note that in case of the signaling NaN, the rest of the fraction must be non-zero, since otherwise the entire fraction part would be zero and thus denote an infinity rather than a NaN. Most systems have a "canonical" quiet NaN that they use: { sign == any, exp == 0x7ff, quiet == 1, rest == 0x0 } :: cqNaN The sign bit of the canonical quiet NaN is undefined and may differ from operation to operation or depending on the platform. It's useful to see a few common examples expressed in hex: 0x7ff8000000000000 :: cqNaN, sign bit nil 0xfff8000000000000 :: cqNaN, sign bit set 0x7ff8000000000001 :: smallest non-canon qNaN, sign bit nil 0xfff8000000000001 :: smallest non-canon qNaN, sign bit set 0x7fffffffffffffff :: largest non-canon qNaN, sign bit nil 0xffffffffffffffff :: largest non-canon qNaN, sign bit set 0x7ff0000000000001 :: smallest sNaN, sign bit nil 0xfff0000000000001 :: smallest sNaN, sign bit set 0x7ff7ffffffffffff :: largest sNaN, sign bit nil 0xfff7ffffffffffff :: largest sNaN, sign bit set ## Unused NaN bit patterns Let's start with the quiet NaN values. Theoretically, there only needs to be one canonical quiet NaN, so we would have `2^52 - 1` unused bit patterns in the quiet NaN range. In practice, however, the sign bit may differ from one operation to the next. For example, the fabs function may simply clear the sign of the argument, without minding it being a NaN. In that case, if the platform's regular canonical NaN is the one with the sign bit set, we would end up getting another, "semi-canonical" quiet NaN bit pattern, with the sign bit nil. So, both variants of the canonical quiet NaN are in use. This leaves `2^52 - 2` definitely-unused quiet NaN bit patterns: { sign == any, exp == 0x7ff, quiet == 1, rest >= 0x1 } :: Unused qNaN Remember that signaling NaN are defined in a very similar way: { sign == any, exp == 0x7ff, quiet == 0, rest >= 0x1 } :: sNaN Since none of those can be returned by FP operations, they could all be seen as unused, giving us another `2^52 - 2` bit patterns. In total, this gives us `2^53 - 4` definitely-unused NaN bit patterns. ## Representing Zisp values and pointers as unused NaN bit patterns Zisp wants to store two different things in unused NaN patterns: 1. Pointers (to anything in principle) 2. Non-double primitive aka "immediate" values It may seem intuitive to use signaling NaN for one, and quiet NaN for the other. However, this would fragment our "payload" bits, since we would be using the sign bit as its MSB and the remaining 51 bits of the fraction as the rest of the payload. Further, we want to use as many bit patterns as possible for fixnums, so we can have a nice large fixnum range. To this end, it would be nice if we could, for example, use all bit patterns where the sign bit is set for our representation of fixnums, and then the range of bit patterns with the sign bit unset can be shared among the remaining values, and pointers. Then let's do exactly that, and use the sign as the first major distinction between fixnums and other values, using it as a sort of `is_int` flag: { sign == 0x0, exp == 0x7ff, payload == ??? } :: Non-Fixnum { sign == 0x1, exp == 0x7ff, payload == ??? } :: Fixnum It will become apparent in a moment why we haven't defined the payload yet. Given that our payload is the entire fraction part of the IEEE 754 double format, we must be careful not to use the following two payload values regardless of the sign bit: 1. Zero: This would make the bit pattern represent an infinity, since the payload is the entire fraction and a zero fraction indicates infinity. 2. `0x8000000000000` (aka only the MSB is set): This would make the bit pattern a canonical quiet NaN, since the payload MSB is the quiet bit. This means that in each category (sign bit set, or nil) we have `2^52 - 2` possible bit patterns, and the payload has a rather strange definition: 0x0 < payload < 0x8000000000000 < payload < 0xfffffffffffff Can we really fit a continuous range of fixnum values into that payload without significantly complicating things? Yes, we can! Observe. ## Fixnum representation We will store positive and negative fixnums as separate value ranges, using the quiet bit to differentiate between them. Let's go back to considering the quiet bit a separate field: { sign == 0x1, exp == 0x7ff, quiet == 0x0, rest >= 0x1 } :: Positive { sign == 0x1, exp == 0x7ff, quiet == 0x1, rest >= 0x1 } :: Negative But, I hear you say, the positive range is missing zero! Worry not, for maths is wizardry. We will actually store positive values as their ones' complement (bitwise NOT) meaning that all bits being set is our zero, and only the LSB being set is the highest possible value. This must be combined with a bitwise OR mask, to ensure that the 13 highest of the 64 bits turn into the correct starting bit pattern for a signed NaN. Unpacking it is just as simple: Take the ones' complement (bitwise NOT) and then use an AND mask to unset the 13 highest: POS_INT_PACK(x) = ~x | 0xfff8000000000000 POS_INT_UNPACK(x) = ~x & 0x0007ffffffffffff If you've been paying very close attention, you may notice something: Given that we know the 13 highest bits must always have a certain respective value in the packed and unpacked representation (12 highest set when packed, none set when unpacked), we can use an XOR to flip between the two, and the same XOR can take care of flipping the remaining 51 bits at the same time! This also means packing and unpacking is the same operation: POS_INT_PACK(x) = x ^ 0xfff7ffffffffffff POS_INT_UNPACK(x) = x ^ 0xfff7ffffffffffff There we go; packing and unpacking 51-bit positive fixnums with one XOR! Amazing, isn't it? As for the negative values, it's even simpler. This time, the correct NaN starting pattern has all 13 bits set, since the quiet bit being set is what we use to determine the number being negative. And would you believe it; this means the packed negative fixnum already represents itself! NEG_INT_PACK(x) = x NEG_INT_UNPACK(x) = x Isn't that unbelievable? I need to verify this strategy further, but based on all information I can find about NaN values, it should work just fine. The only disappointing thing is that it's positive integers that need an XOR to pack and unpack, rather than negative ones. One would expect positive values to occur much more frequently in typical code. But I think we can live with a single XOR instruction! ## Pointers & Others We still want to represent the following, which must share space within the `2^52 - 2` bit patterns that can be packed into an unsigned NaN: - Pointers of various kinds - Unicode code points (21-bit values) - False, true, null, end-of-file, and maybe a few more singletons It seems sensible to split this into two main categories: pointers and other values. Let's use the quiet bit as a `pointer` flag: { sign == 0x0, exp == 0x7ff, quiet == 0x0, rest >= 0x1 } :: Other { sign == 0x0, exp == 0x7ff, quiet == 0x1, rest >= 0x1 } :: Pointer Note how neither type is allowed to have a zero payload, since in case of an unset quiet bit, this would make our value an infinity, and in case of a set quiet bit it would give us a canonical quiet NaN. Each of them is allowed any other payload than zero. ## Pointers It would seem that we have 51 bits left to represent a pointer (though we need to avoid the value zero). But we only need 48 bits... or even less! Since allocations happen at 8-byte boundaries on 64-bit machines, we only really need 45 of the 48 bits, given the least significant 3 will never be set. This gives us a whole 6 free bits to tag pointers with! If we have that much play room, we can do some crazy things. ### Foreign pointers Firstly, let's introduce the concept of a "foreign" pointer. This means the pointer doesn't necessarily point to a Zisp object, and may not be 8-byte aligned. As it may point to anything, there's no point in defining further bits as tagging additional information, so we have all 50 bits available. Let's cut out the 12 high bits of our double since we already know what they must contain, and look at the definition of our 52-bit payload. I will also mix up the notation a bit, to indicate that some fields are only defined if a previous field has a given value. { pointer == 0x1, foreign: u1, rest: u50 } (The `pointer` field is the `quiet` bit i.e. MSB of the 52-bit fraction.) If the foreign bit is set, then the entire `rest` field shall be seen as opaque and may contain any value. Another way to look at this is that we essentially defined another fixnum range of 50 bits. This can include the value zero, since the foreign bit being set ensures we don't step on the forbidden all-zero payload value. ### Zisp pointers Now let's look at what we can do with "native" Zisp pointers. Wouldn't it be nice if our language had an explicit "pointer" data type and it didn't require any additional heap allocation? So let's decide that one bit is dedicated to distinguishing between an explicit pointer object, and regular pointers that stand in for the object being pointed to. Perhaps it would be good to show some Zisp pseudo-code to explain what that means, since it may be a strange concept: ;; In memory, vec is represented as a regular/direct vector pointer. (define vec (vector 1 2 3)) ;; We can of course use this variable as a vector. (vector? vec) ;=> #t (vector-ref vec 0) ;=> 1 ;; Now we create an explicit pointer object pointing to that vector. ;; Distinguished by a special bit in the in-memory value of vec-ptr. (define vec-ptr (pointer vec)) ;; This variable is *not* a vector; it's a vector-pointer. (vector? vec-ptr) ;=> #f (vector-ref vec-ptr 0) ;ERROR (pointer? vec-ptr) ;=> #t (pointer-ref vec-ptr) ;=> #(1 2 3) This is *not* the same thing as a box, because it can *only* refer to heap allocated objects, not immediates, whereas a box would be able to hold an immediate value like an integer or double as well. (pointer 42) ;ERROR (box 42) ;=> # A box would necessarily need heap allocation, whereas a pointer doesn't. It's *also not* the same thing as a foreign pointer, because those can be anything, whereas these pointer objects definitely point to Zisp objects. Pointers may or may not be mutable; I've not made up my mind yet. It may seem like a pointless data type, but it adds a little bit of expressive strength to our language. For example, when working with an FFI. And there's really not much else we can do with all our bits. Let's use the term "indirect" for this tag, since "pointer" is already used: { pointer == 0x1, foreign == 0x0, indirect: u1, rest: u49 } Should these indirect pointers objects be mutable, then they may contain a null pointer; the forbidden zero value is avoided through the fact that the indirect bit is set. Hmm, indirect pointers may instead become weak pointers at some point! This would fit perfectly since they can contain null. Direct or indirect makes no difference to the fact that the pointer value will be 8-byte aligned, so we still have 4 bits for more information about what's being pointed to. Also, since the actual pointer value can never be zero (all non-foreign pointers must point to a valid Zisp object), we avoid the forbidden zero pattern. Thus, we can indicate 16 different values with our 4 remaining bits. It would have been nice to avoid fragmentation of these remaining tag bits. However, we want to avoid shifting, so let's go with this definition for the remaining 49 bits: { tag_high: u1, pointer_value: u45, tag_low: u3 } The pointer value is extracted by masking the entire bit sequence, so it actually becomes a 48-bit value without further shifting. (This part of the article is kinda obsolete. Implementation details are up for debate and we may or may not use bit shifting. It's not that expensive of an operation, after all.) The tag can be used to tell us what we're pointing to, so that type checks often don't require following the pointer. The memory location that's being pointed to may duplicate this information, since we may want to ensure that any Zisp object on the heap carries its type information within itself, but I'm not yet decided on that. In any case, let's list some common heap data types that our 4-bit tag can represent, making sure to have an "other" wildcard for future extensions. The right side shows the value of the type tag when it's acquired by masking the 49-bit Zisp pointer payload. 0. String (Symbol) ... 0x0000000000000 1. Pair (List) 0x0000000000001 2. Vector ............ 0x0000000000002 3. Map (Hash-table) 0x0000000000003 4. Box ............... 0x0000000000004 5. Record 0x0000000000005 6. Class ............. 0x0000000000006 7. Instance 0x0000000000007 8. Text .............. 0x1000000000000 9. Byte-vector 0x1000000000001 10. Procedure ........ 0x1000000000002 11. Continuation 0x1000000000003 12. Port ............. 0x1000000000004 13. Error 0x1000000000005 14. Enum ............. 0x1000000000006 15. Other 0x1000000000007 This list is likely to change; for example: errors should probably be class instances, continuations could be merged with procedures, and so on. But this gives us a rough picture and demonstrates that 16 distinct values is quite sufficient for avoiding a pointer de-reference in type checking. (Why is it so important to avoid following a pointer when checking a type? Who knows? Did I say it was important? Why look at me like that??) (Since I wrote this, I decided to use bit shifting after all, and the tags are straightforward values from 0 to 15.) ## Other values We still have one entire `2^51 - 1` value range left. We will split it the following way. This one uses a very simple partitioning scheme: { tag: u3, payload: u48 } The following tags are defined: 001 = short string 010 = char (Unicode code point) 100 = singletons (false, true, etc.) Other tags are undefined and reserved for the future. Note that 000 is missing, so we automatically avoid the forbidden zero payload. ### What the heck is a "short string"? Remember that [strings are immutable](symbols.html) in Zisp. This allows us to use an amazing optimization where short strings can be represented as immediate values. We can't get to 56 bits (7 bytes), but 48 bits (6 bytes) fits perfectly into our payload! So any interned string (equivalent to a Scheme symbol) in Zisp will in fact be an immediate value if 6 bytes or shorter, and doesn't need any heap allocation. Awesome! There can still be uninterned strings that are 6 bytes or shorter, and calling intern on them would return the canonical, immediate version. ### Unicode code points This is an easy one. We have 48 bits, and only need 21. Just write the Unicode code point into the payload: done. This value range may be split in the future to fit other things in it, as we've wasted a ton of bits here. ### Singletons This 48-bit value range contains various singletons like Boolean values, the empty list aka null, and so on. This is even more wasteful than using 48 bits for Unicode, so again this value range may be partitioned further at some point. ### Undefined ranges We have a whole 48-bit value range (sans one forbidden value) that's still unused, plus another 50-bit range (or two 49-bit ranges, or three 48-bit). It's incredible just how much stuff you can cram into a NaN. I would have never thought it possible. Ours may just be the most sophisticated NaN-packing strategy ever devised, because I couldn't find any information on the web about the possibility of using both signaling and quiet NaNs. All articles I've stumbled upon either claim that you must avoid signaling NaNs or quiet NaNs, or they take a naive approach to the subdivision of the available bit patterns and end up wasting tons of bit real estate. Stay tuned for the development of Zisp, because this is getting serious!