1 files changed, 458 insertions, 0 deletions
diff --git a/notes/nan.md b/notes/nan.md
new file mode 100644
index 0000000..f8f3f80
--- /dev/null
+++ b/notes/nan.md
@@ -0,0 +1,458 @@
+# NaN-Packing
+
+NaN-packing (also called NaN-boxing) is a strategy involving the use of NaN
+bit patterns, that are otherwise unused, to store various values in them.
+
+In the implementation of a dynamically typed language, this can be used to
+ensure that all types in the language can be represented by a single 64-bit
+value, which is either a valid double, an actual NaN value, or one of the
+other NaN bit patterns that represent some other type, potentially in the
+form of a pointer to a heap object containing further data.
+
+This works because pointers only need 48 bits in practice, and the range of
+unused NaN bit patterns contains an astounding `2^53 - 4` different values.
+
+IMPORTANT NOTE: All illustrations of data structures and bit patterns use
+big-endian.  When implementing the strategies described herein, it may be
+necessary to reorder the elements.  For example, the elements of packed
+structs in Zig are ordered least to most significant.
+
+
+## The double format
+
+The IEEE 754 double-precision binary floating-point aka binary64 format is:
+
+    { sign: u1, exponent: u11, fraction: u52 }
+
+Possible types of values a double can encode include:
+
+    { sign == any, exponent != 0x7ff, fraction == any } :: Real (finite)
+    { sign == any, exponent == 0x7ff, fraction == 0x0 } :: Infinity
+    { sign == any, exponent == 0x7ff, fraction != 0x0 } :: NaN
+
+Note:
+
+    0x7ff = u11 with all bits set (0b11111111111)
+
+In other words:
+
+    all exponent bits set, fraction bits all zero :: Infinity
+    all exponent bits set, fraction part non-zero :: NaN
+
+
+## Details of NaN values
+
+There are two different NaN types: signaling and quiet.  Quiet NaN may be
+returned by FP operations to denote invalid results, whereas signaling NaN
+are never returned by FP operations and serve other purposes.
+
+Modern hardware sets the MSB of the fraction to indicate that the NaN is a
+quiet one, so let's refine our definition for denoting NaN values:
+
+    { sign: u1, exp: u11, quiet: u1, rest: u51 }
+
+Variants of NaN:
+
+    { sign == any, exp == 0x7ff, quiet == 0, rest >= 0x1 } :: sNaN
+    { sign == any, exp == 0x7ff, quiet == 1, rest == any } :: qNaN
+
+Note that in case of the signaling NaN, the rest of the fraction must be
+non-zero, since otherwise the entire fraction part would be zero and thus
+denote an infinity rather than a NaN.
+
+Most systems have a "canonical" quiet NaN that they use:
+
+    { sign == any, exp == 0x7ff, quiet == 1, rest == 0x0 } :: cqNaN
+
+The sign bit of the canonical quiet NaN is undefined and may differ from
+operation to operation or depending on the platform.
+
+It's useful to see a few common examples expressed in hex:
+
+    0x7ff8000000000000 :: cqNaN, sign bit nil
+    0xfff8000000000000 :: cqNaN, sign bit set
+
+    0x7ff8000000000001 :: smallest non-canon qNaN, sign bit nil
+    0xfff8000000000001 :: smallest non-canon qNaN, sign bit set
+
+    0x7fffffffffffffff :: largest non-canon qNaN, sign bit nil
+    0xffffffffffffffff :: largest non-canon qNaN, sign bit set
+
+    0x7ff0000000000001 :: smallest sNaN, sign bit nil
+    0xfff0000000000001 :: smallest sNaN, sign bit set
+
+    0x7ff7ffffffffffff :: largest sNaN, sign bit nil
+    0xfff7ffffffffffff :: largest sNaN, sign bit set
+
+
+## Unused NaN bit patterns
+
+Let's start with the quiet NaN values.
+
+Theoretically, there only needs to be one canonical quiet NaN, so we would
+have `2^52 - 1` unused bit patterns in the quiet NaN range.  In practice,
+however, the sign bit may differ from one operation to the next.
+
+For example, the fabs function may simply clear the sign of the argument,
+without minding it being a NaN.  In that case, if the platform's regular
+canonical NaN is the one with the sign bit set, we would end up getting
+another, "semi-canonical" quiet NaN bit pattern, with the sign bit nil.
+
+So, both variants of the canonical quiet NaN are in use.
+
+This leaves `2^52 - 2` definitely-unused quiet NaN bit patterns:
+
+    { sign == any, exp == 0x7ff, quiet == 1, rest >= 0x1 } :: Unused qNaN
+
+Remember that signaling NaN are defined in a very similar way:
+
+    { sign == any, exp == 0x7ff, quiet == 0, rest >= 0x1 } :: sNaN
+
+Since none of those can be returned by FP operations, they could all be seen
+as unused, giving us another `2^52 - 2` bit patterns.
+
+In total, this gives us `2^53 - 4` definitely-unused NaN bit patterns.
+
+
+## Representing Zisp values and pointers as unused NaN bit patterns
+
+Zisp wants to store two different things in unused NaN patterns:
+
+1. Pointers (to anything in principle)
+
+2. Non-double primitive aka "immediate" values
+
+It may seem intuitive to use signaling NaN for one, and quiet NaN for the
+other.  However, this would fragment our "payload" bits, since we would be
+using the sign bit as its MSB and the remaining 51 bits of the fraction as
+the rest of the payload.
+
+Further, we want to use as many bit patterns as possible for fixnums, so we
+can have a nice large fixnum range.  To this end, it would be nice if we
+could, for example, use all bit patterns where the sign bit is set for our
+representation of fixnums, and then the range of bit patterns with the sign
+bit unset can be shared among the remaining values, and pointers.
+
+Then let's do exactly that, and use the sign as the first major distinction
+between fixnums and other values, using it as a sort of `is_int` flag:
+
+    { sign == 0x0, exp == 0x7ff, payload == ??? } :: Non-Fixnum
+    { sign == 0x1, exp == 0x7ff, payload == ??? } :: Fixnum
+
+It will become apparent in a moment why we haven't defined the payload yet.
+
+Given that our payload is the entire fraction part of the IEEE 754 double
+format, we must be careful not to use the following two payload values
+regardless of the sign bit:
+
+1. Zero: This would make the bit pattern represent an infinity, since the
+payload is the entire fraction and a zero fraction indicates infinity.
+
+2. `0x8000000000000` (aka only the MSB is set): This would make the bit
+pattern a canonical quiet NaN, since the payload MSB is the quiet bit.
+
+This means that in each category (sign bit set, or nil) we have `2^52 - 2`
+possible bit patterns, and the payload has a rather strange definition:
+
+    0x0 < payload < 0x8000000000000 < payload < 0xfffffffffffff
+
+Can we really fit a continuous range of fixnum values into that payload
+without significantly complicating things?  Yes, we can!  Observe.
+
+
+## Fixnum representation
+
+We will store positive and negative fixnums as separate value ranges, using
+the quiet bit to differentiate between them.
+
+Let's go back to considering the quiet bit a separate field:
+
+    { sign == 0x1, exp == 0x7ff, quiet == 0x0, rest >= 0x1 } :: Positive
+    { sign == 0x1, exp == 0x7ff, quiet == 0x1, rest >= 0x1 } :: Negative
+
+But, I hear you say, the positive range is missing zero!  Worry not, for
+maths is wizardry.  We will actually store positive values as their ones'
+complement (bitwise NOT) meaning that all bits being set is our zero, and
+only the LSB being set is the highest possible value.
+
+This must be combined with a bitwise OR mask, to ensure that the 13 highest
+of the 64 bits turn into the correct starting bit pattern for a signed NaN.
+Unpacking it is just as simple: Take the ones' complement (bitwise NOT) and
+then use an AND mask to unset the 13 highest:
+
+    POS_INT_PACK(x)   = ~x | 0xfff8000000000000
+
+    POS_INT_UNPACK(x) = ~x & 0x0007ffffffffffff
+
+If you've been paying very close attention, you may notice something: Given
+that we know the 13 highest bits must always have a certain respective value
+in the packed and unpacked representation (12 highest set when packed, none
+set when unpacked), we can use an XOR to flip between the two, and the same
+XOR can take care of flipping the remaining 51 bits at the same time!
+
+This also means packing and unpacking is the same operation:
+
+    POS_INT_PACK(x)   = x ^ 0xfff7ffffffffffff
+
+    POS_INT_UNPACK(x) = x ^ 0xfff7ffffffffffff
+
+There we go; packing and unpacking 51-bit positive fixnums with one XOR!
+Amazing, isn't it?
+
+As for the negative values, it's even simpler.  This time, the correct NaN
+starting pattern has all 13 bits set, since the quiet bit being set is what
+we use to determine the number being negative.  And would you believe it;
+this means the packed negative fixnum already represents itself!
+
+    NEG_INT_PACK(x)   = x
+
+    NEG_INT_UNPACK(x) = x
+
+Isn't that unbelievable?  I need to verify this strategy further, but based
+on all information I can find about NaN values, it should work just fine.
+
+The only disappointing thing is that it's positive integers that need an XOR
+to pack and unpack, rather than negative ones.  One would expect positive
+values to occur much more frequently in typical code.  But I think we can
+live with a single XOR instruction!
+
+
+## Pointers & Others
+
+We still want to represent the following, which must share space within the
+`2^52 - 2` bit patterns that can be packed into an unsigned NaN:
+
+- Pointers of various kinds
+- Unicode code points (21-bit values)
+- False, true, null, end-of-file, and maybe a few more singletons
+
+It seems sensible to split this into two main categories: pointers and other
+values.  Let's use the quiet bit as a `pointer` flag:
+
+    { sign == 0x0, exp == 0x7ff, quiet == 0x0, rest >= 0x1 } :: Other
+    { sign == 0x0, exp == 0x7ff, quiet == 0x1, rest >= 0x1 } :: Pointer
+
+Note how neither type is allowed to have a zero payload, since in case of an
+unset quiet bit, this would make our value an infinity, and in case of a set
+quiet bit it would give us a canonical quiet NaN.  Each of them is allowed
+any other payload than zero.
+
+
+## Pointers
+
+It would seem that we have 51 bits left to represent a pointer (though we
+need to avoid the value zero).  But we only need 48 bits... or even less!
+Since allocations happen at 8-byte boundaries on 64-bit machines, we only
+really need 45 of the 48 bits, given the least significant 3 will never be
+set.  This gives us a whole 6 free bits to tag pointers with!  If we have
+that much play room, we can do some crazy things.
+
+### Foreign pointers
+
+Firstly, let's introduce the concept of a "foreign" pointer.  This means the
+pointer doesn't necessarily point to a Zisp object, and may not be 8-byte
+aligned.  As it may point to anything, there's no point in defining further
+bits as tagging additional information, so we have all 50 bits available.
+
+Let's cut out the 12 high bits of our double since we already know what they
+must contain, and look at the definition of our 52-bit payload.
+
+I will also mix up the notation a bit, to indicate that some fields are only
+defined if a previous field has a given value.
+
+    { pointer == 0x1, foreign: u1, rest: u50 }
+
+(The `pointer` field is the `quiet` bit i.e. MSB of the 52-bit fraction.)
+
+If the foreign bit is set, then the entire `rest` field shall be seen as
+opaque and may contain any value.  Another way to look at this is that we
+essentially defined another fixnum range of 50 bits.  This can include the
+value zero, since the foreign bit being set ensures we don't step on the
+forbidden all-zero payload value.
+
+### Zisp pointers
+
+Now let's look at what we can do with "native" Zisp pointers.
+
+Wouldn't it be nice if our language had an explicit "pointer" data type and
+it didn't require any additional heap allocation?  So let's decide that one
+bit is dedicated to distinguishing between an explicit pointer object, and
+regular pointers that stand in for the object being pointed to.
+
+Perhaps it would be good to show some Zisp pseudo-code to explain what that
+means, since it may be a strange concept:
+
+    ;; In memory, vec is represented as a regular/direct vector pointer.
+    (define vec (vector 1 2 3))
+
+    ;; We can of course use this variable as a vector.
+    (vector? vec)      ;=> #t
+    (vector-ref vec 0) ;=> 1
+
+    ;; Now we create an explicit pointer object pointing to that vector.
+    ;; Distinguished by a special bit in the in-memory value of vec-ptr.
+    (define vec-ptr (pointer vec))
+
+    ;; This variable is *not* a vector; it's a vector-pointer.
+    (vector? vec-ptr)      ;=> #f
+    (vector-ref vec-ptr 0) ;ERROR
+    (pointer? vec-ptr)     ;=> #t
+    (pointer-ref vec-ptr)  ;=> #(1 2 3)
+
+This is *not* the same thing as a box, because it can *only* refer to heap
+allocated objects, not immediates, whereas a box would be able to hold an
+immediate value like an integer or double as well.
+
+    (pointer 42) ;ERROR
+    (box 42)     ;=> #<box:42>
+
+A box would necessarily need heap allocation, whereas a pointer doesn't.
+
+It's *also not* the same thing as a foreign pointer, because those can be
+anything, whereas these pointer objects definitely point to Zisp objects.
+
+Pointers may or may not be mutable; I've not made up my mind yet.  It may
+seem like a pointless data type, but it adds a little bit of expressive
+strength to our language.  For example, when working with an FFI.  And
+there's really not much else we can do with all our bits.
+
+Let's use the term "indirect" for this tag, since "pointer" is already used:
+
+    { pointer == 0x1, foreign == 0x0, indirect: u1, rest: u49 }
+
+Should these indirect pointers objects be mutable, then they may contain a
+null pointer; the forbidden zero value is avoided through the fact that the
+indirect bit is set.
+
+Hmm, indirect pointers may instead become weak pointers at some point!  This
+would fit perfectly since they can contain null.
+
+Direct or indirect makes no difference to the fact that the pointer value
+will be 8-byte aligned, so we still have 4 bits for more information about
+what's being pointed to.  Also, since the actual pointer value can never be
+zero (all non-foreign pointers must point to a valid Zisp object), we avoid
+the forbidden zero pattern.  Thus, we can indicate 16 different values with
+our 4 remaining bits.
+
+It would have been nice to avoid fragmentation of these remaining tag bits.
+However, we want to avoid shifting, so let's go with this definition for the
+remaining 49 bits:
+
+    { tag_high: u1, pointer_value: u45, tag_low: u3 }
+
+The pointer value is extracted by masking the entire bit sequence, so it
+actually becomes a 48-bit value without further shifting.
+
+(This part of the article is kinda obsolete.  Implementation details are up
+for debate and we may or may not use bit shifting.  It's not that expensive
+of an operation, after all.)
+
+The tag can be used to tell us what we're pointing to, so that type checks
+often don't require following the pointer.  The memory location that's being
+pointed to may duplicate this information, since we may want to ensure that
+any Zisp object on the heap carries its type information within itself, but
+I'm not yet decided on that.
+
+In any case, let's list some common heap data types that our 4-bit tag can
+represent, making sure to have an "other" wildcard for future extensions.
+
+The right side shows the value of the type tag when it's acquired by masking
+the 49-bit Zisp pointer payload.
+
+    0. String (Symbol) ... 0x0000000000000
+    1. Pair (List)         0x0000000000001
+    2. Vector ............ 0x0000000000002
+    3. Map (Hash-table)    0x0000000000003
+    4. Box ............... 0x0000000000004
+    5. Record              0x0000000000005
+    6. Class ............. 0x0000000000006
+    7. Instance            0x0000000000007
+    8. Text .............. 0x1000000000000
+    9. Byte-vector         0x1000000000001
+    10. Procedure ........ 0x1000000000002
+    11. Continuation       0x1000000000003
+    12. Port ............. 0x1000000000004
+    13. Error              0x1000000000005
+    14. Enum ............. 0x1000000000006
+    15. Other              0x1000000000007
+
+This list is likely to change; for example: errors should probably be class
+instances, continuations could be merged with procedures, and so on.  But
+this gives us a rough picture and demonstrates that 16 distinct values is
+quite sufficient for avoiding a pointer de-reference in type checking.
+
+(Why is it so important to avoid following a pointer when checking a type?
+Who knows?  Did I say it was important?  Why look at me like that??)
+
+(Since I wrote this, I decided to use bit shifting after all, and the tags
+are straightforward values from 0 to 15.)
+
+
+## Other values
+
+We still have one entire `2^51 - 1` value range left.  We will split it the
+following way.  This one uses a very simple partitioning scheme:
+
+    { tag: u3, payload: u48 }
+
+The following tags are defined:
+
+    001 = short string
+    010 = char (Unicode code point)
+    100 = singletons (false, true, etc.)
+
+Other tags are undefined and reserved for the future.  Note that 000 is
+missing, so we automatically avoid the forbidden zero payload.
+
+### What the heck is a "short string"?
+
+Remember that [strings are immutable](symbols.html) in Zisp.  This allows us
+to use an amazing optimization where short strings can be represented as
+immediate values.
+
+We can't get to 56 bits (7 bytes), but 48 bits (6 bytes) fits perfectly into
+our payload!  So any interned string (equivalent to a Scheme symbol) in Zisp
+will in fact be an immediate value if 6 bytes or shorter, and doesn't need
+any heap allocation.  Awesome!
+
+There can still be uninterned strings that are 6 bytes or shorter, and
+calling intern on them would return the canonical, immediate version.
+
+### Unicode code points
+
+This is an easy one.  We have 48 bits, and only need 21.  Just write the
+Unicode code point into the payload: done.
+
+This value range may be split in the future to fit other things in it, as
+we've wasted a ton of bits here.
+
+### Singletons
+
+This 48-bit value range contains various singletons like Boolean values, the
+empty list aka null, and so on.
+
+This is even more wasteful than using 48 bits for Unicode, so again this
+value range may be partitioned further at some point.
+
+### Undefined ranges
+
+We have a whole 48-bit value range (sans one forbidden value) that's still
+unused, plus another 50-bit range (or two 49-bit ranges, or three 48-bit).
+
+It's incredible just how much stuff you can cram into a NaN.  I would have
+never thought it possible.
+
+Ours may just be the most sophisticated NaN-packing strategy ever devised,
+because I couldn't find any information on the web about the possibility of
+using both signaling and quiet NaNs.  All articles I've stumbled upon either
+claim that you must avoid signaling NaNs or quiet NaNs, or they take a naive
+approach to the subdivision of the available bit patterns and end up wasting
+tons of bit real estate.
+
+Stay tuned for the development of Zisp, because this is getting serious!
+
+<!--
+;; Local Variables:
+;; fill-column: 77
+;; End:
+-->