# NaN-packed Value representation The format of a binary64 floating-point number, in big-endian notation: { sign: 1 bit, exponent: 11 bits, fraction: 52 bits } When the 11 exponent bits are all set, it's either a NaN or an Infinity. For value packing, the remaining 53 bits are available, giving us `2^53` values minus the following four bit patterns: *** FORBIDDEN BIT-PATTERNS *** 1. Negative cqNaN :: { sign = 1, exponent = MAX, fraction = 10000... } 2. Negative Infinity :: { sign = 1, exponent = MAX, fraction = 00000... } 3. Positive cqNaN :: { sign = 0, exponent = MAX, fraction = 10000... } 4. Positive Infinity :: { sign = 0, exponent = MAX, fraction = 00000... } The abbreviation "cqNaN" stands for canonical quiet NaN. The MSb of the fraction is also called the `is_quiet` flag, because it marks a NaN as being "quiet" rather than signaling. The rest of the fraction being all zero makes it the *canonical* quiet NaN for the given sign value. The positive and negative cqNaN are the *only* NaN values that can actually be returned by FP operations. This is convenient, because it means we can simply use them to represent themselves in Zisp. Infinity values may also be returned by FP operations, and we want them to also exit in Zisp, so they also represent themselves. Beyond those four bit patterns, all values with a maximum exponent (all bits set) are fair game for representing other values, so `2^53 - 4` possibilities. We split those `2^53 - 4` available values into four groups, each allowing for `2^51 - 1` different values to be encoded. (51-bit values excluding zero.) sign = 1, quiet = 1 :: Negative Fixnum from -1 to -2^51+1 sign = 1, quiet = 0 :: Positive Fixnum from 0 to 2^51-2 sign = 0, quiet = 1 :: Pointers and various immediates sign = 0, quiet = 0 :: Internal use by interpreter ## Fixnums Negative fixnums actually represent themselves, without needing to go through any transformation. Only the smallest 52-bit signed negative, `-2^51`, cannot be represented, as it would step on Forbidden Value #1, Negative cqNaN. Positive fixnums go through a bitsiwe NOT (which can be implemented as an XOR mask combining it with removal of NaN-related high bits) to avoid the all-zero payload value, which would step on Forbidden Value #2, Negative Infinity. ## Pointers and immediates This region of 51-bit values is divided as follows, based on the three highest bits, providing a payload value of 48 bits for each. 000 :: Pointer to heap object (type-tagged) 001 :: Pointer to list values (length-tagged) 010 :: Pointer to istr object 011 :: Immediate short string 100 :: Immediate small rational (sign bit 0) 101 :: Immediate small rational (sign bit 1) 110 :: Undefined 111 :: Immediate types further subdivided as follows: 0....... 0....... 0....... (etc.) :: Rune 1....... :: 128 40-bit types 0....... 1....... :: 16384 32-bit types 0....... 0....... 1....... :: 2097152 24-bit types (etc.) Forbidden Value #3, Positive cqNaN, is avoided by not using 0 as a valid type tag value for heap pointers. ### Type-tagged heap pointers Regular heap objects are allocated with 16-byte alignment, meaning the lowest four bits are naturally zero. We exploit this by shifting down the address by four bits, making room for more tag bits immediately following the 16 high bits that mark the value as a heap pointer. Thus, a value can be checked against a specific heap object type by comparing the 20 high bits to a combined constant value: the 16 highest bits indicating that it's a heap pointer, and the 4 bits after that denoting the heap type. ### Length-tagged list pointers Lists are arrays of Value objects, allocated without padding (8-byte alignment) for efficient source code representation and traversal. Since the lowest three bits are naturally zero, we use them as a 3-bit length information tag. A length tag value of 1 to 7 means there are exactly that many Value objects starting at the address, while 0 means there is an array of at least eight, terminated with a special 64-bit sentinel bit-pattern that is not otherwise valid as a Zisp value. ### Interned string pointers Interned string (istr) objects may be unaligned, so the low bits of the pointer are not used for any special purpose. Having a separate category for this type of pointer also streamlines the interpreter implementation ### Short strings This 48-bit range is used for strings of zero to six bytes. These are NUL terminated unless exactly six bytes, meaning that a literal NUL byte cannot appear in them, but otherwise they allow arbitrary byte values. ### Small rationals We use a 49-bit space for small rational numbers, with a signed 25-bit two's complement integer numerator, and 24-bit unsigned integer denominator. ### Runes and other small values Runes are symbols of up to 6 ASCII characters in length, used to implement extensible reader syntax. (See Zisp decoder.) They cannot contain the NUL byte, as they are NUL-terminated unless exactly six ASCII bytes in length. NOTE: The order in which the characters of the rune are encoded depends on endianness. On little-endian systems (i.e. most modern architectures) the characters will be in "reverse" order, with the first character in lowest position, so the terminating NUL has to be searched from low to high. The fact that runes are limited to ASCII bytes, whose MSb is unset, opens up some space for other small values to co-inhabit the same 48-bit value range. We divide this space into increasingly many potential types, with smaller and smaller payloads, where the highest byte with a non-zero MSb determines which size category we're in: If the highest byte has its MSb set, then the other seven bits are a type tag, and each type has a 40-bit payload; if the second highest byte has its MSb set, then the 14 non-MSb bits of the two high bytes define the type, and each has a 32-bit payload; and so on. Unicode code points need 21 bits, so we use a 24-bit type for the Character type. Miscellaneous values like True, False, EOF, etc. are placed in an 8-bit type, since there will never be that many of them. A virtually unlimited number of user-defined enum types can fit into the types with small payload values here: There is room for over 268 Million 16-bit types (28-bit type tag) and over 34 Billion 8-bit types (35-bit type tag). ## Internal use values The final 51-bit range is used for various internal purposes by the Zisp interpreter, mostly related to transparent code optimization. 000 :: Pointer to heap object as constant 001 :: Pointer to list values as constant 010 :: Pointer to istr object as constant 011 :: Immediate short string as constant 100 :: Local variable reference by index 101 :: Pointer to constant function-call expression 110 :: Pointer to variable function-call expression 111 :: Pointer to special-form or macro-call expression Forbidden Value #4, Positive Infinity, is avoided thanks to the fact that heap pointers always have a non-zero heap-type tag. (See further above.) The first four categories simply mirror those of the previous 51-bit range, but mark the values as being constants rather than code to evaluate. The remaining four categories are somewhat similar to VM instructions.