summaryrefslogtreecommitdiff
path: root/doc/0/0-value.md
diff options
context:
space:
mode:
authorTaylan Kammer <taylan.kammer@gmail.com>2026-06-20 22:53:50 +0200
committerTaylan Kammer <taylan.kammer@gmail.com>2026-06-20 22:53:50 +0200
commitb84ed4f563b3536365f7d3cc4d068407e98685b3 (patch)
tree9ab7b18d712db1329b6230cb45520e7c85dc46fd /doc/0/0-value.md
parentbfaa74b19fc81dbe071d55566a78a8e329237eff (diff)
It's a revolution baby.HEADmaster
Diffstat (limited to 'doc/0/0-value.md')
-rw-r--r--doc/0/0-value.md199
1 files changed, 199 insertions, 0 deletions
diff --git a/doc/0/0-value.md b/doc/0/0-value.md
new file mode 100644
index 0000000..4bb7e0c
--- /dev/null
+++ b/doc/0/0-value.md
@@ -0,0 +1,199 @@
+# NaN-packed Value representation
+
+The format of a binary64 floating-point number, in big-endian notation:
+
+ { sign: 1 bit, exponent: 11 bits, fraction: 52 bits }
+
+When the 11 exponent bits are all set, it's either a NaN or an Infinity.
+
+For value packing, the remaining 53 bits are available, giving us `2^53` values
+minus the following four bit patterns:
+
+ *** FORBIDDEN BIT-PATTERNS ***
+
+ 1. Negative cqNaN :: { sign = 1, exponent = MAX, fraction = 10000... }
+
+ 2. Negative Infinity :: { sign = 1, exponent = MAX, fraction = 00000... }
+
+ 3. Positive cqNaN :: { sign = 0, exponent = MAX, fraction = 10000... }
+
+ 4. Positive Infinity :: { sign = 0, exponent = MAX, fraction = 00000... }
+
+The abbreviation "cqNaN" stands for canonical quiet NaN.
+
+The MSb of the fraction is also called the `is_quiet` flag, because it marks a
+NaN as being "quiet" rather than signaling. The rest of the fraction being all
+zero makes it the *canonical* quiet NaN for the given sign value.
+
+The positive and negative cqNaN are the *only* NaN values that can actually be
+returned by FP operations. This is convenient, because it means we can simply
+use them to represent themselves in Zisp.
+
+Infinity values may also be returned by FP operations, and we want them to also
+exit in Zisp, so they also represent themselves.
+
+Beyond those four bit patterns, all values with a maximum exponent (all bits
+set) are fair game for representing other values, so `2^53 - 4` possibilities.
+
+We split those `2^53 - 4` available values into four groups, each allowing for
+`2^51 - 1` different values to be encoded. (51-bit values excluding zero.)
+
+ sign = 1, quiet = 1 :: Negative Fixnum from -1 to -2^51+1
+
+ sign = 1, quiet = 0 :: Positive Fixnum from 0 to 2^51-2
+
+ sign = 0, quiet = 1 :: Pointers and various immediates
+
+ sign = 0, quiet = 0 :: Internal use by interpreter
+
+
+## Fixnums
+
+Negative fixnums actually represent themselves, without needing to go through
+any transformation. Only the smallest 52-bit signed negative, `-2^51`, cannot
+be represented, as it would step on Forbidden Value #1, Negative cqNaN.
+
+Positive fixnums go through a bitsiwe NOT (which can be implemented as an XOR
+mask combining it with removal of NaN-related high bits) to avoid the all-zero
+payload value, which would step on Forbidden Value #2, Negative Infinity.
+
+
+## Pointers and immediates
+
+This region of 51-bit values is divided as follows, based on the three highest
+bits, providing a payload value of 48 bits for each.
+
+ 000 :: Pointer to heap object (type-tagged)
+
+ 001 :: Pointer to list values (length-tagged)
+
+ 010 :: Pointer to istr object
+
+ 011 :: Immediate short string
+
+ 100 :: Immediate small rational (sign bit 0)
+
+ 101 :: Immediate small rational (sign bit 1)
+
+ 110 :: Undefined
+
+ 111 :: Immediate types further subdivided as follows:
+
+ 0....... 0....... 0....... (etc.) :: Rune
+
+ 1....... :: 128 40-bit types
+
+ 0....... 1....... :: 16384 32-bit types
+
+ 0....... 0....... 1....... :: 2097152 24-bit types
+
+ (etc.)
+
+Forbidden Value #3, Positive cqNaN, is avoided by not using 0 as a valid type
+tag value for heap pointers.
+
+### Type-tagged heap pointers
+
+Regular heap objects are allocated with 16-byte alignment, meaning the lowest
+four bits are naturally zero. We exploit this by shifting down the address by
+four bits, making room for more tag bits immediately following the 16 high bits
+that mark the value as a heap pointer.
+
+Thus, a value can be checked against a specific heap object type by comparing
+the 20 high bits to a combined constant value: the 16 highest bits indicating
+that it's a heap pointer, and the 4 bits after that denoting the heap type.
+
+### Length-tagged list pointers
+
+Lists are arrays of Value objects, allocated without padding (8-byte alignment)
+for efficient source code representation and traversal. Since the lowest three
+bits are naturally zero, we use them as a 3-bit length information tag.
+
+A length tag value of 1 to 7 means there are exactly that many Value objects
+starting at the address, while 0 means there is an array of at least eight,
+terminated with a special 64-bit sentinel bit-pattern that is not otherwise
+valid as a Zisp value.
+
+### Interned string pointers
+
+Interned string (istr) objects may be unaligned, so the low bits of the pointer
+are not used for any special purpose. Having a separate category for this type
+of pointer also streamlines the interpreter implementation
+
+### Short strings
+
+This 48-bit range is used for strings of zero to six bytes. These are NUL
+terminated unless exactly six bytes, meaning that a literal NUL byte cannot
+appear in them, but otherwise they allow arbitrary byte values.
+
+### Small rationals
+
+We use a 49-bit space for small rational numbers, with a signed 25-bit two's
+complement integer numerator, and 24-bit unsigned integer denominator.
+
+### Runes and other small values
+
+Runes are symbols of up to 6 ASCII characters in length, used to implement
+extensible reader syntax. (See Zisp decoder.) They cannot contain the NUL
+byte, as they are NUL-terminated unless exactly six ASCII bytes in length.
+
+NOTE: The order in which the characters of the rune are encoded depends on
+endianness. On little-endian systems (i.e. most modern architectures) the
+characters will be in "reverse" order, with the first character in lowest
+position, so the terminating NUL has to be searched from low to high.
+
+The fact that runes are limited to ASCII bytes, whose MSb is unset, opens up
+some space for other small values to co-inhabit the same 48-bit value range.
+
+We divide this space into increasingly many potential types, with smaller and
+smaller payloads, where the highest byte with a non-zero MSb determines which
+size category we're in: If the highest byte has its MSb set, then the other
+seven bits are a type tag, and each type has a 40-bit payload; if the second
+highest byte has its MSb set, then the 14 non-MSb bits of the two high bytes
+define the type, and each has a 32-bit payload; and so on.
+
+Unicode code points need 21 bits, so we use a 24-bit type for the Character
+type. Miscellaneous values like True, False, EOF, etc. are placed in an 8-bit
+type, since there will never be that many of them.
+
+A virtually unlimited number of user-defined enum types can fit into the types
+with small payload values here: There is room for over 268 Million 16-bit types
+(28-bit type tag) and over 34 Billion 8-bit types (35-bit type tag).
+
+
+## Internal use values
+
+The final 51-bit range is used for various internal purposes by the Zisp
+interpreter, mostly related to transparent code optimization.
+
+ 000 :: Pointer to heap object as constant
+
+ 001 :: Pointer to list values as constant
+
+ 010 :: Pointer to istr object as constant
+
+ 011 :: Immediate short string as constant
+
+ 100 :: Local variable reference by index
+
+ 101 :: Pointer to constant function-call expression
+
+ 110 :: Pointer to variable function-call expression
+
+ 111 :: Pointer to special-form or macro-call expression
+
+Forbidden Value #4, Positive Infinity, is avoided thanks to the fact that heap
+pointers always have a non-zero heap-type tag. (See further above.)
+
+The first four categories simply mirror those of the previous 51-bit range, but
+mark the values as being constants rather than code to evaluate.
+
+The remaining four categories are somewhat similar to VM instructions.
+
+
+
+<!--
+;; Local Variables:
+;; fill-column: 80
+;; End:
+-->