From b84ed4f563b3536365f7d3cc4d068407e98685b3 Mon Sep 17 00:00:00 2001
From: Taylan Kammer <taylan.kammer@gmail.com>
Date: Sat, 20 Jun 2026 22:53:50 +0200
Subject: It's a revolution baby.

---
 doc/0/0-value.md | 199 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 199 insertions(+)
 create mode 100644 doc/0/0-value.md

(limited to 'doc/0/0-value.md')

diff --git a/doc/0/0-value.md b/doc/0/0-value.md
new file mode 100644
index 0000000..4bb7e0c
--- /dev/null
+++ b/doc/0/0-value.md
@@ -0,0 +1,199 @@
+# NaN-packed Value representation
+
+The format of a binary64 floating-point number, in big-endian notation:
+
+    { sign: 1 bit, exponent: 11 bits, fraction: 52 bits }
+
+When the 11 exponent bits are all set, it's either a NaN or an Infinity.
+
+For value packing, the remaining 53 bits are available, giving us `2^53` values
+minus the following four bit patterns:
+
+    *** FORBIDDEN BIT-PATTERNS ***
+
+    1. Negative cqNaN    :: { sign = 1, exponent = MAX, fraction = 10000... }
+
+    2. Negative Infinity :: { sign = 1, exponent = MAX, fraction = 00000... }
+
+    3. Positive cqNaN    :: { sign = 0, exponent = MAX, fraction = 10000... }
+
+    4. Positive Infinity :: { sign = 0, exponent = MAX, fraction = 00000... }
+
+The abbreviation "cqNaN" stands for canonical quiet NaN.
+
+The MSb of the fraction is also called the `is_quiet` flag, because it marks a
+NaN as being "quiet" rather than signaling.  The rest of the fraction being all
+zero makes it the *canonical* quiet NaN for the given sign value.
+
+The positive and negative cqNaN are the *only* NaN values that can actually be
+returned by FP operations.  This is convenient, because it means we can simply
+use them to represent themselves in Zisp.
+
+Infinity values may also be returned by FP operations, and we want them to also
+exit in Zisp, so they also represent themselves.
+
+Beyond those four bit patterns, all values with a maximum exponent (all bits
+set) are fair game for representing other values, so `2^53 - 4` possibilities.
+
+We split those `2^53 - 4` available values into four groups, each allowing for
+`2^51 - 1` different values to be encoded.  (51-bit values excluding zero.)
+
+    sign = 1, quiet = 1 :: Negative Fixnum from -1 to -2^51+1
+
+    sign = 1, quiet = 0 :: Positive Fixnum from 0 to 2^51-2
+
+    sign = 0, quiet = 1 :: Pointers and various immediates
+
+    sign = 0, quiet = 0 :: Internal use by interpreter
+
+
+## Fixnums
+
+Negative fixnums actually represent themselves, without needing to go through
+any transformation.  Only the smallest 52-bit signed negative, `-2^51`, cannot
+be represented, as it would step on Forbidden Value #1, Negative cqNaN.
+
+Positive fixnums go through a bitsiwe NOT (which can be implemented as an XOR
+mask combining it with removal of NaN-related high bits) to avoid the all-zero
+payload value, which would step on Forbidden Value #2, Negative Infinity.
+
+
+## Pointers and immediates
+
+This region of 51-bit values is divided as follows, based on the three highest
+bits, providing a payload value of 48 bits for each.
+
+    000   :: Pointer to heap object  (type-tagged)
+
+    001   :: Pointer to list values  (length-tagged)
+
+    010   :: Pointer to istr object
+
+    011   :: Immediate short string
+
+    100   :: Immediate small rational  (sign bit 0)
+
+    101   :: Immediate small rational  (sign bit 1)
+
+    110   :: Undefined
+
+    111   :: Immediate types further subdivided as follows:
+
+             0....... 0....... 0....... (etc.)  :: Rune
+
+             1.......                           :: 128 40-bit types
+
+             0....... 1.......                  :: 16384 32-bit types
+
+             0....... 0....... 1.......         :: 2097152 24-bit types
+
+             (etc.)
+
+Forbidden Value #3, Positive cqNaN, is avoided by not using 0 as a valid type
+tag value for heap pointers.
+
+### Type-tagged heap pointers
+
+Regular heap objects are allocated with 16-byte alignment, meaning the lowest
+four bits are naturally zero.  We exploit this by shifting down the address by
+four bits, making room for more tag bits immediately following the 16 high bits
+that mark the value as a heap pointer.
+
+Thus, a value can be checked against a specific heap object type by comparing
+the 20 high bits to a combined constant value: the 16 highest bits indicating
+that it's a heap pointer, and the 4 bits after that denoting the heap type.
+
+### Length-tagged list pointers
+
+Lists are arrays of Value objects, allocated without padding (8-byte alignment)
+for efficient source code representation and traversal.  Since the lowest three
+bits are naturally zero, we use them as a 3-bit length information tag.
+
+A length tag value of 1 to 7 means there are exactly that many Value objects
+starting at the address, while 0 means there is an array of at least eight,
+terminated with a special 64-bit sentinel bit-pattern that is not otherwise
+valid as a Zisp value.
+
+### Interned string pointers
+
+Interned string (istr) objects may be unaligned, so the low bits of the pointer
+are not used for any special purpose.  Having a separate category for this type
+of pointer also streamlines the interpreter implementation
+
+### Short strings
+
+This 48-bit range is used for strings of zero to six bytes.  These are NUL
+terminated unless exactly six bytes, meaning that a literal NUL byte cannot
+appear in them, but otherwise they allow arbitrary byte values.
+
+### Small rationals
+
+We use a 49-bit space for small rational numbers, with a signed 25-bit two's
+complement integer numerator, and 24-bit unsigned integer denominator.
+
+### Runes and other small values
+
+Runes are symbols of up to 6 ASCII characters in length, used to implement
+extensible reader syntax.  (See Zisp decoder.)  They cannot contain the NUL
+byte, as they are NUL-terminated unless exactly six ASCII bytes in length.
+
+NOTE: The order in which the characters of the rune are encoded depends on
+endianness.  On little-endian systems (i.e. most modern architectures) the
+characters will be in "reverse" order, with the first character in lowest
+position, so the terminating NUL has to be searched from low to high.
+
+The fact that runes are limited to ASCII bytes, whose MSb is unset, opens up
+some space for other small values to co-inhabit the same 48-bit value range.
+
+We divide this space into increasingly many potential types, with smaller and
+smaller payloads, where the highest byte with a non-zero MSb determines which
+size category we're in: If the highest byte has its MSb set, then the other
+seven bits are a type tag, and each type has a 40-bit payload; if the second
+highest byte has its MSb set, then the 14 non-MSb bits of the two high bytes
+define the type, and each has a 32-bit payload; and so on.
+
+Unicode code points need 21 bits, so we use a 24-bit type for the Character
+type.  Miscellaneous values like True, False, EOF, etc. are placed in an 8-bit
+type, since there will never be that many of them.
+
+A virtually unlimited number of user-defined enum types can fit into the types
+with small payload values here: There is room for over 268 Million 16-bit types
+(28-bit type tag) and over 34 Billion 8-bit types (35-bit type tag).
+
+
+## Internal use values
+
+The final 51-bit range is used for various internal purposes by the Zisp
+interpreter, mostly related to transparent code optimization.
+
+  000   :: Pointer to heap object as constant
+
+  001   :: Pointer to list values as constant
+
+  010   :: Pointer to istr object as constant
+
+  011   :: Immediate short string as constant
+
+  100   :: Local variable reference by index
+
+  101   :: Pointer to constant function-call expression
+
+  110   :: Pointer to variable function-call expression
+
+  111   :: Pointer to special-form or macro-call expression
+
+Forbidden Value #4, Positive Infinity, is avoided thanks to the fact that heap
+pointers always have a non-zero heap-type tag.  (See further above.)
+
+The first four categories simply mirror those of the previous 51-bit range, but
+mark the values as being constants rather than code to evaluate.
+
+The remaining four categories are somewhat similar to VM instructions.
+
+
+
+<!--
+;; Local Variables:
+;; fill-column: 80
+;; End:
+-->
-- 
cgit v1.2.3