From b84ed4f563b3536365f7d3cc4d068407e98685b3 Mon Sep 17 00:00:00 2001
From: Taylan Kammer <taylan.kammer@gmail.com>
Date: Sat, 20 Jun 2026 22:53:50 +0200
Subject: It's a revolution baby.

---
 doc/0/0-value.md       | 199 +++++++++++++++++
 doc/0/1-parse.md       | 595 +++++++++++++++++++++++++++++++++++++++++++++++++
 doc/0/2-decode.md      |  44 ++++
 doc/0/grammar/abnf.txt | 139 ++++++++++++
 doc/0/grammar/index.md | 115 ++++++++++
 doc/0/grammar/peg.txt  |  91 ++++++++
 doc/0/grammar/zbnf.txt |  75 +++++++
 doc/0/index.md         |  34 +++
 8 files changed, 1292 insertions(+)
 create mode 100644 doc/0/0-value.md
 create mode 100644 doc/0/1-parse.md
 create mode 100644 doc/0/2-decode.md
 create mode 100644 doc/0/grammar/abnf.txt
 create mode 100644 doc/0/grammar/index.md
 create mode 100644 doc/0/grammar/peg.txt
 create mode 100644 doc/0/grammar/zbnf.txt
 create mode 100644 doc/0/index.md

(limited to 'doc/0')

diff --git a/doc/0/0-value.md b/doc/0/0-value.md
new file mode 100644
index 0000000..4bb7e0c
--- /dev/null
+++ b/doc/0/0-value.md
@@ -0,0 +1,199 @@
+# NaN-packed Value representation
+
+The format of a binary64 floating-point number, in big-endian notation:
+
+    { sign: 1 bit, exponent: 11 bits, fraction: 52 bits }
+
+When the 11 exponent bits are all set, it's either a NaN or an Infinity.
+
+For value packing, the remaining 53 bits are available, giving us `2^53` values
+minus the following four bit patterns:
+
+    *** FORBIDDEN BIT-PATTERNS ***
+
+    1. Negative cqNaN    :: { sign = 1, exponent = MAX, fraction = 10000... }
+
+    2. Negative Infinity :: { sign = 1, exponent = MAX, fraction = 00000... }
+
+    3. Positive cqNaN    :: { sign = 0, exponent = MAX, fraction = 10000... }
+
+    4. Positive Infinity :: { sign = 0, exponent = MAX, fraction = 00000... }
+
+The abbreviation "cqNaN" stands for canonical quiet NaN.
+
+The MSb of the fraction is also called the `is_quiet` flag, because it marks a
+NaN as being "quiet" rather than signaling.  The rest of the fraction being all
+zero makes it the *canonical* quiet NaN for the given sign value.
+
+The positive and negative cqNaN are the *only* NaN values that can actually be
+returned by FP operations.  This is convenient, because it means we can simply
+use them to represent themselves in Zisp.
+
+Infinity values may also be returned by FP operations, and we want them to also
+exit in Zisp, so they also represent themselves.
+
+Beyond those four bit patterns, all values with a maximum exponent (all bits
+set) are fair game for representing other values, so `2^53 - 4` possibilities.
+
+We split those `2^53 - 4` available values into four groups, each allowing for
+`2^51 - 1` different values to be encoded.  (51-bit values excluding zero.)
+
+    sign = 1, quiet = 1 :: Negative Fixnum from -1 to -2^51+1
+
+    sign = 1, quiet = 0 :: Positive Fixnum from 0 to 2^51-2
+
+    sign = 0, quiet = 1 :: Pointers and various immediates
+
+    sign = 0, quiet = 0 :: Internal use by interpreter
+
+
+## Fixnums
+
+Negative fixnums actually represent themselves, without needing to go through
+any transformation.  Only the smallest 52-bit signed negative, `-2^51`, cannot
+be represented, as it would step on Forbidden Value #1, Negative cqNaN.
+
+Positive fixnums go through a bitsiwe NOT (which can be implemented as an XOR
+mask combining it with removal of NaN-related high bits) to avoid the all-zero
+payload value, which would step on Forbidden Value #2, Negative Infinity.
+
+
+## Pointers and immediates
+
+This region of 51-bit values is divided as follows, based on the three highest
+bits, providing a payload value of 48 bits for each.
+
+    000   :: Pointer to heap object  (type-tagged)
+
+    001   :: Pointer to list values  (length-tagged)
+
+    010   :: Pointer to istr object
+
+    011   :: Immediate short string
+
+    100   :: Immediate small rational  (sign bit 0)
+
+    101   :: Immediate small rational  (sign bit 1)
+
+    110   :: Undefined
+
+    111   :: Immediate types further subdivided as follows:
+
+             0....... 0....... 0....... (etc.)  :: Rune
+
+             1.......                           :: 128 40-bit types
+
+             0....... 1.......                  :: 16384 32-bit types
+
+             0....... 0....... 1.......         :: 2097152 24-bit types
+
+             (etc.)
+
+Forbidden Value #3, Positive cqNaN, is avoided by not using 0 as a valid type
+tag value for heap pointers.
+
+### Type-tagged heap pointers
+
+Regular heap objects are allocated with 16-byte alignment, meaning the lowest
+four bits are naturally zero.  We exploit this by shifting down the address by
+four bits, making room for more tag bits immediately following the 16 high bits
+that mark the value as a heap pointer.
+
+Thus, a value can be checked against a specific heap object type by comparing
+the 20 high bits to a combined constant value: the 16 highest bits indicating
+that it's a heap pointer, and the 4 bits after that denoting the heap type.
+
+### Length-tagged list pointers
+
+Lists are arrays of Value objects, allocated without padding (8-byte alignment)
+for efficient source code representation and traversal.  Since the lowest three
+bits are naturally zero, we use them as a 3-bit length information tag.
+
+A length tag value of 1 to 7 means there are exactly that many Value objects
+starting at the address, while 0 means there is an array of at least eight,
+terminated with a special 64-bit sentinel bit-pattern that is not otherwise
+valid as a Zisp value.
+
+### Interned string pointers
+
+Interned string (istr) objects may be unaligned, so the low bits of the pointer
+are not used for any special purpose.  Having a separate category for this type
+of pointer also streamlines the interpreter implementation
+
+### Short strings
+
+This 48-bit range is used for strings of zero to six bytes.  These are NUL
+terminated unless exactly six bytes, meaning that a literal NUL byte cannot
+appear in them, but otherwise they allow arbitrary byte values.
+
+### Small rationals
+
+We use a 49-bit space for small rational numbers, with a signed 25-bit two's
+complement integer numerator, and 24-bit unsigned integer denominator.
+
+### Runes and other small values
+
+Runes are symbols of up to 6 ASCII characters in length, used to implement
+extensible reader syntax.  (See Zisp decoder.)  They cannot contain the NUL
+byte, as they are NUL-terminated unless exactly six ASCII bytes in length.
+
+NOTE: The order in which the characters of the rune are encoded depends on
+endianness.  On little-endian systems (i.e. most modern architectures) the
+characters will be in "reverse" order, with the first character in lowest
+position, so the terminating NUL has to be searched from low to high.
+
+The fact that runes are limited to ASCII bytes, whose MSb is unset, opens up
+some space for other small values to co-inhabit the same 48-bit value range.
+
+We divide this space into increasingly many potential types, with smaller and
+smaller payloads, where the highest byte with a non-zero MSb determines which
+size category we're in: If the highest byte has its MSb set, then the other
+seven bits are a type tag, and each type has a 40-bit payload; if the second
+highest byte has its MSb set, then the 14 non-MSb bits of the two high bytes
+define the type, and each has a 32-bit payload; and so on.
+
+Unicode code points need 21 bits, so we use a 24-bit type for the Character
+type.  Miscellaneous values like True, False, EOF, etc. are placed in an 8-bit
+type, since there will never be that many of them.
+
+A virtually unlimited number of user-defined enum types can fit into the types
+with small payload values here: There is room for over 268 Million 16-bit types
+(28-bit type tag) and over 34 Billion 8-bit types (35-bit type tag).
+
+
+## Internal use values
+
+The final 51-bit range is used for various internal purposes by the Zisp
+interpreter, mostly related to transparent code optimization.
+
+  000   :: Pointer to heap object as constant
+
+  001   :: Pointer to list values as constant
+
+  010   :: Pointer to istr object as constant
+
+  011   :: Immediate short string as constant
+
+  100   :: Local variable reference by index
+
+  101   :: Pointer to constant function-call expression
+
+  110   :: Pointer to variable function-call expression
+
+  111   :: Pointer to special-form or macro-call expression
+
+Forbidden Value #4, Positive Infinity, is avoided thanks to the fact that heap
+pointers always have a non-zero heap-type tag.  (See further above.)
+
+The first four categories simply mirror those of the previous 51-bit range, but
+mark the values as being constants rather than code to evaluate.
+
+The remaining four categories are somewhat similar to VM instructions.
+
+
+
+<!--
+;; Local Variables:
+;; fill-column: 80
+;; End:
+-->
diff --git a/doc/0/1-parse.md b/doc/0/1-parse.md
new file mode 100644
index 0000000..101a3b6
--- /dev/null
+++ b/doc/0/1-parse.md
@@ -0,0 +1,595 @@
+# Parser for Code & Data
+
+<!--TOC-->
+
+Zisp s-expressions represent an extremely minimal set of data types; only that
+which is necessary to strategically construct more complex values:
+
+    +---------+--------+----------+------+
+    | String  | Rune   | List     | Nil  |
+    +---------+--------+----------+------+
+    | foobar  | #name  | (X ...)  | ()   |
+    +---------+--------+----------+------+
+
+The parser recognizes various *syntax sugar* which abbreviates verbose syntax,
+and may result in special data structures (typically, a list with a rune in its
+first position) which another Zisp component called the *decoder* can transform
+into a rich set of value types.
+
+More details about syntax sugar, and the decoder, are explained later.
+
+
+## Character Encoding
+
+The parser does not consume Unicode characters; it consumes bytes.  Grammar is
+generally constructed by bytes corresponding to ASCII characters.
+
+Some elements of the grammar, such as comments and quoted strings, may contain
+arbitrary byte sequences, until terminated.  These sequences may happen to be
+valid UTF-8 text.  This way, quoted strings and comments may contain Unicode
+text encoded in UTF-8, but the parser does not check these for validity.
+
+Since comments and quoted strings may contain arbitrary byte sequences, a text
+editor or other program displaying Zisp s-expressions may need to use a special
+visual representation for bytes that don't represent valid text.
+
+The parser working on bytes rather than Unicode characters is not a limitation,
+but rather a feature: It allows Zisp s-expressions to be used as a structured
+data exchange format, which may contain binary data elements, without the need
+to encode these in Base64 or other such text representations of binary data.
+Consider the example:
+
+    ((image.webp "<BINARY>")
+     (video.webm "<BINARY>"))
+
+All that needs to be done for this to work, is that any incidental occurrences
+of the double-quote sign, and the backslash sign, are escaped with a backslash
+within the `<BINARY>` data; all other bytes can appear verbatim in the strings.
+
+
+## Stream Parsing
+
+The parser can be repeatedly invoked on a byte stream to consume the next datum
+within.  This does not require "unreading" or back-seeking within the stream;
+the parser always reads a full datum, and stops after some byte which cleanly
+terminates the currently parsed datum.
+
+This means Zisp s-expressions can be safely intermixed with other data within
+the same byte stream.  So long as the other data is consumed by some parser
+which similarly stops reading at a clear boundary, the Zisp parser can then
+continue operating on the same stream.  Consider the example:
+
+    ("image.webp" 8273)
+
+    << 8273 bytes >>
+
+    ("video.webm" 736)
+
+    << 736 bytes >>
+
+The "header" for each file in this stream is a Zisp s-expression containing
+information about how many bytes should be read after the header, before the
+next file header appears.  (The header data need to be terminated with a blank
+ASCII character such as a newline; the closing parenthesis does not act as a
+terminator unto itself due to the "join" syntax sugar.)
+
+To enable this stream parsing strategy, the parser does not use any automatic
+buffering.  If it did, it might inadvertently consume some bytes beyond the
+currently parsed datum, leaving the stream inconsistent.
+
+If the parser is meant to be used on an input stream associated with expensive
+system calls, such as a file handle or network socket, it's best to wrap that
+stream in some intermediate object which asks the system for large chunks of
+data at once, and stores the data in a buffer.
+
+
+## Comments
+
+Two types of comment are supported: datum comments and line comments.
+
+* A semicolon followed by a tilde instructs the parser to consume one datum and
+  discard it.  Whitespace may appear between the tilde and the datum to discard.
+
+* A semicolon, followed by a non-tilde byte, instructs the parser to consume and
+  discard bytes until a newline (ASCII Line Feed) is encountered.
+
+
+## Value vs. Datum
+
+A Zisp *value* that has an *external representation* in the form of a sequence
+of bytes is called a *datum*.  Every datum is a value, but not every value is a
+datum.  In other words, a datum is a value that can be printed out as a byte
+sequence which the parser can turn back into an equivalent datum.
+
+A value that is not a datum may nevertheless be *encoded* into one, allowing it
+to have an external representation.  After parsing, it needs to be *decoded* to
+actually become the expected value.
+
+One may speak of an *external representation of a value* where the value is not
+itself a datum, but has an encoding as one.  The more strictly correct term for
+this is: "The external representation of the datum encoding the value."
+
+### Syntax sugar
+
+The parser recognizes various *syntax sugar* to abbreviate an equivalent datum
+construction, or express a datum that encodes a more complex value.
+
+As an example, the expression `#(x y z)` is an abbreviation for the equivalent
+`(#HASH x y z)`.  These are two external representations for the same datum;
+after parsing, both will yield values that are indistinguishable in all but
+their memory address.
+
+An example of syntax sugar that is not a mere abbreviation is a quoted string
+which contains bytes that could not appear in a *bare* string:
+
+    "foo bar"  ->  (#DQUOTE <STRING>)
+
+In this example, the visual token `<STRING>` represents the actual string value
+in program memory, which has no direct external representation in bytes because
+it contains a space character.
+
+Those familiar with Lisp and Scheme may expect bare strings to be parsed into a
+separate type called *symbol* while quoted strings are parsed directly into a
+string type, but this is not the case in Zisp.
+
+### Decoder
+
+The *decoder* transforms Zisp data into values of more complex types, including
+values that are not of a datum type.
+
+Combined with syntax sugar, this allows Zisp to offer familiar syntax elements.
+For example, the expression `#(x y z)` which parses into `(#HASH x y z)` can be
+decoded into an array, so the result is similar to the vector syntax of Scheme.
+
+Decoding also resolves datum labels, goes over bare strings to find ones that
+represent a number literal, and takes care of a number of other transforms.
+This offloads complexity, allowing the parser to remain extremely simple.
+
+See the dedicated documentation of the [decoder](2-decode.html) for more.
+
+
+## Data types
+
+Following is a more in-depth explanation of each data type constructed by the
+Zisp s-expression parser.
+
+These are in fact value types, though the term "data type" is often used due to
+familiarity.  A Zisp value that is a member of one of the following value types
+is only a *datum* if it adheres to additional constraints as explained below.
+
+### String
+
+Strings can appear *bare* or be quoted in various ways.  A quoted string is in
+fact parsed into a list value with a rune in the first position to identify the
+quotation variant that was parsed, and the string value in the second position;
+or, in case of at-quoted strings, a special construct we will look at later.
+
+    +-----------+-------------------------------+
+    | Syntax    | Parse output                  |
+    +-----------+-------------------------------+
+    | |bytes|   | (#PQSTR <STRING>)             |
+    +-----------+-------------------------------+
+    | "bytes"   | (#DQSTR <STRING>)             |
+    +-----------+-------------------------------+
+    | @_bytes_  | (#ATSTR <SENTINEL> <STRING>)  |
+    +-----------+-------------------------------+
+
+The visual token `<STRING>` denotes the actual string, as a Zisp value, in the
+second position of the list.  The visual token `<SENTINEL>` stands for a Zisp
+integer value between 0 and 254.
+
+These external representations of strings will be explained in more detail
+further below, including backslash escape sequences allowed within, and how
+exactly at-quoted strings work.
+
+Strings have a fixed length, counted in bytes.  Each byte can have any value,
+including zero (ASCII NUL).  The parser reads bytes, not Unicode characters; a
+string may contain UTF-8 byte sequences, but these are not tested for validity.
+
+A string that is up to 255 bytes long is automatically *interned*, meaning any
+occurrence of the same string -- equal in length and containing the same byte
+values -- ends up being represented by the same bit-pattern; either a memory
+address, or an immediate representation within a CPU word for short strings.
+The quotation method is inconsequential to this process; for example, while
+`|foo bar|` and `"foo bar"` will parse into different list values, the actual
+string they hold a reference to will be the same one in program memory.  This
+behavior is however configurable and can be disabled entirely for cases where
+large numbers of arbitrary binary strings are being parsed.
+
+Strings of length greater than 255 bytes are stored separately in memory, even
+if they are equal in length and content.
+
+### Rune
+
+A rune is represented by an ASCII character sequence of 1 to 6 bytes, that must
+begin with a letter, and may only contain letters and digits.  This character
+sequence of letters and digits is called the *name* of the rune.  A rune that
+follows this constraint is valid as a datum.
+
+Zisp code may explicitly construct values of the rune type that violate the
+above constraints.  Such runes are not valid data and cannot be printed or
+parsed.
+
+Runes are case-sensitive, and the parser always emits runes using upper-case
+letters when expressing syntax sugar.  Uppercase rune names are reserved for
+Zisp's internal use and standard library; users can use lowercase runes with
+custom meaning without worrying about clashes, with the exception of a small
+number of lowercase runes such as `#true` and `#false` that are part of the
+default decoder settings and documented explicitly as such.
+
+Runes are always stored directly in a CPU word; never by memory address.
+
+### List
+
+A list is a contiguous array of one or more values in memory, whose length may
+be encoded directly within the pointer to the head of the array, or else the
+array is terminated with a special sentinel bit-pattern that is not otherwise
+valid as a Zisp value.
+
+The parser allocates a unique array in program memory for every list, and the
+list as a value is then represented by the memory address of that array, with
+either an exact length tag or a tag indicating that it's sentinel-terminated.
+
+Lists are valid data if one of the following holds true:
+
+* The list encodes a quoted string, datum label, or shebang line.
+
+* All values in the list are a valid datum.
+
+Further, a structure of nested list values may not contain cyclic references
+back up in the structure (which would make the above definition diverge into
+infinity).  Such cycles must be broken up with datum labels, or else the list
+cannot be considered a datum, since it cannot be printed or parsed.
+
+### Nil
+
+The Zisp nil value is a singleton and a datum.  There is exactly one nil value,
+used in lieu of a list of zero length; it has the external representation `()`.
+
+
+## Quoted strings
+
+Three quoted string types exist: Pipe-quoted, double-quoted, and at-quoted.
+This section goes into the details of each variant.
+
+### Pipe-quoted
+
+Strings can be quoted with pipes, like symbols in R7RS Scheme, which triggers
+the parser to generate a list with the structure:
+
+    (#PQSTR <STRING>)   ;; <STRING> is visual aid, not syntax
+
+The decoder, using default settings, would emit this string verbatim as a value.
+Then, during code evaluation, this would be seen as an identifier.  In this way,
+pipe-quoted strings are equivalent to bare strings in functionality.
+
+It is important to understand that the decoder sits between the parser and the
+[evaluator](3-eval.html), and in opposition to Lisp and Scheme tradition, it is
+common for the evaluator to receive values that are not valid as a datum; here,
+a string unto itself that may not be a valid datum.  Yet, it is valid as an
+identifier for the purposes of the evaluator.
+
+### Double-quoted
+
+Strings wrapped in the double-quote symbol parse into:
+
+    (#DQSTR <STRING>)   ;; <STRING> is visual aid, not syntax
+
+Under default settings, the decoder would transform this into a value which,
+when evaluated as code, simply yields the contained string as a value.
+
+### At-quoted
+
+This is a special type of syntax for "raw" strings, meaning that no backslash
+escapes nor any other kind of escape sequence are recognized within them.
+
+The syntax begins with an at sign, followed by any byte.  That byte becomes a
+termination marker, and the string cannot contain an occurrence of it, since
+there are no escape sequences.  The byte value 255 has a special meaning; see
+further below.
+
+    @"foo \ bar"  ->  (#ATSTR <SENTINEL> <STRING>)
+
+The visual tokens `<SENTINEL>` and `<STRING>` represent an integer and string
+value, respectively.  Here, the integer would be 34, which is the ASCII value
+for a double-quote sign.  The string contains a literal backslash, since there
+is no backslash escape parsing.
+
+This style of quoting can be useful, for instance, when representing regular
+expressions as strings in code:
+
+    ;; Matches e.g. foo\bar.["blah"]
+
+    @/^foo\\(bar|baz)\.\[".*"\]$/
+
+Were it not for this syntax, this regular expression would only be possible to
+represent through a quoted string such as the following:
+
+    ;; Same as above, but so many backslashes
+
+    "^foo\\\\(bar|baz)\\t\\[\".*\"\\]$"
+
+The byte that follows the at sign need not be a printable character or even a
+valid ASCII byte; it can be absolutely any byte value, even NUL.  This can be
+useful to easily encode binary data which is known to not contain a specific
+byte; an example would be C strings which cannot contain NUL.
+
+If however the byte value is 255, then it does not stand for a sentinel, but
+rather indicates that 6 more bytes follow, interpreted as a big-endian 48-bit
+integer, which is the count of bytes making up the contents of the string.
+
+Example sequence of bytes, represented as a mixture of ASCII and raw integers:
+
+    '@' 255 0 0 0 0 2 100 <612 bytes>  ->  (#ATSTR <STRING>)
+
+One may ask why the length is not included in the list.  This is unnecessary,
+since strings in Zisp already carry length information in their own metadata
+structure.
+    
+
+### Backslash escapes
+
+In pipe-quoted and double-quoted strings, the following ASCII characters may
+follow a backslash to insert a certain character.
+
+    +-------+----------------------------+
+    | Char  | Meaning                    |
+    +-------+----------------------------+
+    | \     | Literal backslash          |
+    +-------+----------------------------+
+    | |     | Literal pipe symbol        |
+    +-------+----------------------------+
+    | "     | Literal double-quote       |
+    +-------+----------------------------+
+    | 0     | ASCII NUL                  |
+    +-------+----------------------------+
+    | a     | ASCII Alert                |
+    +-------+----------------------------+
+    | b     | ASCII Backspace            |
+    +-------+----------------------------+
+    | t     | ASCII Tab (Horizontal)     |
+    +-------+----------------------------+
+    | n     | ASCII Newline (Line Feed)  |
+    +-------+----------------------------+
+    | v     | ASCII Vertical Tab         |
+    +-------+----------------------------+
+    | f     | ASCII Form Feed            |
+    +-------+----------------------------+
+    | r     | ASCII Carriage Return      |
+    +-------+----------------------------+
+    | e     | ASCII Escape               |
+    +-------+----------------------------+
+
+In words:
+
+* A backslash, followed by a backslash, pipe, or double-quote character, is
+  substituted with a literal occurrence of that character.
+
+* The characters 0, a, b, t, n, v, f, r, and e have the same meanings as in the
+  C programming language, representing common ASCII control characters.
+
+Further, the following Regular Expression patterns following a backslash have
+special meaning.
+
+    +---------------------+-----------------------+
+    | Regular Expression  | Meaning               |
+    +---------------------+-----------------------+
+    | [\t ]*\n[\t ]*      | Discarded             |
+    +---------------------+-----------------------+
+    | x([0-9a-fA-F]{2})*; | Arbitrary bytes       |
+    +---------------------+-----------------------+
+    | u[0-9a-fA-F]+;      | Unicode Scalar Value  |
+    +---------------------+-----------------------+
+
+Explanations:
+
+* A backslash followed by any number of blanks (space or tab), a newline, and
+  again any number of blanks, is substituted with nothing.  This is to allow
+  splitting a string into multiple lines for human readability.
+
+      (define p "This paragraph has been visually split into multiple \
+                 lines, but the newline is escaped, so it's one line.")
+
+* An x, followed by pairs of hexadecimal digits (case insensitive), terminated
+  by a semicolon, is substituted with the sequence of bytes represented by the
+  corresponding pairs of hexadecimal digits.  E.g.: `"foo\xDEADBEEF;bar"`
+
+* A u, followed by a hexadecimal digit sequence (case insensitive), terminated
+  by a semicolon, is substituted with the canonical UTF-8 byte sequence for the
+  Unicode Scalar Value represented by that hexadecimal number.  The number must
+  be in the range `0` to `10FFFF`.  E.g.: `"foo\u00A0;bar"`
+
+### Newlines in strings
+
+Normally, a newline in a string has no special meaning and simply becomes part
+of the string.  However, newlines can be backslash-escaped, which simple erases
+them; the escaped newline can also be preceded or followed by any number of tab
+and space characters, which are all stripped as well.  (Note: It's not blanks
+preceding the backslash that are stripped, but blanks following the backslash
+and preceding the newline; i.e., blanks at the end of the line.)
+
+Following are some examples of how multi-line strings can appear in source code
+with different intentions and meanings:
+
+    (define paragraph "This paragraph has been visually split into multiple \
+                       lines, but the newlines are escaped, so it's one line.")
+
+    (define json-object '|   ;; use '|| so double-quotes need no escaping
+      {
+        "key": "value"
+      }
+    |)
+
+The second example is actually slightly problematic.  It begins with a newline,
+which may be undesirable, but escaping that newline would cause the first line
+to have no indentation, thus the opening `{` would not line up with the closing
+`}` when this string is printed out.  Further, if the entire block of code is
+indented, then the string contents may be more indented than intended.  (No pun
+or rhyme intended.)  Consider:
+
+    (let ((foo one))
+      (let ((bar two))
+        (let ((json-object '|
+                 {
+                   "key": "value"
+                 }
+               |))
+          (do-whatever))))
+
+The string bound to `json-object` has redundant indentation.  Should the parser
+attempt to solve this issue?
+
+Thankfully, we have the decoder to handle such complexities.  Under the default
+settings, the rune `#HASH` is bound to a decoder rule which detects a payload
+value that is a string literal, and implements the same algorithm as seen in
+Java 15 Text Blocks: [JEP 378: Text Blocks](https://openjdk.org/jeps/378)
+
+Thus, we can do the following:
+
+    (let ((foo one))
+      (let ((bar two))
+        (let ((json-object #|
+    ...........  {
+    ...........    "key": "value"
+    ...........  }
+    ...........|))
+          (do-whatever))))
+
+(Dots represent whitespace that is deleted.  The initial newline is, as well.)
+
+The only feature Zisp does not offer is a way to fence off multi-line strings
+with a longer token such as `"""` as seen in Python and Java, or an arbitrary
+word as seen in Bourne shell and PHP "here doc" syntax.
+
+However, if a programmer truly wanted to have arbitrary text blocks in code,
+without needing to escape anything in them, it's possible to abuse at-quoted
+string syntax, using it with an ASCII control character which is displayed
+visibly by a text editor.  In the following, the characters `^\` are meant to
+represent a literal ASCII File Separator character in the source code:
+
+    (define json-object #@^\
+      {
+        "key": "value"
+      }
+      ^\)
+
+It works fine in Emacs, so why not?  Use `C-q C-\` to insert the `^\`.
+
+This is indeed quite an eldritch syntax, but hopefully most programs would not
+need to use it.
+
+
+## Other syntax
+
+The following table summarizes commonly useful syntax abbreviations:
+
+    [...]   -> (#SQUARE ...)        #datum       -> (#HASH datum)
+
+    {...}   -> (#BRACE ...)         #rune(...)   -> (#rune ...)
+
+    'datum  -> (#QUOTE datum)       dat1dat2     -> (#JOIN dat1 dat2)
+
+    `datum  -> (#GRAVE datum)       dat1.dat2    -> (#DOT dat1 dat2)
+
+    ,datum  -> (#COMMA datum)       dat1:dat2    -> (#COLON dat1 dat2)
+
+Notes:
+
+* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
+  means zero or more data.
+
+* The `#datum` form only applies when the datum following the hash sign is
+  anything other than a bare string, since otherwise this would be ambiguous
+  with a rune literal.  A bare string can nevertheless follow the hash sign by
+  separating the two with a backslash:
+
+      #\string  ->  (#HASH string)
+
+* Though not represented in the table due to notational difficulty, the form
+  `#rune(...)` doesn't require a list in the second position; any datum that
+  works with the `#datum` syntax also works with `#rune<DATUM>`.
+
+      #rune1#rune2  -> (#rune1 #rune2)
+
+      #rune\string  -> (#rune string)
+
+      #rune'string  -> (#rune (#QUOTE string))
+
+      #rune"string" -> (#rune (#DQSTR |string|))
+
+  As a counter-example, following a rune immediately with a bare string isn't
+  possible without the delimiting backslash, since that would be ambiguous:
+
+      #abcdefgh  ;Could be (#abcdef gh) or (#abcde fgh) or ...
+
+* Syntax sugar can combine arbitrarily.  Some examples follow.  Any of these may
+  or may not actually have a meaning in code; some might simply end up producing
+  an error during decoding, or later evaluation of code.
+
+      #{...}            -> (#HASH (#BRACE ...))
+
+      #'foo             -> (#HASH (#QUOTE foo))
+
+      ##'[...]          -> (#HASH (#HASH (#QUOTE (#SQUARE ...))))
+
+      {x y}[i j]        -> (#JOIN (#BRACE x y) (#SQUARE i j))
+
+      foo.bar.baz{x y}  -> (#JOIN (#DOT (#DOT foo bar) baz) (#BRACE x y))
+
+* Those used to thinking in Lisp and Scheme may think that `(#QUOTE ...)` halts
+  further decoding of enclosed data.  This is not so, since quoting is related
+  to code evaluation, not decoding.
+
+### Datum labels
+
+Valid data cannot be cyclic, since that would mean it has infinite length in
+bytes.  To externally represent a value with cyclic structure, one uses datum
+labels in the data encoding of the value.
+
+A datum label either wraps another datum to assign a number to it, or contains
+just a reference to a previous assignment.
+
+    +------------------+----------------------------+
+    | Syntax           | Internal datum structure   |
+    +------------------+----------------------------+
+    | #%<HEX>=<DATUM>  | (#LABEL <NUMBER> <DATUM>)  |
+    +------------------+----------------------------+
+    | #%<HEX>%         | (#LABEL <NUMBER>)          |
+    +------------------+----------------------------+
+
+In this visual, the token `<HEX>` stands for a hexadecimal digit sequence, the
+token `<DATUM>` stands for any other datum, and `<NUMBER>` is a stand-in for a
+number value; that which is represented by `<HEX>`.
+
+For clarity, concrete examples follow:
+
+    +-------------------+------------------------------+
+    | Byte sequence     | Parse result                 |
+    +-------------------+------------------------------+
+    | #%1234abcd=(foo)  | (#LABEL <0x1234abcd> (foo))  |
+    +-------------------+------------------------------+
+    | #%1234abcd%       | (#LABEL <0x1234abcd>)        |
+    +-------------------+------------------------------+
+
+Here, the visual token `<0x1234abcd>` stands for a Zisp value of a numeric type
+with an integer value.  Note that the decoder may not accept a bare string here,
+meaning this syntax sugar is not merely an abbreviation.
+
+### Shebang
+
+Finally, the parser recognizes the Unix *shebang* syntax and outputs a datum to
+hold the string values found within:
+
+    #!interpreter          ->  (#SHBANG interpreter)
+
+    #!interpreter argline  ->  (#SHBANG interpreter argline)
+
+When executing a script file, Zisp simply stores this into a global value that
+may be inspected if desired.
+
+
+<!--
+;; Local Variables:
+;; fill-column: 80
+;; End:
+-->
diff --git a/doc/0/2-decode.md b/doc/0/2-decode.md
new file mode 100644
index 0000000..1a45824
--- /dev/null
+++ b/doc/0/2-decode.md
@@ -0,0 +1,44 @@
+# Decoding
+
+A separate process called "decoding" can transform simple data structures,
+consisting of only the base datum types, into a richer set of Zisp types.
+
+For example, the decoder may turn `(#HASH ...)` into a vector, as one would
+expect a vector literal like `#(...)` to work in Scheme.  Bytevector syntax
+could use a custom rune as a list prefix, like: `#u8(...)`
+
+Runes may be decoded in isolation as well, rather than transforming a list
+whose head they appear in.  This can implement Boolean constants as `#true`
+and `#false` or `#t` and `#f`.
+
+The decoder recognizes `(#QUOTE ...)` to aid in implementing the traditional
+quoting mechanism of Lisp/Scheme, but with a significant difference:
+
+Traditional quote is "unhygienic" in Scheme terms.  An expression such as
+`'(foo bar)` will always be read as `(quote (foo bar))` regardless of what
+lexical context it appears in, so the semantics will depend on whatever the
+identifier `quote` is bound to, meaning that the expression may end up
+evaluating to something other than the list `(foo bar)`.
+
+The Zisp decoder, which transforms not datum to datum, but object to object,
+can turn `#QUOTE` into an object which encapsulates the notion of quoting,
+which the Zisp evaluator can recognize and act upon, ensuring that an
+expression like `'(foo bar)` always turns into the list `(foo bar)`.
+
+One way to think about this, in Scheme (R6RS / syntax-case) terms, is that
+expressions like `'(foo bar)` turn directly into a syntax object when read,
+and the created syntax object begins with an identifier bound to `quote` in
+the standard library.
+
+The decoder is, of course, configurable and extensible.  The transformations
+mentioned above would be performed only when it's told to decode data which
+represents Zisp code.  The decoder may be given a different configuration,
+telling it to decode, for example, data which represents a different kind of
+domain-specific data, such as application settings, build system commands,
+complex data records with non-standard data types, and so on.
+
+<!--
+;; Local Variables:
+;; fill-column: 80
+;; End:
+-->
diff --git a/doc/0/grammar/abnf.txt b/doc/0/grammar/abnf.txt
new file mode 100644
index 0000000..5ab3c89
--- /dev/null
+++ b/doc/0/grammar/abnf.txt
@@ -0,0 +1,139 @@
+; Standards-compliant ABNF (RFC 5234, RFC 7405)
+
+; Compatible with: https://www.quut.com/abnfgen/
+
+; Unlike PEG, grammar rules in BNF are non-deterministic, which makes
+; it much more challenging to express our naive parse logic.  Whether
+; this ABNF file is truly accurate is difficult to assess.
+
+; The abnfgen(1) tool linked above can be used to generate arbitrary
+; strings matching the grammar in this file.  These can be fed into
+; the Zisp parser to reveal some potential bugs; either in the parser
+; itself, or this ABNF grammar.
+
+; Note that the tool may generate Zisp string literals with Unicode
+; escape sequences corresponding to surrogate code points; the parser
+; may reject these.  This is expected; it's difficult to rewrite this
+; ABNF grammar to exclude those Unicode values.
+
+; Other minor inaccuracies that aren't important include: This ABNF
+; forces line comments to be terminated with an LF character, when in
+; fact the end-of-file may also terminate them; the same applies to
+; hash-bang parsing which doesn't actually have to end in LF.  These
+; discrepancies won't make abnfgen(1) generate invalid strings; they
+; only make this ABNF more strict than the Zisp parser, so it won't
+; generate some strings that the parser would actually accept.
+
+
+Stream        = [ Unit *( Blank Unit ) ] *Blank [Trail]
+
+
+Unit          = *Blank Datum
+
+Blank         = HTAB / LF / %x0b / %x0c / CR / SP / Comment
+
+Trail         = SkipLine / SkipUnit / ";" "~" *Blank
+
+
+Datum         = BareString / SpecialStr / CladDatum / Rune / RuneStr
+              / RuneDotStr / RuneClad / LabelRef / LabelDef / HashStr
+              / HashDotStr / HashClad / QuoteExpr / JoinExpr
+
+Comment       = SkipLine LF / SkipUnit Blank
+
+SkipLine      = ";" [ SkipLStart *AnyButLF ]
+
+SkipUnit      = ";" "~" Unit
+
+SkipLStart    = %x00-09 / %x0b-7d / %x7f-ff ; any but LF or "~"
+
+AnyButLF      = %x00-09 / %x0b-ff
+
+
+BareString    = BareChar *( BareChar / Numeric )
+
+SpecialStr    = SpecStrChar *( SpecStrChar / BareChar )
+
+CladDatum     = "|" *( PipeStrChar / "\" StringEsc ) "|"
+              / DQUOTE *( QuotStrChar / "\" StringEsc ) DQUOTE
+              / "(" List ")"
+              / "[" List "]"
+              / "{" List "}"
+
+Rune          = "#" RuneName
+
+RuneStr       = "#" RuneName "\" BareString
+
+RuneDotStr    = "#" RuneName "\" SpecialStr
+
+RuneClad      = "#" RuneName CladDatum
+
+HashBang      = "#" "!" *( SP / HTAB ) HBLine LF
+
+LabelRef      = "#" "%" Label "%"
+
+LabelDef      = "#" "%" Label "=" Datum
+
+HashStr       = "#" "\" BareString
+
+HashDotStr    = "#" "\" SpecialStr
+
+HashClad      = "#" CladDatum
+
+QuoteExpr     = "'" Datum
+              / "`" Datum
+              / "," Datum
+
+JoinExpr      = Datum RJoinDatum
+              / LJoinDatum NoStartDot
+              / Datum ":" Datum
+              / NoEndDot "." Datum
+
+
+BareChar      = "!" / "$" / "%" / "&" / "*" / "/" / "<" / "=" / ">"
+              / "?" / "^" / "_" / "~" / ALPHA
+
+Numeric       = "+" / "-" / DIGIT
+
+SpecStrChar   = "." / ":" / Numeric
+
+PipeStrChar   = %x00-5b / %x5d-7b / %x7d-ff ; any but "|" or "\"
+
+QuotStrChar   = %x00-21 / %x23-5b / %x5d-ff ; any but DQUOTE or "\"
+
+StringEsc     = "\" / "|" / DQUOTE / *( HTAB / SP ) LF *( HTAB / SP )
+              / %s"a" / %s"b" / %s"t" / %s"n"
+              / %s"v" / %s"f" / %s"r" / %s"e"
+              / %s"x" *( 2HEXDIG ) ";"
+              / %s"u" ["0"] 1*5HEXDIG ";"
+              / %s"u" "1" "0" 4HEXDIG ";"
+
+List          = [ Unit *( Blank Unit ) ] *Blank [SkipUnit]
+
+
+RuneName      = ALPHA *5( ALPHA / DIGIT )
+
+Label         = 1*12( HEXDIG )
+
+HBLine        = 1*HBChar [ 1*( SP / HTAB ) *HBChar ]
+
+HBChar        = %x00-08 / %x0b-1f / %x21-ff ; any but HT, LF, SP
+
+
+RJoinDatum    = CladDatum / Rune / RuneStr / RuneDotStr / RuneClad
+              / LabelRef / LabelDef / HashStr / HashDotStr / HashClad
+              / QuoteExpr
+
+LJoinDatum    = CladDatum / RuneClad / LabelRef / HashClad
+
+NoStartDot    = BareString / CladDatum / Rune / RuneStr / RuneDotStr
+              / RuneClad / LabelRef / LabelDef / HashStr / HashDotStr
+              / HashClad / QuoteExpr
+
+NoEndDot      = BareString / Rune / RuneStr / RuneClad / LabelRef
+              / HashStr / HashClad
+
+
+;; Local Variables:
+;; eval: (flyspell-mode -1)
+;; End:
diff --git a/doc/0/grammar/index.md b/doc/0/grammar/index.md
new file mode 100644
index 0000000..e3716ea
--- /dev/null
+++ b/doc/0/grammar/index.md
@@ -0,0 +1,115 @@
+# Zisp S-Expression Grammar
+
+The grammar is available in several different formats:
+
+* [ZBNF](zbnf.txt): See below for the rules of this notation
+* [ABNF](abnf.txt): Compatible with the `abnfgen` tool
+* [PEG](peg.txt): Compatible with `peg/leg` tool
+
+
+## ZBNF notation
+
+The ZBNF grammar specification uses a BNF-like notation with PEG-like
+semantics:
+
+* Concatenation of expressions is implicit: `foo bar` means `foo`
+  followed by `bar`.
+
+* Parentheses are used for grouping, and the pipe symbol `|` is used
+  for alternatives.
+
+* The suffixes `?`, `*`, and `+` have the same meaning as in regular
+  expressions, although `[foo]` is used in place of `(foo)?`.
+
+* The syntax is defined in terms of bytes, not characters.  Terminals
+  `'c'` and `"c"` refer to the ASCII value of the given character `c`.
+  Standard C escape sequences are supported.
+
+* The prefix `~` means NOT.  It only applies to rules that match one
+  byte, and negates them.  For example, `~( 'a' | 'b' )` matches any
+  byte other than 'a' and 'b'.
+
+* Ranges of terminal values are expressed as `x...y` (inclusive).
+
+* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported.
+
+* There is no ambiguity, or look-ahead / backtracking beyond one byte.
+  Rules match left to right, depth-first, and greedy.  As soon as the
+  input matches the first terminal of a rule --explicit or implied by
+  recursively descending into the first non-terminal-- it must match
+  that rule to the end or a syntax error is reported.
+
+The last point makes the notation simple to translate to code.
+
+
+## Limitations outside the grammar
+
+The following limits are not represented in the grammar:
+
+* A `UnicodeSV` is the hexadecimal representation of a Unicode scalar
+  value; it must represent a value in the range 0 to D7FF, or E000 to
+  10FFFF, inclusive.  Any other value signals an error.  Valid values
+  are converted into a UTF-8 byte sequence encoding the value.
+
+* A `Rune` longer than 6 bytes is grammatical, but signals an error.
+  This is important because runes are not self-terminating; defining
+  their grammar as ending after a maximum of 6 bytes would allow
+  another datum beginning with an alphabetic character to follow a
+  rune immediately without any visual delineation, which would be
+  terribly confusing for a human reader.  Consider: `#foobarbaz`.
+  This would parse as a `Datum` joining `#foobar` and `baz`.
+
+  (The ABNF does not suffer from this issue, since it explicitly
+   enumerates the join possibilities anyway.)
+
+* A `Label` is the hexadecimal representation of a 48-bit integer,
+  meaning it allows for a maximum of 12 hexadecimal digits.  Longer
+  values are grammatical, but signal an out-of-range error, so as to
+  avoid signaling a confusing "invalid character" error on input that
+  appears grammatical.  Consider: `#%123456789abcd=foo`.  This would
+  signal an invalid character error at the letter `d` if the grammar
+  limited a `Label` to 12 hexadecimal digits.
+
+  (As above, the ABNF doesn't care about this.  You probably don't
+   want to use the ABNF to generate a parser anyway.)
+
+
+## At-quoted strings
+
+The mechanism of at-quoted strings is not represented in any of the
+grammars, since it essentially has 256 variants.  Representing it
+sanely in a grammar requires the ability to save and reference
+variables.
+
+
+## Stream-parsing strategy
+
+The parser consumes one `Unit` from the input stream every time it's
+called; it returns the `Datum` therein if found, or else it returns
+the Zisp EOF token.
+
+Since a `Datum` is not self-terminating, the parser must read beyond
+it to realize that it has ended (if not followed by the EOF).  Thus,
+it will consume one more `Blank` following the `Unit` that it parsed.
+If this `Blank` is a comment, it will be consumed entirely, ensuring
+that parsing resumes properly on a subsequent parser call on the same
+input stream, without needing to store any state in between.
+
+Since comments of type `SkipUnit` are likewise not self-terminating,
+an arbitrary number of chained `SkipUnit` comments may need to be
+consumed before the parser is finally allowed to return.
+
+The following illustration shows the positions at which the parser
+will stop consuming input when called repeatedly on the same input
+stream.  The dots represent the extent of each `Unit` being parsed,
+while the caret points at the last byte the parser will consume in
+that parse cycle.
+
+```
+foo (bar)[baz] foo;~bar foo;~bar;~baz;~bat foobar
+...^..........^...     ^...               ^......^
+```
+
+Notice how, in the fourth cycle, the parser is forced to consume all
+commented-out units before it can return, since it would otherwise
+leave the stream in an inappropriate state.
diff --git a/doc/0/grammar/peg.txt b/doc/0/grammar/peg.txt
new file mode 100644
index 0000000..1541da6
--- /dev/null
+++ b/doc/0/grammar/peg.txt
@@ -0,0 +1,91 @@
+# Standard PEG notation
+
+Stream       <- Unit ( Blank Unit )* !.
+
+
+Unit         <- Blank* Datum
+
+Blank        <- [\t-\r ] / Comment
+
+
+Datum        <- OneDatum ( JoinChar? OneDatum )*
+
+JoinChar     <- '.' / ':'
+
+
+Comment      <- ';' ( SkipUnit / SkipLine )
+
+SkipUnit     <- '~' Unit
+
+SkipLine     <- (!'\n' .)* '\n'?
+
+
+OneDatum     <- BareString / CladDatum
+
+
+BareString   <- SpecBareChar ( BareChar / JoinChar )*
+              / BareChar+
+
+SpecBareChar <- '+' / '-' / JoinChar / DIGIT
+
+BareChar     <- ALPHA / DIGIT
+              / '!' / '$' / '%' / '&' / '*' / '+' / '-' / '/'
+              / '<' / '=' / '>' / '?' / '^' / '_' / '~'
+
+
+CladDatum    <- PipeStr / QuoteStr / HashExpr / QuoteExpr / List
+
+PipeStr      <- '|' ( PipeStrChar / '\' StringEsc )* '|'
+QuoteStr     <- '"' ( QuotStrChar / '\' StringEsc )* '"'
+HashExpr     <- '#' HashExprs
+QuoteExpr    <- "'" Datum / '`' Datum / ',' Datum
+List         <- ParenList / SquareList / BraceList
+
+
+PipeStrChar  <- (![|\\] .)
+QuotStrChar  <- (!["\\] .)
+
+StringEsc    <- '\' / '|' / '"' / ( HTAB / SP )* LF ( HTAB / SP )*
+              / '0' / 'a' / 'b' / 't' / 'n' / 'v' / 'f' / 'r' / 'e'
+              / 'x' HexByte* ';'
+              / 'u' UnicodeSV ';'
+
+HexByte      <- HEXDIG HEXDIG
+UnicodeSV    <- HEXDIG+
+
+
+HashExprs    <- '!' [\t ]* HBangLine '\n'?
+              / '%' Label ( '%' / '=' Datum )
+              / '\' BareString / CladDatum
+              / Rune ( '\' BareString / CladDatum )?
+
+HBangLine    <- HBChars+ [\t ]* ( HBChars+ )?
+HBChars      <- (![\t\n ] .)
+Label        <- HEXDIG+
+Rune         <- ALPHA ( ALPHA / DIGIT )*
+
+
+ParenList    <- '(' Unit* ')'
+SquareList   <- '[' Unit* ']'
+BraceList    <- '{' Unit* '}'
+
+
+DIGIT        <- [0-9]
+ALPHA        <- [a-zA-Z]
+HEXDIG       <- [0-9a-fA-F]
+
+
+# Keep this in sync line-for-line with the ZBNF grammar for easy
+# comparison between the two.
+
+# This file is meant to be compatible with:
+# https://piumarta.com/software/peg
+
+# Due to a quirk in the peg tool this file is used with, the grammar
+# must not allow an empty stream.  Therefore, the Unit rule has its
+# Datum declared as mandatory rather than optional.
+
+
+# Local Variables:
+# eval: (flyspell-mode -1)
+# End:
diff --git a/doc/0/grammar/zbnf.txt b/doc/0/grammar/zbnf.txt
new file mode 100644
index 0000000..c04b813
--- /dev/null
+++ b/doc/0/grammar/zbnf.txt
@@ -0,0 +1,75 @@
+; Custom notation with PEG semantics
+
+Stream        : Unit ( Blank Unit )*
+
+
+Unit          : Blank* [Datum]
+
+Blank         : '\t'...'\r' | SP | Comment
+
+
+Datum         : OneDatum ( [JoinChar] OneDatum )*
+
+JoinChar      : '.' | ':'
+
+
+Comment       : ';' ( SkipUnit | SkipLine )
+
+SkipUnit      : '~' Unit
+
+SkipLine      : ( ~LF )* [LF]
+
+
+OneDatum      : BareString | CladDatum
+
+
+BareString    : SpecBareChar ( BareChar | JoinChar )*
+              | BareChar+
+
+SpecBareChar  : '+' | '-' | JoinChar | DIGIT
+
+BareChar      : ALPHA | DIGIT
+              | '!' | '$' | '%' | '&' | '*' | '+' | '-' | '/'
+              | '<' | '=' | '>' | '?' | '^' | '_' | '~'
+
+
+CladDatum     : PipeStr | QuoteStr | HashExpr | QuoteExpr | List
+
+PipeStr       : '|' ( PipeStrChar | '\' StringEsc )* '|'
+QuoteStr      : '"' ( QuotStrChar | '\' StringEsc )* '"'
+HashExpr      : '#' HashExprs
+QuoteExpr     : "'" Datum | '`' Datum | ',' Datum
+List          : ParenList | SquareList | BraceList
+
+
+PipeStrChar   : ~( '|' | '\' )
+QuotStrChar   : ~( '"' | '\' )
+
+StringEsc     : '\' | '|' | '"' | ( HTAB | SP )* LF ( HTAB | SP )*
+              | '0' | 'a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | 'e'
+              | 'x' HexByte* ';'
+              | 'u' UnicodeSV ';'
+
+HexByte       : HEXDIG HEXDIG
+UnicodeSV     : HEXDIG+
+
+
+HashExprs     : '!' ( SP | HTAB )* HBangLine [ LF ]
+              | '%' Label ( '%' | '=' Datum )
+              | '\' BareString | CladDatum
+              | Rune [ '\' BareString | CladDatum ]
+
+HBangLine     : HBChars+ ( SP | HTAB )* [ HBChars+ ]
+HBChars       : ~( SP | HTAB | LF )
+Label         : HEXDIG+
+Rune          : ALPHA ( ALPHA | DIGIT )*
+
+
+ParenList     : '(' Unit* ')'
+SquareList    : '[' Unit* ']'
+BraceList     : '{' Unit* '}'
+
+
+;; Local Variables:
+;; eval: (flyspell-mode -1)
+;; End:
diff --git a/doc/0/index.md b/doc/0/index.md
new file mode 100644
index 0000000..f0da216
--- /dev/null
+++ b/doc/0/index.md
@@ -0,0 +1,34 @@
+# Chapter 0: Genesis
+
+This chapter explains the core value representation of Zisp, and goes
+through the processes of parsing, decoding, running, and optionally
+compiling code.
+
+0. [Value](0-value.html)
+
+   Zisp uses a uniform 64-bit representation for all values, densely
+   packed into signaling or non-canonical NaN values.
+
+1. [Parse](1-parse.html) (see also [grammar](grammar/))
+
+   The parser receives a stream of bytes and transforms them into a
+   minimal set of data types with very little processing.
+
+2. [Decode](2-decode.html)
+
+   The decoder runs configurable and extensible pre-processing steps
+   over data received from the parser, enriching it with more complex
+   value types, and handling primitive source code transforms.
+
+3. [Eval](3-eval.html)
+
+   Code is evaluated within a mutable module context, on which it can
+   have side effects such as creating new definitions or establishing
+   links to other modules.
+
+4. [Compile](4-compile.html)
+
+   Procedures from within the compiler module can be used to demand
+   the compilation of other modules, with various options, yielding
+   static or dynamic object files.  These may be loaded immediately,
+   replacing the previously uncompiled module code in memory.
-- 
cgit v1.2.3