It's a revolution baby.

author: Taylan Kammer <taylan.kammer@gmail.com> 2026-06-20 22:53:50 +0200
committer: Taylan Kammer <taylan.kammer@gmail.com> 2026-06-20 22:53:50 +0200
commit: b84ed4f563b3536365f7d3cc4d068407e98685b3
tree: 9ab7b18d712db1329b6230cb45520e7c85dc46fd /doc/c1
parent: bfaa74b19fc81dbe071d55566a78a8e329237eff
7 files changed, 0 insertions, 1108 deletions
diff --git a/doc/c1/1-parse.md b/doc/c1/1-parse.md
deleted file mode 100644
index d4c4c2e..0000000
--- a/doc/c1/1-parse.md
+++ /dev/null
@@ -1,608 +0,0 @@
-# Parser for Code & Data
-
-<!--TOC-->
-
-Zisp s-expressions represent an extremely minimal set of data types; only that
-which is necessary to strategically construct more complex values:
-
-    +-------+---------+--------+----------+------+
-    | TYPE  | String  | Rune   | Pair     | Nil  |
-    +-------+---------+--------+----------+------+
-    | E.G.  | foobar  | #name  | (X & Y)  | ()   |
-    +-------+---------+--------+----------+------+
-
-The parser recognizes various *syntax sugar* which abbreviates verbose syntax,
-and may result in special data structures (typically, a pair with a rune in its
-first, and payload in its second position) which another Zisp component called
-the *decoder* can transform into a rich set of value types.
-
-The most ubiquitous syntax sugar is the list, which abbreviates a sequence of
-tail-linked pairs, terminated with a special nil value represented as `()`:
-
-    (x)      ->  (x & ())
-
-    (x y)    ->  (x & (y & ()))
-
-    (x y z)  ->  (x & (y & (z & ())))
-
-The following are so-called *improper lists*, ending in a non-nil value:
-
-    (x y & z)    ->  (x & (y & z))
-
-    (x y z & t)  ->  (x & (y & (z & t)))
-
-More details about syntax sugar, and the decoder, are explained later.
-
-
-## Character Encoding
-
-The parser does not consume Unicode characters; it consumes bytes.  The grammar
-itself is built from bytes corresponding to ASCII characters.
-
-Some elements of the grammar, such as comments and quoted strings, may contain
-arbitrary byte sequences, until terminated.  These sequences may happen to be
-valid UTF-8 text.  This way, quoted strings and comments may contain Unicode
-text encoded in UTF-8, but the parser does not check these for validity.
-
-Since comments and quoted strings may contain arbitrary byte sequences, a text
-editor or other program displaying Zisp s-expressions may need to use a special
-visual representation for bytes that don't represent valid text.
-
-The parser working on bytes rather than Unicode characters is not a limitation,
-but rather a feature: It allows Zisp s-expressions to be used as a structured
-data exchange format, which may contain binary data elements, without the need
-to encode these in Base64 or other such text representations of binary data.
-Consider the example:
-
-    ((image.webp "<BINARY>")
-     (video.webm "<BINARY>"))
-
-For this to work, every incidental occurrence of the double-quote sign or the
-backslash sign within the `<BINARY>` data must be escaped with a backslash; all
-other bytes can appear verbatim in the strings.
-
-
-## Stream Parsing
-
-The parser can be repeatedly invoked on a byte stream to consume the next datum
-within.  This does not require "unreading" or back-seeking within the stream;
-the parser always reads a full datum, and stops after some byte which cleanly
-terminates the currently parsed datum.
-
-This means Zisp s-expressions can be safely intermixed with other data within
-the same byte stream.  So long as the other data is consumed by some parser
-which similarly stops reading at a clear boundary, the Zisp parser can then
-continue operating on the same stream.  Consider the example:
-
-    ("image.webp" 8273)
-
-    << 8273 bytes >>
-
-    ("video.webm" 736)
-
-    << 736 bytes >>
-
-The "header" for each file in this stream is a Zisp s-expression containing
-information about how many bytes should be read after the header, before the
-next file header appears.  (The header data need to be terminated with a blank
-ASCII character such as a newline; the closing parenthesis does not act as a
-terminator unto itself due to the "join" syntax sugar.)
-
-To enable this stream parsing strategy, the parser does not use any automatic
-buffering.  If it did, it might inadvertently consume some bytes beyond the
-currently parsed datum, leaving the stream inconsistent.
-
-If the parser is meant to be used on an input stream associated with expensive
-system calls, such as a file handle or network socket, it's best to wrap that
-stream in some intermediate object which asks the system for large chunks of
-data at once, and stores the data in a buffer.
-
-
-## Comments
-
-Two types of comment are supported: datum comments and line comments.
-
-* A semicolon followed by a tilde instructs the parser to consume one datum and
-  discard it.  Whitespace may appear between the tilde and the datum to discard.
-
-* A semicolon, followed by a non-tilde byte, instructs the parser to consume and
-  discard bytes until a newline (ASCII Line Feed) is encountered.
-
-
-## Value vs. Datum
-
-A Zisp *value* that has an *external representation* in the form of a sequence
-of bytes is called a *datum*.  Every datum is a value, but not every value is a
-datum.  In other words, a datum is a value that can be printed out as a byte
-sequence which the parser can turn back into an equivalent datum.
-
-A value that is not a datum may nevertheless be *encoded* into one, allowing it
-to have an external representation.  After parsing, it needs to be *decoded* to
-actually become the expected value.
-
-One may speak of an *external representation of a value* where the value is not
-itself a datum, but can be encoded as one.  The more strictly correct term for
-this is: "The external representation of a datum that encodes the value."
-
-### Syntax sugar
-
-The parser recognizes various *syntax sugar* to abbreviate an equivalent datum
-construction, or express a datum that encodes a more complex value.
-
-As an example, the expression `#(x y z)` is an abbreviation for the equivalent
-`(#HASH x y z)`.  These are two external representations for the same datum;
-after parsing, both will yield values that are indistinguishable in all but
-their memory address.
-
-An example of syntax sugar that is not a mere abbreviation is a quoted string
-which contains bytes that could not appear in a *bare* string:
-
-    "foo bar"  ->  (#DQSTR & <STRING>)
-
-In this example, the visual token `<STRING>` represents the actual string value
-in program memory, which has no direct external representation in bytes because
-it contains a space character.
-
-Those familiar with Lisp and Scheme may expect bare strings to be parsed into a
-separate type called *symbol* while quoted strings are parsed directly into a
-string type, but this is not the case in Zisp.
-
-### Decoder
-
-The *decoder* transforms Zisp data into values of more complex types, including
-values that are not of a datum type.
-
-Combined with syntax sugar, this allows Zisp to offer familiar syntax elements.
-For example, the expression `#(x y z)` which parses into `(#HASH x y z)` can be
-decoded into an array, so the result is similar to the vector syntax of Scheme.
-
-Decoding also resolves datum labels, goes over bare strings to find ones that
-represent a number literal, and takes care of a number of other transforms.
-This offloads complexity, allowing the parser to remain extremely simple.
-
-See the dedicated documentation of the [decoder](2-decode.html) for more.
-
-
-## Data types
-
-Following is a more in-depth explanation of each data type constructed by the
-Zisp s-expression parser.
-
-These are in fact value types, though the term "data type" is often used due to
-familiarity.  A Zisp value that is a member of one of the following value types
-is only a *datum* if it adheres to additional constraints as explained below.
-
-### String
-
-Strings can appear *bare* or be quoted in various ways.  A quoted string is in
-fact parsed into a pair value with a rune in the first position to identify the
-quotation variant that was parsed, and the string value in the second position;
-or, in case of at-quoted strings, a special construct we will look at later.
-
-    +-----------+-----------------------------+
-    | Syntax    | Parse output                |
-    +-----------+-----------------------------+
-    | |bytes|   | (#PQSTR & <STRING>)         |
-    +-----------+-----------------------------+
-    | "bytes"   | (#DQSTR & <STRING>)         |
-    +-----------+-----------------------------+
-    | @_bytes_  | (#ATSTR <BYTE> & <STRING>)  |
-    +-----------+-----------------------------+
-
-The visual token `<STRING>` denotes the actual string, as a Zisp value, in the
-second position of the pair.  The visual token `<BYTE>` stands for an integer
-Zisp value between 0 and 255.
-
-These external representations of strings will be explained in more detail
-further below, including backslash escape sequences allowed within, and how
-exactly at-quoted strings work.
-
-Strings have a fixed length, counted in bytes.  Each byte can have any value,
-including zero (ASCII NUL).  The parser reads bytes, not Unicode characters; a
-string may contain UTF-8 byte sequences, but these are not tested for validity.
-
-A string that is up to 255 bytes long is automatically *interned*, meaning any
-occurrence of the same string -- equal in length and containing the same byte
-values -- ends up being represented by the same bit-pattern; either a memory
-address, or an immediate representation within a CPU word for short strings.
-The quotation method is inconsequential to this process; for example, while
-`|foobar|` and `"foobar"` will parse into different pair values, the actual
-string they hold will be the same one in program memory.
-
-Strings of length greater than 255 bytes are stored separately in memory, even
-if they are equal in length and content.
-
-### Rune
-
-A rune is represented by an ASCII character sequence of 1 to 6 bytes, that must
-begin with a letter, and may only contain letters and digits.  This character
-sequence of letters and digits is called the *name* of the rune.  A rune that
-follows this constraint is valid as a datum.
-
-Zisp code may explicitly construct values of the rune type that violate the
-above constraints.  Such runes are not valid data and cannot be printed or
-parsed.
-
-Runes are case-sensitive, and the parser always emits runes using upper-case
-letters when expressing syntax sugar.  Uppercase rune names are reserved for
-Zisp's internal use and standard library; users can use lowercase runes with
-custom meaning without worrying about clashes, with the exception of a small
-number of lowercase runes such as `#true` and `#false` that are part of the
-default decoder settings and documented explicitly as such.
-
-Runes are always stored directly in a CPU word; never by memory address.
-
-### Pair
-
-A pair is a tuple of two values: the first value and the second value.  In Lisp
-tradition, these are also called the `car` and `cdr` of the pair, respectively.
-
-The parser allocates a unique two-word cell in program memory for every pair,
-and represents that pair through the memory address of the cell.
-
-Pairs are valid data if one of the following holds true:
-
-* The pair encodes a quoted string, datum label, or shebang line.
-
-* Both the first and the second value in the pair are valid data.
-
-Further, a structure of nested pair values may not contain cyclic references
-back up in the structure (which would make the above definition diverge into
-infinity).  Such cycles must be broken up with datum labels, or else the pair
-cannot be considered a datum, since it cannot be printed or parsed.
-
-### Nil
-
-The Zisp nil value is a singleton and a datum.  There is exactly one nil value
-and it is used to terminate a chain of pairs representing a list of values; it
-has the external representation `()`.
-
-
-## Quoted strings
-
-Three quoted string types exist: Pipe-quoted, double-quoted, and at-quoted.
-This section goes into the details of each variant.
-
-### Pipe-quoted
-
-Strings can be quoted with pipes, like symbols in R7RS Scheme, which triggers
-the parser to generate a pair with the structure:
-
-    (#PQSTR & <STRING>)   ;; <STRING> is visual aid, not syntax
-
-The decoder, using default settings, would emit this string verbatim as a value.
-Then, during code evaluation, this would be seen as an identifier.  In this way,
-pipe-quoted strings are equivalent to bare strings in functionality.
-
-It is important to understand that the decoder sits between the parser and the
-[evaluator](3-execute.html), and in opposition to Lisp and Scheme tradition, it
-is common for the evaluator to receive values that are not valid as a datum; in
-this case, a string that may not be a valid datum by itself, because it cannot
-be represented as a bare string.  Yet, it is valid as an identifier for the
-purposes of the evaluator, since it is a string *value* like any other.
-
-### Double-quoted
-
-Strings wrapped in the double-quote symbol parse into:
-
-    (#DQSTR & <STRING>)   ;; <STRING> is visual aid, not syntax
-
-Under default settings, the decoder would transform this into a value which,
-when evaluated as code, simply yields the contained string as a value.
-
-### At-quoted
-
-This is a special type of syntax for "raw" strings, meaning that no backslash
-escapes nor any other kind of escape sequence are recognized within them.
-
-The syntax begins with an at sign, followed by any byte.  That byte becomes a
-termination marker, and the string cannot contain an occurrence of it, since
-there are no escape sequences.
-
-    @"foo \ bar"  ->  (#ATSTR <BYTE> & <STRING>)
-
-In the above, the visual tokens `<BYTE>` and `<STRING>` represent an integer
-value and a string value, respectively.  In this example, the integer value
-would be 34; the ASCII value for the double-quote sign.  The string value
-contains a literal backslash, since there is no backslash escape parsing.
-
-This style of quoting can be useful, for instance, when representing regular
-expressions as strings in code:
-
-    ;; Matches e.g. foo\bar.["blah"]
-
-    @/^foo\\(bar|baz)\.\[".*"\]$/
-
-Were it not for this syntax, this regular expression would only be possible to
-represent through a quoted string such as the following:
-
-    ;; Same as above, but so many backslashes
-
-    "^foo\\\\(bar|baz)\\.\\[\".*\"\\]$"
-
-The byte that follows the at sign need not be a printable character or even a
-valid ASCII byte; it can be absolutely any byte value, even NUL.  This can be
-useful to easily encode binary data which is known to not contain a specific
-byte; an example would be C strings which cannot contain NUL.
-
-### Backslash escapes
-
-In pipe-quoted and double-quoted strings, the following ASCII characters may
-follow a backslash to insert a certain character.
-
-    +-------+----------------------------+
-    | Char  | Meaning                    |
-    +-------+----------------------------+
-    | \     | Literal backslash          |
-    +-------+----------------------------+
-    | |     | Literal pipe symbol        |
-    +-------+----------------------------+
-    | "     | Literal double-quote       |
-    +-------+----------------------------+
-    | 0     | ASCII NUL                  |
-    +-------+----------------------------+
-    | a     | ASCII Alert                |
-    +-------+----------------------------+
-    | b     | ASCII Backspace            |
-    +-------+----------------------------+
-    | t     | ASCII Tab (Horizontal)     |
-    +-------+----------------------------+
-    | n     | ASCII Newline (Line Feed)  |
-    +-------+----------------------------+
-    | v     | ASCII Vertical Tab         |
-    +-------+----------------------------+
-    | f     | ASCII Form Feed            |
-    +-------+----------------------------+
-    | r     | ASCII Carriage Return      |
-    +-------+----------------------------+
-    | e     | ASCII Escape               |
-    +-------+----------------------------+
-
-In words:
-
-* A backslash followed by a backslash, pipe, or double-quote character is
-  substituted with a literal occurrence of that character.
-
-* The characters 0, a, b, t, n, v, f, r, and e have the same meanings as in the
-  C programming language, representing common ASCII control characters.
-
-Further, the following Regular Expression patterns following a backslash have
-special meaning.
-
-    +---------------------+-----------------------+
-    | Regular Expression  | Meaning               |
-    +---------------------+-----------------------+
-    | [\t ]*\n[\t ]*      | Discarded             |
-    +---------------------+-----------------------+
-    | x([0-9a-fA-F]{2})*; | Arbitrary bytes       |
-    +---------------------+-----------------------+
-    | u[0-9a-fA-F]+;      | Unicode Scalar Value  |
-    +---------------------+-----------------------+
-
-Explanations:
-
-* A backslash followed by any number of blanks (space or tab), a newline, and
-  again any number of blanks, is substituted with nothing.  This is to allow
-  splitting a string into multiple lines for human readability.
-
-      (define p "This paragraph has been visually split into multiple \
-                 lines, but the newline is escaped, so it's one line.")
-
-* An x, followed by pairs of hexadecimal digits (case insensitive), terminated
-  by a semicolon, is substituted with the sequence of bytes represented by the
-  corresponding pairs of hexadecimal digits.  E.g.: `"foo\xDEADBEEF;bar"`
-
-* A u, followed by a hexadecimal digit sequence (case insensitive), terminated
-  by a semicolon, is substituted with the canonical UTF-8 byte sequence for the
-  Unicode Scalar Value represented by that hexadecimal number.  The number must
-  be in the range `0` to `10FFFF`.  E.g.: `"foo\u00A0;bar"`
-
-### Newlines in strings
-
-Normally, a newline in a string has no special meaning and simply becomes part
-of the string.  However, newlines can be backslash-escaped, which simply erases
-them; the escaped newline can also be preceded or followed by any number of tab
-and space characters, which are all stripped as well.  (Note: It's not blanks
-preceding the backslash that are stripped, but blanks following the backslash
-and preceding the newline; i.e., blanks at the end of the line.)
-
-Following are some examples of how multi-line strings can appear in source code
-with different intentions and meanings:
-
-    (define paragraph "This paragraph has been visually split into multiple \
-                       lines, but the newlines are escaped, so it's one line.")
-
-    (define json-object '|   ;; use '|| so double-quotes need no escaping
-      {
-        "key": "value"
-      }
-    |)
-
-The second example is actually slightly problematic.  It begins with a newline,
-which may be undesirable, but escaping that newline would cause the first line
-to have no indentation, thus the opening `{` would not line up with the closing
-`}` when this string is printed out.  Further, if the entire block of code is
-indented, then the string contents may be more indented than intended.  (No pun
-or rhyme intended.)  Consider:
-
-    (let ((foo one))
-      (let ((bar two))
-        (let ((json-object '|
-                 {
-                   "key": "value"
-                 }
-               |))
-          (do-whatever))))
-
-The string bound to `json-object` has redundant indentation.  Should the parser
-attempt to solve this issue?
-
-Thankfully, we have the decoder to handle such complexities.  Under the default
-settings, the rune `#HASH` is bound to a decoder rule which detects a payload
-value that is a string literal, and implements the same algorithm as seen in
-Java 15 Text Blocks: [JEP 378: Text Blocks](https://openjdk.org/jeps/378)
-
-Thus, we can do the following:
-
-    (let ((foo one))
-      (let ((bar two))
-        (let ((json-object #|
-    ...........  {
-    ...........    "key": "value"
-    ...........  }
-    ...........|))
-          (do-whatever))))
-
-(Dots represent whitespace that is deleted.  The initial newline is, as well.)
-
-The only feature Zisp does not offer is a way to fence off multi-line strings
-with a longer token such as `"""` as seen in Python and Java, or an arbitrary
-word as seen in Bourne shell and PHP "here doc" syntax.
-
-However, if a programmer truly wanted to have arbitrary text blocks in code,
-without needing to escape anything in them, it's possible to abuse at-quoted
-string syntax, using it with an ASCII control character which is displayed
-visibly by a text editor.  In the following, the characters `^\` are meant to
-represent a literal ASCII File Separator character in the source code:
-
-    (define json-object #@^\
-      {
-        "key": "value"
-      }
-      ^\)
-
-It works fine in Emacs, so why not?  Use `C-q C-\` to insert the `^\`.
-
-This is indeed quite an eldritch syntax, but hopefully most programs would not
-need to use it.
-
-
-## Other syntax
-
-The following table summarizes commonly useful syntax abbreviations:
-
-    [...]   -> (#SQUARE ...)          #datum       -> (#HASH & datum)
-
-    {...}   -> (#BRACE ...)           #rune(...)   -> (#rune ...)
-
-    'datum  -> (#QUOTE & datum)       dat1dat2     -> (#JOIN dat1 & dat2)
-
-    `datum  -> (#GRAVE & datum)       dat1.dat2    -> (#DOT dat1 & dat2)
-
-    ,datum  -> (#COMMA & datum)       dat1:dat2    -> (#COLON dat1 & dat2)
-
-Notes:
-
-* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
-  means zero or more data.
-
-* The `#datum` form only applies when the datum following the hash sign is
-  anything other than a bare string, since otherwise this would be ambiguous
-  with a rune literal.  A bare string can nevertheless follow the hash sign by
-  separating the two with a backslash:
-
-      #\string  ->  (#HASH & string)
-
-* Though not represented in the table due to notational difficulty, the form
-  `#rune(...)` doesn't require a list in the second position; any datum that
-  works with the `#datum` syntax also works with `#rune<DATUM>`.
-
-      #rune1#rune2  -> (#rune1 & #rune2)
-
-      #rune\string  -> (#rune & string)
-
-      #rune'string  -> (#rune #QUOTE & string)
-
-      #rune"string" -> (#rune #DQSTR & |string|)
-
-  As a counter-example, following a rune immediately with a bare string isn't
-  possible without the delimiting backslash, since that would be ambiguous:
-
-      #abcdefgh  ;Could be (#abcdef & gh) or (#abcde & fgh) or ...
-
-* Syntax sugar can combine arbitrarily.  Some examples follow.  Any of these may
-  or may not actually have a meaning in code; many could simply end up producing
-  an error during decoding, or later evaluation of code.
-
-      #{...}            -> (#HASH #BRACE ...)
-
-      #'foo             -> (#HASH #QUOTE & foo)
-
-      ##'[...]          -> (#HASH #HASH #QUOTE #SQUARE ...)
-
-      {x y}[i j]        -> (#JOIN (#BRACE x y) #SQUARE i j)
-
-      foo.bar.baz{x y}  -> (#JOIN (#DOT (#DOT foo & bar) & baz) #BRACE x y)
-
-* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses as
-  `(#QUOTE & foo)`; a single pair with the quoted datum in the second position.
-
-  The same principle is used when parsing other sugar; some examples follow:
-
-      Incorrect                              Correct
-
-      #(x y z) -> (#HASH (x y z))            #(x y z) -> (#HASH x y z)
-
-      [x y z]  -> (#SQUARE (x y z))          [x y z]  -> (#SQUARE x y z)
-
-      #{x}     -> (#HASH (#BRACE (x)))       #{x}     -> (#HASH #BRACE x)
-
-      foo(x y) -> (#JOIN foo (x y))          foo(x y) -> (#JOIN foo x y)
-
-* Those used to thinking in Lisp and Scheme may think that `(#QUOTE ...)` halts
-  further decoding of enclosed data.  This is not so, since quoting is related
-  to code evaluation, not decoding.
-
-### Datum labels
-
-Valid data cannot be cyclic, since that would mean it has infinite length in
-bytes.  To externally represent a value with cyclic structure, one uses datum
-labels in the data encoding of the value.
-
-A datum label either wraps another datum to assign a number to it, or contains
-just a reference to a previous assignment.
-
-    +------------------+------------------------------+
-    | Syntax           | Internal datum structure     |
-    +------------------+------------------------------+
-    | #%<HEX>=<DATUM>  | (#LABEL <NUMBER> & <DATUM>)  |
-    +------------------+------------------------------+
-    | #%<HEX>%         | (#LABEL & <NUMBER>)          |
-    +------------------+------------------------------+
-
-In this visual, the token `<HEX>` stands for a hexadecimal digit sequence, the
-token `<DATUM>` stands for any other datum, and `<NUMBER>` is a stand-in for a
-number value; that which is represented by `<HEX>`.
-
-For clarity, concrete examples follow:
-
-    +-------------------+-------------------------------+
-    | Byte sequence     | Parse result                  |
-    +-------------------+-------------------------------+
-    | #%1234abcd=(foo)  | (#LABEL <0x1234abcd> & (foo)) |
-    +-------------------+-------------------------------+
-    | #%1234abcd%       | (#LABEL & <0x1234abcd>)       |
-    +-------------------+-------------------------------+
-
-Here, the visual token `<0x1234abcd>` stands for a Zisp value of a numeric type
-with an integer value.  Note that the decoder may not accept a bare string here,
-meaning this syntax sugar is not merely an abbreviation.
-
-### Shebang
-
-Finally, the parser recognizes the Unix *shebang* syntax and outputs a datum to
-hold the string values found within:
-
-    #!interpreter          ->  (#SHBANG & interpreter)
-
-    #!interpreter argline  ->  (#SHBANG interpreter & argline)
-
-When executing a script file, Zisp simply stores this into a global value that
-may be inspected if desired.
-
-
-<!--
-;; Local Variables:
-;; fill-column: 80
-;; End:
--->
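Note on the removed `doc/c1/1-parse.md`: the backslash escape rules it documented for pipe-quoted and double-quoted strings can be sketched roughly as follows. This is a minimal Python sketch, not the Zisp parser's actual code. The name `unescape` is hypothetical, the sketch assumes the bytes between the quotes have already been isolated, and rejecting surrogate code points is an assumption based on the remark in the ABNF comments that the parser may reject them.

```python
import re

# Single-character escapes from the "Backslash escapes" table.
SIMPLE = {
    ord("\\"): b"\\", ord("|"): b"|", ord('"'): b'"',
    ord("0"): b"\x00", ord("a"): b"\x07", ord("b"): b"\x08",
    ord("t"): b"\t", ord("n"): b"\n", ord("v"): b"\x0b",
    ord("f"): b"\x0c", ord("r"): b"\r", ord("e"): b"\x1b",
}

# Multi-byte escapes, matched right after the backslash.
CONTINUATION = re.compile(rb"[\t ]*\n[\t ]*")          # discarded
HEX_BYTES = re.compile(rb"x((?:[0-9a-fA-F]{2})*);")    # arbitrary bytes
SCALAR = re.compile(rb"u([0-9a-fA-F]+);")              # Unicode scalar value


def unescape(body: bytes) -> bytes:
    """Resolve backslash escapes in the body of a quoted string."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] != 0x5C:              # not a backslash: copy verbatim
            out.append(body[i])
            i += 1
            continue
        i += 1                           # step over the backslash
        if i < len(body) and body[i] in SIMPLE:
            out += SIMPLE[body[i]]
            i += 1
            continue
        m = CONTINUATION.match(body, i)
        if m:                            # escaped newline plus blanks
            i = m.end()
            continue
        m = HEX_BYTES.match(body, i)
        if m:
            out += bytes.fromhex(m.group(1).decode("ascii"))
            i = m.end()
            continue
        m = SCALAR.match(body, i)
        if m:
            value = int(m.group(1).decode("ascii"), 16)
            # Surrogate rejection is an assumption of this sketch.
            if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
                raise ValueError(f"not a Unicode scalar value: {value:#x}")
            out += chr(value).encode("utf-8")
            i = m.end()
            continue
        raise ValueError(f"invalid escape at byte offset {i - 1}")
    return bytes(out)
```

Because the function works on bytes throughout, bytes that are not valid UTF-8 pass through untouched, in line with the document's point that the parser never validates text.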
diff --git a/doc/c1/2-decode.md b/doc/c1/2-decode.md
deleted file mode 100644
index 379c74b..0000000
--- a/doc/c1/2-decode.md
+++ /dev/null
@@ -1,44 +0,0 @@
-# Decoding
-
-A separate process called "decoding" can transform simple data structures,
-consisting of only the base datum types, into a richer set of Zisp types.
-
-For example, the decoder may turn `(#HASH ...)` into a vector, as one would
-expect a vector literal like `#(...)` to work in Scheme.  Bytevector syntax
-could use a custom rune as a list prefix, like: `#u8(...)`
-
-Runes may be decoded in isolation as well, rather than transforming a list
-whose head they appear in.  This can implement Boolean constants as `#true`
-and `#false` or `#t` and `#f`.
-
-The decoder recognizes `(#QUOTE ...)` to aid in implementing the traditional
-quoting mechanism of Lisp/Scheme, but with a significant difference:
-
-Traditional quote is "unhygienic" in Scheme terms.  An expression such as
-`'(foo bar)` will always be read as `(quote (foo bar))` regardless of what
-lexical context it appears in, so the semantics will depend on whatever the
-identifier `quote` is bound to, meaning that the expression may end up
-evaluating to something other than the list `(foo bar)`.
-
-The Zisp decoder, which transforms not datum to datum, but object to object,
-can turn `#QUOTE` into an object which encapsulates the notion of quoting,
-which the Zisp evaluator can recognize and act upon, ensuring that an
-expression like `'(foo bar)` always turns into the list `(foo bar)`.
-
-One way to think about this, in Scheme (R6RS / syntax-case) terms, is that
-expressions like `'(foo bar)` turn directly into a syntax object when read,
-and the created syntax object begins with an identifier bound to `quote` in
-the standard library.
-
-The decoder is, of course, configurable and extensible.  The transformations
-mentioned above would be performed only when it's told to decode data which
-represents Zisp code.  The decoder may be given a different configuration,
-telling it to decode, for example, data which represents a different kind of
-domain-specific data, such as application settings, build system commands,
-complex data records with non-standard data types, and so on.
-
-<!--
-;; Local Variables:
-;; fill-column: 77
-;; End:
--->
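Note on the removed `doc/c1/2-decode.md`: the idea of a configurable decoder that rewrites rune-headed pairs and isolated runes can be illustrated with a small Python sketch. Everything here is hypothetical and chosen for illustration: the `Rune` and `Pair` classes, the `NIL` stand-in, the `decode` signature, and the two rule tables. The real decoder's rule format is not described in the removed text.

```python
from dataclasses import dataclass
from typing import Any

NIL = ()  # stand-in for the Zisp nil value


@dataclass(frozen=True)
class Rune:
    name: str


@dataclass(frozen=True)
class Pair:
    first: Any
    second: Any


def list_to_python(value):
    """Collect a nil-terminated chain of pairs into a Python list."""
    items = []
    while isinstance(value, Pair):
        items.append(value.first)
        value = value.second
    if value != NIL:
        raise ValueError("improper list")
    return items


def decode(value, pair_rules, rune_values):
    """Walk parser output, applying rules keyed by rune name."""
    def recur(v):
        return decode(v, pair_rules, rune_values)

    if isinstance(value, Rune):
        # Runes decoded in isolation, e.g. Boolean constants.
        return rune_values.get(value.name, value)
    if isinstance(value, Pair):
        head = value.first
        if isinstance(head, Rune) and head.name in pair_rules:
            return pair_rules[head.name](value.second, recur)
        return Pair(recur(value.first), recur(value.second))
    return value  # strings and nil pass through unchanged


# One possible configuration: (#HASH ...) becomes a Python list ("vector").
CODE_PAIR_RULES = {
    "HASH": lambda payload, recur: [recur(x) for x in list_to_python(payload)],
}
CODE_RUNE_VALUES = {"true": True, "false": False}
```

Passing different tables to `decode` mirrors the point that the same parser output can be decoded as Zisp code or as some other domain-specific data. With empty tables the structure comes back unchanged.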
diff --git a/doc/c1/grammar/abnf.txt b/doc/c1/grammar/abnf.txt
deleted file mode 100644
index aa67646..0000000
--- a/doc/c1/grammar/abnf.txt
+++ /dev/null
@@ -1,141 +0,0 @@
-; Standards-compliant ABNF (RFC 5234, RFC 7405)
-
-; Compatible with: https://www.quut.com/abnfgen/
-
-; Unlike PEG, grammar rules in BNF are non-deterministic, which makes
-; it much more challenging to express our naive parse logic.  Whether
-; this ABNF file is truly accurate is difficult to assess.
-
-; The abnfgen(1) tool linked above can be used to generate arbitrary
-; strings matching the grammar in this file.  These can be fed into
-; the Zisp parser to reveal some potential bugs; either in the parser
-; itself, or this ABNF grammar.
-
-; Note that the tool may generate Zisp string literals with Unicode
-; escape sequences corresponding to surrogate code points; the parser
-; may reject these.  This is expected; it's difficult to rewrite this
-; ABNF grammar to exclude those Unicode values.
-
-; Other minor inaccuracies that aren't important include: This ABNF
-; forces line comments to be terminated with an LF character, when in
-; fact the end-of-file may also terminate them; the same applies to
-; hash-bang parsing which doesn't actually have to end in LF.  These
-; discrepancies won't make abnfgen(1) generate invalid strings; they
-; only make this ABNF more strict than the Zisp parser, so it won't
-; generate some strings that the parser would actually accept.
-
-
-Stream        = [ Unit *( Blank Unit ) ] *Blank [Trail]
-
-
-Unit          = *Blank Datum
-
-Blank         = HTAB / LF / %x0b / %x0c / CR / SP / Comment
-
-Trail         = SkipLine / SkipUnit / ";" "~" *Blank
-
-
-Datum         = BareString / SpecialStr / CladDatum / Rune / RuneStr
-              / RuneDotStr / RuneClad / LabelRef / LabelDef / HashStr
-              / HashDotStr / HashClad / QuoteExpr / JoinExpr
-
-Comment       = SkipLine LF / SkipUnit Blank
-
-SkipLine      = ";" [ SkipLStart *AnyButLF ]
-
-SkipUnit      = ";" "~" Unit
-
-SkipLStart    = %x00-09 / %x0b-7d / %x7f-ff ; any but LF or "~"
-
-AnyButLF      = %x00-09 / %x0b-ff
-
-
-BareString    = BareChar *( BareChar / Numeric )
-
-SpecialStr    = SpecStrChar *( SpecStrChar / BareChar )
-
-CladDatum     = "|" *( PipeStrChar / "\" StringEsc ) "|"
-              / DQUOTE *( QuotStrChar / "\" StringEsc ) DQUOTE
-              / "(" List ")"
-              / "[" List "]"
-              / "{" List "}"
-
-Rune          = "#" RuneName
-
-RuneStr       = "#" RuneName "\" BareString
-
-RuneDotStr    = "#" RuneName "\" SpecialStr
-
-RuneClad      = "#" RuneName CladDatum
-
-HashBang      = "#" "!" *( SP / HTAB ) HBLine LF
-
-LabelRef      = "#" "%" Label "%"
-
-LabelDef      = "#" "%" Label "=" Datum
-
-HashStr       = "#" "\" BareString
-
-HashDotStr    = "#" "\" SpecialStr
-
-HashClad      = "#" CladDatum
-
-QuoteExpr     = "'" Datum
-              / "`" Datum
-              / "," Datum
-
-JoinExpr      = Datum RJoinDatum
-              / LJoinDatum NoStartDot
-              / Datum ":" Datum
-              / NoEndDot "." Datum
-
-
-BareChar      = "!" / "$" / "%" / "*" / "/" / "<" / "=" / ">"
-              / "?" / "^" / "_" / "~" / ALPHA
-
-Numeric       = "+" / "-" / DIGIT
-
-SpecStrChar   = "." / ":" / Numeric
-
-PipeStrChar   = %x00-5b / %x5d-7b / %x7d-ff ; any but "|" or "\"
-
-QuotStrChar   = %x00-21 / %x23-5b / %x5d-ff ; any but DQUOTE or "\"
-
-StringEsc     = "\" / "|" / DQUOTE / *( HTAB / SP ) LF *( HTAB / SP )
-              / %s"a" / %s"b" / %s"t" / %s"n"
-              / %s"v" / %s"f" / %s"r" / %s"e"
-              / %s"x" *( 2HEXDIG ) ";"
-              / %s"u" ["0"] 1*5HEXDIG ";"
-              / %s"u" "1" "0" 4HEXDIG ";"
-
-List          = [ Unit *( Blank Unit ) ] *Blank [Tail] [SkipUnit]
-
-Tail          = "&" Unit *Blank
-
-
-RuneName      = ALPHA *5( ALPHA / DIGIT )
-
-Label         = 1*12( HEXDIG )
-
-HBLine        = 1*HBChar [ 1*( SP / HTAB ) *HBChar ]
-
-HBChar        = %x00-08 / %x0b-1f / %x21-ff ; any but HT, LF, SP
-
-
-RJoinDatum    = CladDatum / Rune / RuneStr / RuneDotStr / RuneClad
-              / LabelRef / LabelDef / HashStr / HashDotStr / HashClad
-              / QuoteExpr
-
-LJoinDatum    = CladDatum / RuneClad / LabelRef / HashClad
-
-NoStartDot    = BareString / CladDatum / Rune / RuneStr / RuneDotStr
-              / RuneClad / LabelRef / LabelDef / HashStr / HashDotStr
-              / HashClad / QuoteExpr
-
-NoEndDot      = BareString / Rune / RuneStr / RuneClad / LabelRef
-              / HashStr / HashClad
-
-
-;; Local Variables:
-;; eval: (flyspell-mode -1)
-;; End:
diff --git a/doc/c1/grammar/index.md b/doc/c1/grammar/index.md
deleted file mode 100644
index e3716ea..0000000
--- a/doc/c1/grammar/index.md
+++ /dev/null
@@ -1,115 +0,0 @@
-# Zisp S-Expression Grammar
-
-The grammar is available in several different formats:
-
-* [ZBNF](zbnf.txt): See below for the rules of this notation
-* [ABNF](abnf.txt): Compatible with the `abnfgen` tool
-* [PEG](peg.txt): Compatible with the `peg/leg` tool
-
-
-## ZBNF notation
-
-The ZBNF grammar specification uses a BNF-like notation with PEG-like
-semantics:
-
-* Concatenation of expressions is implicit: `foo bar` means `foo`
-  followed by `bar`.
-
-* Parentheses are used for grouping, and the pipe symbol `|` is used
-  for alternatives.
-
-* The suffixes `?`, `*`, and `+` have the same meaning as in regular
-  expressions, although `[foo]` is used in place of `(foo)?`.
-
-* The syntax is defined in terms of bytes, not characters.  Terminals
-  `'c'` and `"c"` refer to the ASCII value of the given character `c`.
-  Standard C escape sequences are supported.
-
-* The prefix `~` means NOT.  It only applies to rules that match one
-  byte, and negates them.  For example, `~( 'a' | 'b' )` matches any
-  byte other than 'a' and 'b'.
-
-* Ranges of terminal values are expressed as `x...y` (inclusive).
-
-* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported.
-
-* There is no ambiguity, or look-ahead / backtracking beyond one byte.
-  Rules match left to right, depth-first, and greedy.  As soon as the
-  input matches the first terminal of a rule --explicit or implied by
-  recursively descending into the first non-terminal-- it must match
-  that rule to the end or a syntax error is reported.
-
-The last point makes the notation simple to translate to code.
-
-
-## Limitations outside the grammar
-
-The following limits are not represented in the grammar:
-
-* A `UnicodeSV` is the hexadecimal representation of a Unicode scalar
-  value; it must represent a value in the range 0 to D7FF, or E000 to
-  10FFFF, inclusive.  Any other value signals an error.  Valid values
-  are converted into a UTF-8 byte sequence encoding the value.
-
-* A `Rune` longer than 6 bytes is grammatical, but signals an error.
-  This is important because runes are not self-terminating; defining
-  their grammar as ending after a maximum of 6 bytes would allow
-  another datum beginning with an alphabetic character to follow a
-  rune immediately without any visual delineation, which would be
-  terribly confusing for a human reader.  Consider: `#foobarbaz`.
-  With such a cap, this would parse as `#foobar` joined with `baz`.
-
-  (The ABNF does not suffer from this issue, since it explicitly
-   enumerates the join possibilities anyway.)
-
-* A `Label` is the hexadecimal representation of a 48-bit integer,
-  meaning it allows for a maximum of 12 hexadecimal digits.  Longer
-  values are grammatical, but signal an out-of-range error, so as to
-  avoid signaling a confusing "invalid character" error on input that
-  appears grammatical.  Consider: `#%123456789abcd=foo`.  This would
-  signal an invalid character error at the letter `d` if the grammar
-  limited a `Label` to 12 hexadecimal digits.
-
-  (As above, the ABNF doesn't care about this.  You probably don't
-   want to use the ABNF to generate a parser anyway.)
-
-
-## At-quoted strings
-
-The mechanism of at-quoted strings is not represented in any of the
-grammars, since it essentially has 256 variants.  Representing it
-sanely in a grammar requires the ability to save and reference
-variables.
-
-
-## Stream-parsing strategy
-
-The parser consumes one `Unit` from the input stream every time it's
-called; it returns the `Datum` therein if found, or else it returns
-the Zisp EOF token.
-
-Since a `Datum` is not self-terminating, the parser must read beyond
-it to realize that it has ended (if not followed by the EOF).  Thus,
-it will consume one more `Blank` following the `Unit` that it parsed.
-If this `Blank` is a comment, it will be consumed entirely, ensuring
-that parsing resumes properly on a subsequent parser call on the same
-input stream, without needing to store any state in between.
-
-Since comments of type `SkipUnit` are likewise not self-terminating,
-an arbitrary number of chained `SkipUnit` comments may need to be
-consumed before the parser is finally allowed to return.
-
-The following illustration shows the positions at which the parser
-will stop consuming input when called repeatedly on the same input
-stream.  The dots represent the extent of each `Unit` being parsed,
-while the caret points at the last byte the parser will consume in
-that parse cycle.
-
-```
-foo (bar)[baz] foo;~bar foo;~bar;~baz;~bat foobar
-...^..........^...     ^...               ^......^
-```
-
-Notice how, in the fourth cycle, the parser is forced to consume all
-commented-out units before it can return, since it would otherwise
-leave the stream in an inappropriate state.
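
The stream-parsing strategy described at the end of `doc/c1/grammar/index.md` can be sketched roughly as follows. This is an illustrative approximation, not the Zisp implementation: `Reader`, `read_unit`, `skip_blank`, `skip_comment`, and the `EOF` sentinel are hypothetical names, and a `Datum` is simplified to any run of bytes that are neither whitespace nor `;` (lists, quoted strings, and joins are not actually parsed).

```python
EOF = object()  # hypothetical stand-in for the Zisp EOF token
WHITESPACE = b"\t\n\x0b\x0c\r "


class Reader:
    """Byte cursor with one byte of look-ahead."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def peek(self) -> bytes:
        return self.data[self.pos:self.pos + 1]  # b"" at end of input

    def take(self) -> bytes:
        byte = self.peek()
        self.pos += len(byte)
        return byte


def skip_comment(r):
    """Consume a comment whose leading ';' has already been taken."""
    if r.peek() == b"~":
        r.take()
        read_unit(r)  # SkipUnit: parse one Unit and discard it
    else:
        while r.take() not in (b"\n", b""):  # SkipLine: up to LF or EOF
            pass


def skip_blank(r) -> bool:
    """Consume one Blank if present; report whether one was consumed."""
    byte = r.peek()
    if byte == b";":
        r.take()
        skip_comment(r)
        return True
    if byte and byte in WHITESPACE:
        r.take()
        return True
    return False


def read_unit(r):
    """Consume one Unit plus the single Blank that terminates its Datum."""
    while skip_blank(r):  # Blank*
        pass
    start = r.pos
    while r.peek() and r.peek() not in WHITESPACE and r.peek() != b";":
        r.take()  # simplified Datum: a run of non-blank bytes
    if r.pos == start:
        return EOF
    datum = r.data[start:r.pos]
    skip_blank(r)  # one trailing Blank; a comment is consumed entirely
    return datum


r = Reader(b"foo (bar)[baz] foo;~bar foo;~bar;~baz;~bat foobar")
stops = []
while read_unit(r) is not EOF:
    stops.append(r.pos - 1)  # index of the last byte consumed in this cycle
```

Because a `SkipUnit` comment is handled by calling `read_unit` recursively, the chained `;~bar;~baz;~bat` case needs no special treatment: each discarded unit consumes its own terminating `Blank`, which may itself be the next comment. No state has to be kept between calls other than the stream position.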
diff --git a/doc/c1/grammar/peg.txt b/doc/c1/grammar/peg.txt
deleted file mode 100644
index 7b28a99..0000000
--- a/doc/c1/grammar/peg.txt
+++ /dev/null
@@ -1,93 +0,0 @@
-# Standard PEG notation
-
-Stream       <- Unit ( Blank Unit )* !.
-
-
-Unit         <- Blank* Datum
-
-Blank        <- [\t-\r ] / Comment
-
-
-Datum        <- OneDatum ( JoinChar? OneDatum )*
-
-JoinChar     <- '.' / ':'
-
-
-Comment      <- ';' ( SkipUnit / SkipLine )
-
-SkipUnit     <- '~' Unit
-
-SkipLine     <- (!'\n' .)* '\n'?
-
-
-OneDatum     <- BareString / CladDatum
-
-
-BareString   <- SpecBareChar ( BareChar / JoinChar )*
-              / BareChar+
-
-SpecBareChar <- '+' / '-' / JoinChar / DIGIT
-
-BareChar     <- ALPHA / DIGIT
-              / '!' / '$' / '%' / '*' / '+' / '-' / '/'
-              / '<' / '=' / '>' / '?' / '^' / '_' / '~'
-
-
-CladDatum    <- PipeStr / QuoteStr / HashExpr / QuoteExpr / List
-
-PipeStr      <- '|' ( PipeStrChar / '\\' StringEsc )* '|'
-QuoteStr     <- '"' ( QuotStrChar / '\\' StringEsc )* '"'
-HashExpr     <- '#' HashExprs
-QuoteExpr    <- "'" Datum / '`' Datum / ',' Datum
-List         <- ParenList / SquareList / BraceList
-
-
-PipeStrChar  <- (![|\\] .)
-QuotStrChar  <- (!["\\] .)
-
-StringEsc    <- '\\' / '|' / '"' / ( HTAB / SP )* LF ( HTAB / SP )*
-              / '0' / 'a' / 'b' / 't' / 'n' / 'v' / 'f' / 'r' / 'e'
-              / 'x' HexByte* ';'
-              / 'u' UnicodeSV ';'
-
-HexByte      <- HEXDIG HEXDIG
-UnicodeSV    <- HEXDIG+
-
-
-HashExprs    <- '!' [\t ]* HBangLine '\n'?
-              / '%' Label ( '%' / '=' Datum )
-              / '\\' BareString / CladDatum
-              / Rune ( '\\' BareString / CladDatum )?
-
-HBangLine    <- HBChars+ [\t ]* ( HBChars+ )?
-HBChars      <- (![\t\n ] .)
-Label        <- HEXDIG+
-Rune         <- ALPHA ( ALPHA / DIGIT )*
-
-
-ParenList    <- '(' ListBody ')'
-SquareList   <- '[' ListBody ']'
-BraceList    <- '{' ListBody '}'
-
-ListBody     <- Unit* ( Blank* '&' Unit )? Blank*
-
-
-DIGIT        <- [0-9]
-ALPHA        <- [a-zA-Z]
-HEXDIG       <- [0-9a-fA-F]
-
-
-# Keep this in sync line-for-line with the ZBNF grammar for easy
-# comparison between the two.
-
-# This file is meant to be compatible with:
-# https://piumarta.com/software/peg
-
-# Due to a quirk in the peg tool this file is used with, the grammar
-# must not allow an empty stream.  Therefore, the Unit rule has its
-# Datum declared as mandatory rather than optional.
-
-
-# Local Variables:
-# eval: (flyspell-mode -1)
-# End:
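
The PEG grammar above leaves `UnicodeSV` and `Label` as plain `HEXDIG+`; the range limits listed under "Limitations outside the grammar" have to be enforced separately. A minimal sketch of those checks, assuming the digits have already been matched by the grammar. The function names are hypothetical, and the `Label` check is done on the digit count, which is one reading of "longer values" (whether leading zeros should count is not stated in the text).

```python
def decode_unicode_sv(hex_digits: str) -> bytes:
    """Validate a UnicodeSV and return its UTF-8 encoding.

    Assumes hex_digits was already matched as HEXDIG+ by the parser.
    """
    value = int(hex_digits, 16)
    # Valid scalar values: 0..D7FF and E000..10FFFF (surrogates excluded).
    if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {hex_digits}")
    return chr(value).encode("utf-8")


def parse_label(hex_digits: str) -> int:
    """Validate a Label as a 48-bit integer (at most 12 hex digits)."""
    if len(hex_digits) > 12:
        # Reported as out-of-range rather than as an invalid character.
        raise ValueError(f"label out of range: {hex_digits}")
    return int(hex_digits, 16)
```

Checking after the grammar has accepted the whole digit run is what lets the error for `#%123456789abcd=foo` point at the label as a whole instead of at the letter `d`.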
diff --git a/doc/c1/grammar/zbnf.txt b/doc/c1/grammar/zbnf.txt
deleted file mode 100644
index 923ac83..0000000
--- a/doc/c1/grammar/zbnf.txt
+++ /dev/null
@@ -1,77 +0,0 @@
-; Custom notation with PEG semantics
-
-Stream        : Unit ( Blank Unit )*
-
-
-Unit          : Blank* [Datum]
-
-Blank         : '\t'...'\r' | SP | Comment
-
-
-Datum         : OneDatum ( [JoinChar] OneDatum )*
-
-JoinChar      : '.' | ':'
-
-
-Comment       : ';' ( SkipUnit | SkipLine )
-
-SkipUnit      : '~' Unit
-
-SkipLine      : ( ~LF )* [LF]
-
-
-OneDatum      : BareString | CladDatum
-
-
-BareString    : SpecBareChar ( BareChar | JoinChar )*
-              | BareChar+
-
-SpecBareChar  : '+' | '-' | JoinChar | DIGIT
-
-BareChar      : ALPHA | DIGIT
-              | '!' | '$' | '%' | '*' | '+' | '-' | '/'
-              | '<' | '=' | '>' | '?' | '^' | '_' | '~'
-
-
-CladDatum     : PipeStr | QuoteStr | HashExpr | QuoteExpr | List
-
-PipeStr       : '|' ( PipeStrChar | '\' StringEsc )* '|'
-QuoteStr      : '"' ( QuotStrChar | '\' StringEsc )* '"'
-HashExpr      : '#' HashExprs
-QuoteExpr     : "'" Datum | '`' Datum | ',' Datum
-List          : ParenList | SquareList | BraceList
-
-
-PipeStrChar   : ~( '|' | '\' )
-QuotStrChar   : ~( '"' | '\' )
-
-StringEsc     : '\' | '|' | '"' | ( HTAB | SP )* LF ( HTAB | SP )*
-              | '0' | 'a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | 'e'
-              | 'x' HexByte* ';'
-              | 'u' UnicodeSV ';'
-
-HexByte       : HEXDIG HEXDIG
-UnicodeSV     : HEXDIG+
-
-
-HashExprs     : '!' ( SP | HTAB )* HBangLine [ LF ]
-              | '%' Label ( '%' | '=' Datum )
-              | '\' BareString | CladDatum
-              | Rune [ '\' BareString | CladDatum ]
-
-HBangLine     : HBChars+ ( SP | HTAB )* [ HBChars+ ]
-HBChars       : ~( SP | HTAB | LF )
-Label         : HEXDIG+
-Rune          : ALPHA ( ALPHA | DIGIT )*
-
-
-ParenList     : '(' ListBody ')'
-SquareList    : '[' ListBody ']'
-BraceList     : '{' ListBody '}'
-
-ListBody      : Unit* [ Blank* '&' Unit ] Blank*
-
-
-;; Local Variables:
-;; eval: (flyspell-mode -1)
-;; End:
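
The ZBNF notation's commit-on-first-terminal semantics (described in `doc/c1/grammar/index.md`) mean that each rule can be translated into a function that inspects one byte and then either declines without consuming anything or must succeed. A rough sketch for the `'%' Label ( '%' | '=' Datum )` alternative of `HashExprs`; the names `ZispSyntaxError`, `parse_label`, and `parse_label_expr`, the tuple results, and the `parse_datum` callback are all hypothetical and chosen only for illustration.

```python
import string

HEXDIG = frozenset(string.hexdigits.encode())  # byte values of 0-9, a-f, A-F


class ZispSyntaxError(Exception):
    """Hypothetical error type raised once a rule has been committed to."""


def parse_label(data: bytes, pos: int):
    """Label : HEXDIG+  (greedy; at least one digit is required)."""
    start = pos
    while pos < len(data) and data[pos] in HEXDIG:
        pos += 1
    if pos == start:
        raise ZispSyntaxError(f"expected hex digit at byte {pos}")
    return data[start:pos], pos


def parse_label_expr(data: bytes, pos: int, parse_datum):
    """'%' Label ( '%' | '=' Datum ), entered just after the '#'.

    Returns (None, pos) without consuming input if the first byte is not
    '%', so the caller can try the next alternative.  After the '%' has
    matched, any mismatch is a syntax error: there is no backtracking.
    """
    if data[pos:pos + 1] != b"%":
        return None, pos
    pos += 1  # first terminal matched: committed to this rule
    label, pos = parse_label(data, pos)
    nxt = data[pos:pos + 1]
    if nxt == b"%":
        return ("ref", label), pos + 1
    if nxt == b"=":
        datum, pos = parse_datum(data, pos + 1)
        return ("def", label, datum), pos
    raise ZispSyntaxError(f"expected '%' or '=' at byte {pos}")
```

Every decision in this sketch is made by looking at a single byte, which is the property the notation is designed to guarantee.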
diff --git a/doc/c1/index.md b/doc/c1/index.md
deleted file mode 100644
index 6cec369..0000000
--- a/doc/c1/index.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# Chapter 1: Genesis
-
-This chapter goes through the processes involved in reading source
-code, running it, and optionally compiling it.
-
-1. [Parse](1-parse.html) (see also [grammar](grammar/))
-
-   The parser receives a stream of bytes and transforms them into a
-   minimal set of data types with very little processing.
-
-2. [Decode](2-decode.html)
-
-   The decoder runs configurable and extensible pre-processing steps
-   over data received from the parser, enriching it with more complex
-   data types, and handling primitive source code transforms.  It's
-   comparable to the C pre-processor or Lisp's `DEFMACRO` mechanism,
-   with a few more responsibilities, such as number literal parsing.
-
-3. [Execute](3-execute.html)
-
-   Code is executed (or interpreted, or evaluated) in an environment,
-   also called a module, which may be mutated, and linked with other
-   modules.  Execution is immediate, without any pre-compilation.
-
-4. [Compile](4-compile.html)
-
-   Procedures from within the compiler module can be used to demand
-   the compilation of other modules, with various options, yielding
-   static or dynamic object files.  These may be loaded immediately,
-   replacing the previously uncompiled module code in memory.