More doc and style improvements.

author: Taylan Kammer <taylan.kammer@gmail.com> 2026-06-02 23:56:10 +0200
committer: Taylan Kammer <taylan.kammer@gmail.com> 2026-06-02 23:56:10 +0200
commit: dca76cd7955573cc537933c7beb93d2d9ee2b1d2 (patch)
tree: 1f082c2f2d6036019b28a72d146709fbcc32cc0c /doc
parent: af6f48ff079fc8067b564adeaa73caed8cbf5438 (diff)
1 files changed, 25 insertions, 28 deletions
diff --git a/doc/c1/1-parse.md b/doc/c1/1-parse.md
index 8932481..e396ca5 100644
--- a/doc/c1/1-parse.md
+++ b/doc/c1/1-parse.md
@@ -1,20 +1,20 @@
-# Parser for Code and Data
+# Parser for Code & Data
 
 Zisp s-expressions represent an extremely minimal set of data types; only that
 which is necessary to strategically construct more complex values:
 
-    +-------+---------+--------+----------+
-    | TYPE  | String  | Rune   | Pair     |
-    +-------+---------+--------+----------+
-    | E.G.  | foobar  | #name  | (X & Y)  |
-    +-------+---------+--------+----------+
+    +-------+---------+--------+----------+------+
+    | TYPE  | String  | Rune   | Pair     | Nil  |
+    +-------+---------+--------+----------+------+
+    | E.G.  | foobar  | #name  | (X & Y)  | ()   |
+    +-------+---------+--------+----------+------+
 
 The parser also recognizes various *syntax sugar* which typically results in a
 pair beginning with a specific rune.  A separate component called the *decoder*
-transforms such data into a rich set of value types.  See below for details.
+transforms such data into a rich set of value types.
 
 
-## Charset and Stream Handling
+## Character Encoding
 
 The parser does not consume Unicode characters; it consumes bytes.  Grammar is
 generally constructed by bytes corresponding to ASCII characters.
@@ -41,7 +41,8 @@ All that needs to be done for this to work, is that any incidental occurrences
 of the double-quote sign, and the backslash sign, are escaped with a backslash
 within the `<BINARY>` data; all other bytes can appear verbatim in the strings.
 
-### Buffering
+
+## Stream Parsing
 
 The parser can be repeatedly invoked on a byte stream to consume the next datum
 within.  This does not require "unreading" or back-seeking within the stream;
@@ -65,7 +66,7 @@ The "header" for each file in this stream is a Zisp s-expression containing
 information about how many bytes should be read after the header, before the
 next file header appears.  (The header data need to be terminated with a blank
 ASCII character such as a newline; the closing parenthesis does not act as a
-terminator unto itself due to the "join" syntax sugar; see later.)
+terminator unto itself due to the "join" syntax sugar.)
 
 To enable this stream parsing strategy, the parser does not use any automatic
 buffering.  If it did, it might inadvertently consume some bytes beyond the
@@ -134,16 +135,12 @@ which contains bytes that could not appear in a *bare* string:
     "foo bar"  ->  (#DQUOTE & <STRING>)
 
 In this example, the visual token `<STRING>` represents the actual string value
-in program memory.  It may seem contrived to refer to this as syntax sugar, but
-we are using the term uniformly for any situation in which the parser generates
-a pair with a rune in its first position, intended for the decoder to handle.
-
-Those familiar with Lisp and Scheme may expect *bare* strings to be parsed into
-a separate data type called a *symbol* but this does not exist in Zisp.  Quoted
-strings instead parse into this internal representation to differentiate them
-from bare strings which may represent identifiers in code.
+in program memory, which has no direct external representation in bytes because
+it contains a space character.
 
-Other syntax sugar is explained further below.
+Those familiar with Lisp and Scheme may expect bare strings to be parsed into a
+separate type called *symbol* while quoted strings are parsed directly into a
+string type, but this is not the case in Zisp.
 
 ### Decoder
 
@@ -163,7 +160,7 @@ See the dedicated documentation of the [decoder](2-decode.html) for more.
 
 ## Data types
 
-Following is a more explanation of the four core data types constructed by the
+Following is a more in-depth explanation of each data type constructed by the
 Zisp s-expression parser.
 
 These are in fact value types, though the term "data type" is often used due to
@@ -173,8 +170,8 @@ is only a *datum* if it adheres to additional constraints as explained below.
 ### String
 
 Strings can appear *bare* or be quoted in various ways.  A quoted string is in
-fact parsed into a pair value (see below) with a rune in the first position to
-identify the quotation category, and the string value in the second position.
+fact parsed into a pair value with a rune in the first position to identify the
+quotation variant that was parsed, and the string value in the second position.
 
     +-----------+----------------------+
     | Syntax    | Parse output         |
@@ -223,7 +220,7 @@ letters when expressing syntax sugar.  Uppercase rune names are reserved for
 Zisp's internal use and standard library; users can use lowercase runes with
 custom meaning without worrying about clashes, with the exception of a small
 number of lowercase runes such as `#true` and `#false` that are part of the
-default decoder settings.
+default decoder settings and documented explicitly as such.
 
 Runes are always stored directly in a CPU word; never by memory address.
 
@@ -237,7 +234,7 @@ and represents that pair through the memory address of the cell.
 
 Pairs are valid data if one of the following holds true:
 
-* The pair encodes a quoted string, datum label, or shebang line. (See below.)
+* The pair encodes a quoted string, datum label, or shebang line.
 
 * Both the first and second value in the pair is a valid datum.
 
@@ -320,7 +317,7 @@ valid ASCII byte; it can be absolutely any byte value, even NUL.  This can be
 useful to easily encode binary data which is known to not contain a specific
 byte; an example would be C strings which cannot contain NUL.
 
-### Backslash escape sequences
+### Backslash escapes
 
 In pipe-quoted and double-quoted strings, the following ASCII characters may
 follow a backslash to insert a certain character.
@@ -380,8 +377,8 @@ Explanations:
   again any number of blanks, is substituted with nothing.  This is to allow
   splitting a string into multiple lines for human readability.
 
-      (define paragraph "This paragraph has been visually split into multiple \
-                         lines, but the newline is escaped, so it's one line.")
+      (define p "This paragraph has been visually split into multiple \
+                 lines, but the newline is escaped, so it's one line.")
 
 * An x, followed by pairs of hexadecimal digits (case insensitive), terminated
   by a semicolon, is substituted with the sequence of bytes represented by the
@@ -472,7 +469,7 @@ This is indeed quite an eldritch syntax, but hopefully most programs would not
 need to use it.
 
 
-## Syntax sugar
+## Other syntax
 
 The following table summarizes commonly useful syntax abbreviations:
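
The "Stream Parsing" passage touched by this patch describes reading a header, then a known number of raw bytes, without the parser consuming anything past the header's blank terminator. Below is a minimal Python sketch of that idea, not Zisp's actual parser. The header layout `(file NAME SIZE)` and the names `read_header` and `read_files` are hypothetical, and the sketch ignores quoted strings that might contain parentheses.

```python
import io

BLANKS = b" \t\r\n"


def read_header(stream):
    """Read one parenthesized header and its single blank terminator.

    Reads exactly one byte at a time, so nothing past the terminator is
    consumed. Returns the header bytes, or None at a clean end of stream.
    Simplification: parentheses inside quoted strings are not handled.
    """
    out = bytearray()
    depth = 0
    while True:
        b = stream.read(1)
        if not b:
            if out:
                raise ValueError("stream ended inside a header")
            return None
        if not out and b != b"(":
            raise ValueError("expected '(' at start of header")
        out += b
        if b == b"(":
            depth += 1
        elif b == b")":
            depth -= 1
            if depth == 0:
                break
    # The closing parenthesis alone does not end the datum; a blank must follow.
    terminator = stream.read(1)
    if not terminator or terminator not in BLANKS:
        raise ValueError("header must be followed by a blank")
    return bytes(out)


def read_files(stream):
    """Yield (name, payload) pairs from a stream of header+payload records."""
    while True:
        header = read_header(stream)
        if header is None:
            return
        # Hypothetical header layout: (file NAME SIZE)
        tag, name, size = header[1:-1].split()
        if tag != b"file":
            raise ValueError("unexpected header")
        payload = stream.read(int(size))
        if len(payload) != int(size):
            raise ValueError("stream ended inside a payload")
        yield name.decode("ascii"), payload


if __name__ == "__main__":
    data = io.BytesIO(b"(file a.bin 4)\n()\x00\"(file b.txt 2)\nhi")
    for name, payload in read_files(data):
        print(name, payload)
```

Because the header reader never looks ahead, the payload may contain any bytes at all, including parentheses, quotes, and NUL, without confusing the next header read.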