1 files changed, 122 insertions, 0 deletions
diff --git a/docs/parser.md b/docs/parser.md
new file mode 100644
index 0000000..a4e5d78
--- /dev/null
+++ b/docs/parser.md
@@ -0,0 +1,122 @@
+# Parser for Code & Data
+
+Zisp s-expressions are defined in terms of a minimal set of data types,
+containing only what is necessary to build representations of more complex
+expressions and data types:
+
+    +--------+-----------------+---------------+--------+----------+------+
+    | TYPE   | Bare String     | Quoted String | Rune   | Pair     | Nil  |
+    +--------+-----------------+---------------+--------+----------+------+
+    | E.G.   | foo, |foo bar|  | "foo bar"     | #name  | (X . Y)  | ()   |
+    +--------+-----------------+---------------+--------+----------+------+
+
+Bare strings and quoted strings are polymorphic sub-types of the generic
+string type.  Bare strings are implicitly interned.
+
+The parser can also output non-negative integers, but this is only used for
+datum labels; number literals are handled by the decoder (see next section).
+
+The parser recognizes various "syntax sugar" and transforms it into uses of
+the above data types.  The most ubiquitous example is of course the list:
+
+    (datum1 datum2 ...)  ->  (datum1 . (datum2 . (... . ())))
+
+The following table summarizes the other supported transformations:
+
+    #datum  -> (#HASH . datum)        #rune(...)   -> (#rune ...)
+
+    [...]   -> (#SQUARE ...)          dat1dat2     -> (#JOIN dat1 . dat2)
+
+    {...}   -> (#BRACE ...)           dat1.dat2    -> (#DOT dat1 . dat2)
+
+    'datum  -> (#QUOTE . datum)       dat1:dat2    -> (#COLON dat1 . dat2)
+
+    `datum  -> (#GRAVE . datum)       #%hex%       -> (#LABEL . hex)
+
+    ,datum  -> (#COMMA . datum)       #%hex=datum  -> (#LABEL hex . datum)
+
+A separate process called "decoding" can transform these objects into other
+data types.  For example, `(#HASH x y z)` could become a vector, so that the
+expression `#(x y z)` works just like in Scheme.  See the next section for
+details about the decoder.
+
+Decoding also resolves datum labels, and scans bare strings to find those
+that are actually number literals.  This moves the complexity of number
+parsing out of the parser, which can therefore remain extremely simple.
+
+Further notes about the syntax sugar table and examples above:
+
+* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
+  means zero or more data; hex is a hexadecimal number of up to 12 digits.
+
+* The `#datum` form only applies when the datum following the hash sign is a
+  list, quoted string, quote expression, another expression starting with a
+  hash sign, or a bare string that starts with a backslash escape or is
+  pipe-quoted (see the next item for both).
+
+* A backslash causes the immediately following character to lose any special
+  meaning it would have, and be considered as part of a bare string instead.
+  (This does not apply to space or control characters.)  For example, the
+  following three character sequences are each a valid bare string:
+
+      foo\(bar\)  \]blah  \#\'xyz
+
+  Bare strings can also be "quoted" with pipes as in Scheme; it should be
+  noted that this still produces a "bare string" in terms of data type:
+
+      |foo bar baz|
+
+* Though not represented in the table due to notational difficulty, the form
+  `#rune(...)` doesn't require a list in the second position; any datum that
+  works with the `#datum` syntax also works with `#rune<DATUM>`.
+
+      #rune1#rune2  -> (#rune1 . #rune2)
+
+      #rune"text"   -> (#rune . "text")
+
+      #rune\string  -> (#rune . string)
+
+      #rune'string  -> (#rune #QUOTE . string)
+
+  As a counter-example, following a rune immediately with a bare string isn't
+  possible, since it's ambiguous:
+
+      #abcdefgh  ;Could be (#abcdef . gh) or (#abcde . fgh) or ...
+
+* Syntax sugar can combine arbitrarily; some examples follow:
+
+      #{...}            -> (#HASH #BRACE ...)
+
+      #'foo             -> (#HASH #QUOTE . foo)
+
+      ##'[...]          -> (#HASH #HASH #QUOTE #SQUARE ...)
+
+      {x y}[i j]        -> (#JOIN (#BRACE x y) #SQUARE i j)
+
+      foo.bar.baz{x y}  -> (#JOIN (#DOT (#DOT foo . bar) . baz) #BRACE x y)
+
+* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses
+  as `(#QUOTE . foo)` instead; the operand of `#QUOTE` is the entire cdr.
+
+  The same principle is used when parsing other sugar; some examples follow:
+
+      Incorrect                              Correct
+
+      #(x y z) -> (#HASH (x y z))            #(x y z) -> (#HASH x y z)
+
+      [x y z]  -> (#SQUARE (x y z))          [x y z]  -> (#SQUARE x y z)
+
+      #{x}     -> (#HASH (#BRACE (x)))       #{x}     -> (#HASH #BRACE x)
+
+      foo(x y) -> (#JOIN foo (x y))          foo(x y) -> (#JOIN foo x y)
+
+* Runes are case-sensitive, and the parser only emits runes using upper-case
+  letters when expressing syntax sugar.  This way, there can be no accidental
+  clash with runes that appear verbatim in code, as long as only lower-case
+  letters are used for rune literals in code.
+
+<!--
+;; Local Variables:
+;; fill-column: 77
+;; End:
+-->