Draft a Manual front-page.

author: Taylan Kammer <taylan.kammer@gmail.com> 2026-01-08 14:54:20 +0100
committer: Taylan Kammer <taylan.kammer@gmail.com> 2026-01-08 14:54:20 +0100
commit: 8012e3fe177069a709f30d2ab4a18ff11025c86f (patch)
tree: a6e75e6a3a56ff3fc6f056f6347e58329915725b /docs/c1
parent: cf2697d24c13cdc7ea5f93ce0ff5143f41a85a83 (diff)
2 files changed, 161 insertions, 0 deletions
diff --git a/docs/c1/1-parse.md b/docs/c1/1-parse.md
new file mode 100644
index 0000000..a23ebbc
--- /dev/null
+++ b/docs/c1/1-parse.md
@@ -0,0 +1,117 @@
+# Parser for Code & Data
+
+Zisp s-expressions are defined in terms of an extremely minimal set of data
+types; only that which is necessary to build representations of more complex
+expressions and data types:
+
+    +--------+-----------------+--------+----------+------+
+    | TYPE   | String          | Rune   | Pair     | Nil  |
+    +--------+-----------------+--------+----------+------+
+    | E.G.   | foo, |foo bar|  | #name  | (X & Y)  | ()   |
+    +--------+-----------------+--------+----------+------+
+
+Note that the ampersand replaces the period in pair notation.  This simplifies
+the grammar: periods are a regular constituent of strings, while the ampersand
+cannot appear in unquoted strings.
+
+The parser can also output non-negative integers, but this is only used for
+datum labels; number literals are handled by the *decoder*.
+
+The parser recognizes various "syntax sugar" and transforms it into uses of the
+above data types.  The most ubiquitous example is of course the list:
+
+    (datum1 datum2 ...)  ->  (datum1 & (datum2 & (... & ())))
+
+The following table summarizes the other supported transformations:
+
+    "xyz"   -> (#QUOTE & |xyz|)       #datum       -> (#HASH & datum)
+
+    [...]   -> (#SQUARE ...)          #rune(...)   -> (#rune ...)
+
+    {...}   -> (#BRACE ...)           dat1dat2     -> (#JOIN dat1 & dat2)
+
+    'datum  -> (#QUOTE & datum)       dat1.dat2    -> (#DOT dat1 & dat2)
+
+    `datum  -> (#GRAVE & datum)       dat1:dat2    -> (#COLON dat1 & dat2)
+
+    ,datum  -> (#COMMA & datum)       #%hex%       -> (#LABEL & hex)
+
+                                      #%hex=datum  -> (#LABEL hex & datum)
+
+A separate process called *decoding* can transform such data into more complex
+types.  For example, `(#HASH x y z)` could be decoded into a vector, so the
+expression `#(x y z)` works just like in Scheme.
+
+Decoding also resolves datum labels, scans strings to find those that are
+actually number literals, and performs various other transformations.  This
+offloads complexity, allowing the parser to remain extremely simple.  See the
+dedicated documentation of the decoder for more.
+
+Further notes about the syntax sugar table and examples above:
+
+* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
+  means zero or more data; hex is a hexadecimal number of up to 12 digits.
+
+* The `#datum` form only applies when the datum following the hash sign is a
+  list, quoted string, quote expression, another expression starting with the
+  hash sign, or a pipe-quoted string (see next).  A bare string can follow the
+  hash sign by separating the two with a backslash: `#\string`
+
+* Strings can be quoted with pipes, like symbols in Scheme:
+
+      |foo bar baz|
+
+* Though not represented in the table due to notational difficulty, the form
+  `#rune(...)` doesn't require a list in the second position; any datum that
+  works with the `#datum` syntax also works with `#rune<DATUM>`.
+
+      #rune1#rune2  -> (#rune1 & #rune2)
+
+      #rune"text"   -> (#rune & "text")
+
+      #rune\string  -> (#rune & string)
+
+      #rune'string  -> (#rune #QUOTE & string)
+
+  As a counter-example, following a rune immediately with a bare string isn't
+  possible without the delimiting backslash, since that would be ambiguous:
+
+      #abcdefgh  ;Could be (#abcdef & gh) or (#abcde & fgh) or ...
+
+* Syntax sugar can combine arbitrarily; some examples follow:
+
+      #{...}            -> (#HASH #BRACE ...)
+
+      #'foo             -> (#HASH #QUOTE & foo)
+
+      ##'[...]          -> (#HASH #HASH #QUOTE #SQUARE ...)
+
+      {x y}[i j]        -> (#JOIN (#BRACE x y) #SQUARE i j)
+
+      foo.bar.baz{x y}  -> (#JOIN (#DOT (#DOT foo & bar) & baz) #BRACE x y)
+
+* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses
+  as `(#QUOTE & foo)` instead; the operand of `#QUOTE` is the entire cdr.
+
+  The same principle is used when parsing other sugar; some examples follow:
+
+      Incorrect                              Correct
+
+      #(x y z) -> (#HASH (x y z))            #(x y z) -> (#HASH x y z)
+
+      [x y z]  -> (#SQUARE (x y z))          [x y z]  -> (#SQUARE x y z)
+
+      #{x}     -> (#HASH (#BRACE (x)))       #{x}     -> (#HASH #BRACE x)
+
+      foo(x y) -> (#JOIN foo (x y))          foo(x y) -> (#JOIN foo x y)
+
+* Runes are case-sensitive, and the parser always emits runes using uppercase
+  letters when expressing syntax sugar.  Uppercase rune names are reserved for
+  Zisp's internal use and standard library; users can use lowercase runes with
+  custom meaning without worrying about clashes.
+
+<!--
+;; Local Variables:
+;; fill-column: 80
+;; End:
+-->
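The reductions in the sugar table and in the combination examples of `docs/c1/1-parse.md` can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not Zisp's parser. It builds parser *output* directly and does not read any text. The names `Rune`, `Pair`, `Nil`, `lst`, `sugar`, `infix`, and `show` are hypothetical. Strings are modelled as Python `str`, and the set of characters that triggers pipe-quoting is an assumption. The sketch illustrates the rule that a sugar's operand is the entire cdr, which is why `&` only appears in printed output when a cdr is not itself a list.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rune:
    """A rune such as #QUOTE, stored without its leading hash sign."""
    name: str


class Nil:
    """The empty list, written ()."""

    def __repr__(self) -> str:
        return "()"


NIL = Nil()  # one shared instance, so equality by identity is enough


@dataclass(frozen=True)
class Pair:
    """A pair, written (car & cdr)."""
    car: Datum
    cdr: Datum


# Hypothetical model: strings are plain Python str values.
Datum = Union[str, Rune, Pair, Nil]


def lst(*items: Datum) -> Datum:
    """(d1 d2 ...) -> (d1 & (d2 & (... & ())))"""
    result: Datum = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result


def sugar(rune: str, operand: Datum) -> Pair:
    """Prefix sugar: the operand is the entire cdr, e.g. 'foo -> (#QUOTE & foo)."""
    return Pair(Rune(rune), operand)


def infix(rune: str, left: Datum, right: Datum) -> Pair:
    """Infix sugar: dat1.dat2 -> (#DOT dat1 & dat2); same shape for JOIN, COLON."""
    return Pair(Rune(rune), Pair(left, right))


# Assumed set of characters that force pipe-quoting when printing a string
# (escaping of a literal '|' inside a string is not handled here).
SPECIAL = set(" \t\n()[]{}&|\"'`,#;")


def show(d: Datum) -> str:
    """Print a datum with list shorthand, using '&' only before an improper tail."""
    if isinstance(d, Rune):
        return "#" + d.name
    if isinstance(d, str):
        return d if d and not SPECIAL & set(d) else "|" + d + "|"
    if isinstance(d, Nil):
        return "()"
    parts = []
    while isinstance(d, Pair):  # walk the chain of cdrs
        parts.append(show(d.car))
        d = d.cdr
    if not isinstance(d, Nil):
        parts += ["&", show(d)]
    return "(" + " ".join(parts) + ")"


# [x y z]: the list is the whole cdr, not a single element
print(show(sugar("SQUARE", lst("x", "y", "z"))))  # (#SQUARE x y z)
# {x y}[i j]
print(show(infix("JOIN", sugar("BRACE", lst("x", "y")),
                 sugar("SQUARE", lst("i", "j")))))  # (#JOIN (#BRACE x y) #SQUARE i j)
# foo.bar.baz{x y}, with the dots grouping to the left
print(show(infix("JOIN", infix("DOT", infix("DOT", "foo", "bar"), "baz"),
                 sugar("BRACE", lst("x", "y")))))
# prints: (#JOIN (#DOT (#DOT foo & bar) & baz) #BRACE x y)
# #rune'string
print(show(sugar("rune", sugar("QUOTE", "string"))))  # (#rune #QUOTE & string)
```

Because `infix` nests its left operand, chained dots come out left-grouped, which matches the `foo.bar.baz{x y}` example.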
diff --git a/docs/c1/2-decode.md b/docs/c1/2-decode.md
new file mode 100644
index 0000000..0b34204
--- /dev/null
+++ b/docs/c1/2-decode.md
@@ -0,0 +1,44 @@
+# Decoding
+
+A separate process called "decoding" can transform simple data structures,
+consisting of only the datum types, into a richer set of Zisp types.
+
+For example, the decoder may turn `(#HASH ...)` into a vector, as one would
+expect a vector literal like `#(...)` to work in Scheme.  Bytevector syntax
+could use a custom rune as a list prefix, as in `#u8(...)`.
+
+Runes may be decoded in isolation as well, rather than transforming a list
+whose head they appear in.  This can be used to implement Boolean constants
+as `#true` and `#false`, or as `#t` and `#f`.
+
+The decoder recognizes `(#QUOTE ...)` to aid in implementing the traditional
+quoting mechanism of Lisp/Scheme, but with a significant difference:
+
+Traditional quote is "unhygienic" in Scheme terms.  An expression such as
+`'(foo bar)` will always be read as `(quote (foo bar))` regardless of what
+lexical context it appears in, so the semantics will depend on whatever the
+identifier `quote` is bound to, meaning that the expression may end up
+evaluating to something other than the list `(foo bar)`.
+
+The Zisp decoder, which maps objects to objects rather than data to data, can
+turn `#QUOTE` into an object that encapsulates the notion of quoting.  The
+Zisp evaluator can recognize and act upon this object, ensuring that an
+expression like `'(foo bar)` always evaluates to the list `(foo bar)`.
+
+One way to think about this, in Scheme (R6RS / syntax-case) terms, is that
+expressions like `'(foo bar)` turn directly into a syntax object when read,
+and the created syntax object begins with an identifier bound to `quote` in
+the standard library.
+
+The decoder is, of course, configurable and extensible.  The transformations
+mentioned above would be performed only when it's told to decode data which
+represents Zisp code.  The decoder may be given a different configuration,
+telling it to decode other kinds of domain-specific data instead, such as
+application settings, build system commands, complex data records with
+non-standard data types, and so on.
+
+<!--
+;; Local Variables:
+;; fill-column: 77
+;; End:
+-->
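The table-driven nature of decoding described in `docs/c1/2-decode.md`, including the idea of swapping configurations, can be sketched in Python. This is a hypothetical illustration, not Zisp's decoder. The names `Decoder`, `Quote`, `elements`, `decimal_integer`, `CODE`, and `SETTINGS` are invented. Vectors are modelled as Python tuples, only unsigned decimal integers are treated as number literals, and datum labels are not resolved. The `Quote` object stands in for the object "which encapsulates the notion of quoting", so an evaluator could act on it without consulting any binding of `quote`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Rune:
    name: str


class Nil:
    def __repr__(self) -> str:
        return "()"


NIL = Nil()


@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object


@dataclass(frozen=True)
class Quote:
    """Hypothetical object for quoting, independent of any binding of `quote`."""
    datum: object


def elements(d: object) -> list:
    """Collect the elements of a proper list into a Python list."""
    out = []
    while isinstance(d, Pair):
        out.append(d.car)
        d = d.cdr
    if not isinstance(d, Nil):
        raise ValueError("expected a proper list")
    return out


class Decoder:
    def __init__(self,
                 head_rules: Dict[str, Callable[[Decoder, object], object]],
                 rune_rules: Dict[str, object],
                 string_rule: Callable[[str], object] = lambda s: s):
        self.head_rules = head_rules    # rune at head of a pair -> handler(decoder, cdr)
        self.rune_rules = rune_rules    # rune on its own -> replacement value
        self.string_rule = string_rule  # e.g. number-literal detection

    def decode(self, d: object) -> object:
        if isinstance(d, Rune):
            return self.rune_rules.get(d.name, d)  # unknown runes pass through
        if isinstance(d, str):
            return self.string_rule(d)
        if isinstance(d, Pair):
            head = d.car
            if isinstance(head, Rune) and head.name in self.head_rules:
                return self.head_rules[head.name](self, d.cdr)
            return Pair(self.decode(d.car), self.decode(d.cdr))
        return d  # Nil and anything else is returned unchanged


def decimal_integer(s: str) -> object:
    """Assumption: only unsigned decimal integers count as number literals."""
    return int(s) if s.isdecimal() else s


CODE = Decoder(
    head_rules={
        # (#HASH x y z) -> vector, modelled as a Python tuple
        "HASH": lambda dec, cdr: tuple(dec.decode(x) for x in elements(cdr)),
        # (#QUOTE & datum) -> Quote object; the operand is the entire cdr
        "QUOTE": lambda dec, cdr: Quote(dec.decode(cdr)),
    },
    rune_rules={"true": True, "false": False, "t": True, "f": False},
    string_rule=decimal_integer,
)

# A different, hypothetical configuration, e.g. for settings files.
SETTINGS = Decoder(head_rules={}, rune_rules={"true": True, "false": False})

hash_x1z = Pair(Rune("HASH"), Pair("x", Pair("1", Pair("z", NIL))))
print(CODE.decode(hash_x1z))       # ('x', 1, 'z')
print(CODE.decode(Rune("t")))      # True
print(SETTINGS.decode(Rune("t")))  # Rune(name='t')
```

An unrecognized rune, such as `#t` under the settings configuration, is simply passed through unchanged in this sketch. A real decoder might instead report an error.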