Diffstat (limited to 'docs/c1/1-parse.md')
-rw-r--r-- docs/c1/1-parse.md 117
1 files changed, 117 insertions, 0 deletions
diff --git a/docs/c1/1-parse.md b/docs/c1/1-parse.md
new file mode 100644
index 0000000..a23ebbc
--- /dev/null
+++ b/docs/c1/1-parse.md
@@ -0,0 +1,117 @@
+# Parser for Code & Data
+
+Zisp s-expressions are defined in terms of an extremely minimal set of data
+types, limited to those necessary to build representations of more complex
+expressions and data types:
+
+    +--------+-----------------+--------+----------+------+
+    | TYPE   | String          | Rune   | Pair     | Nil  |
+    +--------+-----------------+--------+----------+------+
+    | E.G.   | foo, |foo bar|  | #name  | (X & Y)  | ()   |
+    +--------+-----------------+--------+----------+------+
+
+Note that the ampersand replaces the period in pair notation. This simplifies
+the grammar: periods are a regular constituent of strings, while the ampersand
+cannot appear in unquoted strings.
+
+The parser can also output non-negative integers, but this is only used for
+datum labels; number literals are handled by the *decoder*.
+
+The parser recognizes various "syntax sugar" and transforms it into uses of the
+above data types. The most ubiquitous example is of course the list:
+
+    (datum1 datum2 ...) -> (datum1 & (datum2 & (... & ())))
+
+The following table summarizes the other supported transformations:
+
+    "xyz"  -> (#QUOTE & |xyz|)      #datum      -> (#HASH & datum)
+
+    [...]  -> (#SQUARE ...)         #rune(...)  -> (#rune ...)
+
+    {...}  -> (#BRACE ...)          dat1dat2    -> (#JOIN dat1 & dat2)
+
+    'datum -> (#QUOTE & datum)      dat1.dat2   -> (#DOT dat1 & dat2)
+
+    `datum -> (#GRAVE & datum)      dat1:dat2   -> (#COLON dat1 & dat2)
+
+    ,datum -> (#COMMA & datum)      #%hex%      -> (#LABEL & hex)
+
+                                    #%hex=datum -> (#LABEL hex & datum)
+
+A separate process called *decoding* can transform such data into more complex
+types. For example, `(#HASH x y z)` could be decoded into a vector, so the
+expression `#(x y z)` works just like in Scheme.
+
+Decoding also resolves datum labels, scans strings to find those that are
+actually number literals, and handles several other transformations.
+This offloads complexity, allowing the parser to remain extremely simple. See
+the dedicated documentation of the decoder for more.
+
+Further notes about the syntax sugar table and examples above:
+
+* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
+ means zero or more data; hex is a hexadecimal number of up to 12 digits.
+
+* The `#datum` form only applies when the datum following the hash sign is a
+ list, quoted string, quote expression, another expression starting with the
+ hash sign, or a pipe-quoted string (see next). A bare string can follow the
+ hash sign by separating the two with a backslash: `#\string`
+
+* Strings can be quoted with pipes, like symbols in Scheme:
+
+      |foo bar baz|
+
+* Though not represented in the table due to notational difficulty, the form
+ `#rune(...)` doesn't require a list in the second position; any datum that
+ works with the `#datum` syntax also works with `#rune<DATUM>`.
+
+      #rune1#rune2 -> (#rune1 & #rune2)
+
+      #rune"text"  -> (#rune & "text")
+
+      #rune\string -> (#rune & string)
+
+      #rune'string -> (#rune #QUOTE & string)
+
+ As a counter-example, following a rune immediately with a bare string isn't
+ possible without the delimiting backslash, since that would be ambiguous:
+
+      #abcdefgh    ;Could be (#abcdef & gh) or (#abcde & fgh) or ...
+
+* Syntax sugar can combine arbitrarily; some examples follow:
+
+      #{...}           -> (#HASH #BRACE ...)
+
+      #'foo            -> (#HASH #QUOTE & foo)
+
+      ##'[...]         -> (#HASH #HASH #QUOTE #SQUARE ...)
+
+      {x y}[i j]       -> (#JOIN (#BRACE x y) #SQUARE i j)
+
+      foo.bar.baz{x y} -> (#JOIN (#DOT (#DOT foo & bar) & baz) #BRACE x y)
+
+* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses
+ as `(#QUOTE & foo)` instead; the operand of `#QUOTE` is the entire cdr.
+
+ The same principle is used when parsing other sugar; some examples follow:
+
+      Incorrect                          Correct
+
+      #(x y z) -> (#HASH (x y z))        #(x y z) -> (#HASH x y z)
+
+      [x y z]  -> (#SQUARE (x y z))      [x y z]  -> (#SQUARE x y z)
+
+      #{x}     -> (#HASH (#BRACE (x)))   #{x}     -> (#HASH #BRACE x)
+
+      foo(x y) -> (#JOIN foo (x y))      foo(x y) -> (#JOIN foo x y)
+
+* Runes are case-sensitive, and the parser always emits uppercase runes when
+  expressing syntax sugar. Uppercase rune names are reserved for Zisp's
+  internal use and standard library; users can give lowercase runes custom
+  meanings without worrying about clashes.
+
+<!--
+;; Local Variables:
+;; fill-column: 80
+;; End:
+-->
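
The "operand is the entire cdr" rule from the document can be checked with a small model of the four parser data types. The following is a minimal Python sketch, not the Zisp implementation: the names `Rune`, `Pair`, `NIL`, `make_list`, `sugar`, and `write` are hypothetical, bare strings are modeled as plain Python `str` values, and pipe-quoting is not handled. It shows why `#(x y z)` reads back as `(#HASH x y z)` once a rune is consed directly onto the datum that follows it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rune:
    name: str


@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object


NIL = ()  # hypothetical stand-in for Zisp's nil value


def make_list(*items):
    """Build (a b c) as the pair chain (a & (b & (c & ())))."""
    result = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result


def sugar(rune_name, datum):
    """Model prefix sugar such as #datum or 'datum: the datum is the whole cdr."""
    return Pair(Rune(rune_name), datum)


def write(datum):
    """Render a datum, using list notation wherever the cdr chain allows it."""
    if datum == NIL:
        return "()"
    if isinstance(datum, Rune):
        return "#" + datum.name
    if isinstance(datum, str):
        return datum  # assumption: bare strings only, no pipe-quoting
    parts = []
    while isinstance(datum, Pair):
        parts.append(write(datum.car))
        datum = datum.cdr
    if datum != NIL:
        # Improper tail: fall back to the ampersand pair notation.
        parts.extend(["&", write(datum)])
    return "(" + " ".join(parts) + ")"


print(write(sugar("QUOTE", "foo")))                    # (#QUOTE & foo)
print(write(sugar("HASH", make_list("x", "y", "z"))))  # (#HASH x y z)
print(write(sugar("HASH", sugar("BRACE", make_list("x")))))  # (#HASH #BRACE x)
```

Building the "Incorrect" column instead requires wrapping the operand in an extra list, for example `make_list(Rune("HASH"), make_list("x", "y", "z"))`, which writes as `(#HASH (x y z))`.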