Diffstat (limited to 'docs/parser.md')
| -rw-r--r-- | docs/parser.md | 122 |
1 files changed, 122 insertions, 0 deletions
diff --git a/docs/parser.md b/docs/parser.md
new file mode 100644
index 0000000..a4e5d78
--- /dev/null
+++ b/docs/parser.md
@@ -0,0 +1,122 @@
+# Parser for Code & Data
+
+Zisp s-expressions are defined in terms of an extremely minimal set of data
+types; only that which is necessary to build representations of more complex
+expressions and data types:
+
+    +--------+-----------------+---------------+--------+----------+------+
+    | TYPE   | Bare String     | Quoted String | Rune   | Pair     | Nil  |
+    +--------+-----------------+---------------+--------+----------+------+
+    | E.G.   | foo, |foo bar|  | "foo bar"     | #name  | (X . Y)  | ()   |
+    +--------+-----------------+---------------+--------+----------+------+
+
+Bare strings and quoted strings are polymorphic sub-types of the generic
+string type. Bare strings are implicitly interned.
+
+The parser can also output non-negative integers, but this is only used for
+datum labels; number literals are handled by the decoder (see next section).
+
+The parser recognizes various "syntax sugar" and transforms it into uses of
+the above data types. The most ubiquitous example is of course the list:
+
+    (datum1 datum2 ...) -> (datum1 . (datum2 . (... . ())))
+
+The following table summarizes the other supported transformations:
+
+    #datum -> (#HASH . datum)       #rune(...) -> (#rune ...)
+
+    [...] -> (#SQUARE ...)          dat1dat2 -> (#JOIN dat1 . dat2)
+
+    {...} -> (#BRACE ...)           dat1.dat2 -> (#DOT dat1 . dat2)
+
+    'datum -> (#QUOTE . datum)      dat1:dat2 -> (#COLON dat1 . dat2)
+
+    `datum -> (#GRAVE . datum)      #%hex% -> (#LABEL . hex)
+
+    ,datum -> (#COMMA . datum)      #%hex=datum -> (#LABEL hex . datum)
+
+A separate process called "decoding" can transform these objects into other
+data types. For example, `(#HASH x y z)` could become a vector, so that the
+expression `#(x y z)` works just like in Scheme. See the next section for
+details about the decoder.
+
+Decoding also resolves datum labels, and goes over bare strings to find ones
+that are actually a number literal. This lets us offload the complexity of
+number parsing elsewhere, so the parser remains extremely simple.
+
+Further notes about the syntax sugar table and examples above:
+
+* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
+  means zero or more data; hex is a hexadecimal number of up to 12 digits.
+
+* The `#datum` form only applies when the datum following the hash sign is a
+  list, quoted string, quote expression, another expression starting with a
+  hash sign, a bare string starting with a backslash escape (see next), or a
+  pipe-quoted bare string (see next).
+
+* A backslash causes the immediately following character to lose any special
+  meaning it would have, and be considered as part of a bare string instead.
+  (This does not apply to space or control characters.) For example, the
+  following three character sequences are each a valid bare string:
+
+      foo\(bar\)    \]blah    \#\'xyz
+
+  Bare strings can also be "quoted" with pipes as in Scheme; it should be
+  noted that this still produces a "bare string" in terms of data type:
+
+      |foo bar baz|
+
+* Though not represented in the table due to notational difficulty, the form
+  `#rune(...)` doesn't require a list in the second position; any datum that
+  works with the `#datum` syntax also works with `#rune<DATUM>`.
+
+      #rune1#rune2 -> (#rune1 . #rune2)
+
+      #rune"text" -> (#rune . "text")
+
+      #rune\string -> (#rune . string)
+
+      #rune'string -> (#rune #QUOTE . string)
+
+  As a counter-example, following a rune immediately with a bare string isn't
+  possible, since it's ambiguous:
+
+      #abcdefgh ;Could be (#abcdef . gh) or (#abcde . fgh) or ...
+
+* Syntax sugar can combine arbitrarily; some examples follow:
+
+      #{...} -> (#HASH #BRACE ...)
+
+      #'foo -> (#HASH #QUOTE . foo)
+
+      ##'[...] -> (#HASH #HASH #QUOTE #SQUARE ...)
+
+      {x y}[i j] -> (#JOIN (#BRACE x y) #SQUARE i j)
+
+      foo.bar.baz{x y} -> (#JOIN (#DOT (#DOT foo . bar) . baz) #BRACE x y)
+
+* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses
+  as `(#QUOTE . foo)` instead; the operand of `#QUOTE` is the entire cdr.
+
+  The same principle is used when parsing other sugar; some examples follow:
+
+      Incorrect                          Correct
+
+      #(x y z) -> (#HASH (x y z))        #(x y z) -> (#HASH x y z)
+
+      [x y z] -> (#SQUARE (x y z))       [x y z] -> (#SQUARE x y z)
+
+      #{x} -> (#HASH (#BRACE (x)))       #{x} -> (#HASH #BRACE x)
+
+      foo(x y) -> (#JOIN foo (x y))      foo(x y) -> (#JOIN foo x y)
+
+* Runes are case-sensitive, and the parser only emits runes using upper-case
+  letters when expressing syntax sugar. This way, there can be no accidental
+  clash with runes that appear verbatim in code, as long as only lower-case
+  letters are used for rune literals in code.
+
+<!--
+;; Local Variables:
+;; fill-column: 77
+;; End:
+-->
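
The "operand is the entire cdr" rule from docs/parser.md can be illustrated with a toy reader. The Python sketch below is not Zisp's parser and makes several assumptions: the names `Rune`, `Pair`, `NIL`, `parse`, and `show` are hypothetical, bare strings are modelled as plain Python strings, and only lists, `[...]`, `{...}`, the three prefix characters, and `#` followed by a list-like datum are handled. Joins, dots, colons, datum labels, quoted strings, backslash escapes, and rune literals are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rune:
    name: str  # e.g. "HASH" for the rune written #HASH


@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object


NIL = ()  # stand-in for the empty list (hypothetical representation)

PREFIX = {"'": "QUOTE", "`": "GRAVE", ",": "COMMA"}
BRACKET = {"[": ("]", "SQUARE"), "{": ("}", "BRACE")}
SPECIAL = "()[]{}'`,#"


def _skip(text, i):
    """Return the index of the next non-whitespace character."""
    while i < len(text) and text[i].isspace():
        i += 1
    return i


def _list(text, i, close):
    """Read data up to `close` and fold them into nested pairs."""
    items = []
    while True:
        i = _skip(text, i)
        if i >= len(text):
            raise ValueError("unterminated list")
        if text[i] == close:
            break
        datum, i = _datum(text, i)
        items.append(datum)
    result = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result, i + 1


def _datum(text, i):
    """Read one datum starting at index i; return (datum, next index)."""
    if i >= len(text):
        raise ValueError("unexpected end of input")
    c = text[i]
    if c == "(":
        return _list(text, i + 1, ")")
    if c in BRACKET:
        close, name = BRACKET[c]
        body, j = _list(text, i + 1, close)
        return Pair(Rune(name), body), j  # the list itself is the cdr
    if c in PREFIX:
        operand, j = _datum(text, i + 1)
        return Pair(Rune(PREFIX[c]), operand), j  # operand is the whole cdr
    if c == "#":
        if text[i + 1 : i + 2] not in ("(", "[", "{", "'", "#"):
            raise ValueError("rune literals are outside this sketch")
        operand, j = _datum(text, i + 1)
        return Pair(Rune("HASH"), operand), j
    j = i
    while j < len(text) and not text[j].isspace() and text[j] not in SPECIAL:
        j += 1
    if j == i:
        raise ValueError(f"unexpected character {c!r}")
    return text[i:j], j  # a bare string


def parse(text):
    """Parse exactly one datum from text."""
    datum, i = _datum(text, _skip(text, 0))
    if _skip(text, i) != len(text):
        raise ValueError("trailing input")
    return datum


def show(d):
    """Render a datum, using list notation wherever the cdr chain allows."""
    if isinstance(d, Rune):
        return "#" + d.name
    if isinstance(d, Pair):
        parts = []
        while isinstance(d, Pair):
            parts.append(show(d.car))
            d = d.cdr
        if d != NIL:
            parts += [".", show(d)]
        return "(" + " ".join(parts) + ")"
    return "()" if d == NIL else d


print(show(parse("#(x y z)")))   # (#HASH x y z)
print(show(parse("'foo")))       # (#QUOTE . foo)
print(show(parse("##'[a b]")))   # (#HASH #HASH #QUOTE #SQUARE a b)
```

Because each sugar form conses its rune directly onto the operand, nested sugar flattens into a single list without extra wrapping, which is the behaviour shown in the "Correct" column of the document.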
