
# Parser for Code & Data

Zisp s-expressions are defined in terms of an extremely minimal set of data
types, containing only what is needed to build representations of more
complex expressions and data types:

    +--------+-----------------+---------------+--------+----------+------+
    | TYPE   | Bare String     | Quoted String | Rune   | Pair     | Nil  |
    +--------+-----------------+---------------+--------+----------+------+
    | E.G.   | foo, |foo bar|  | "foo bar"     | #name  | (X . Y)  | ()   |
    +--------+-----------------+---------------+--------+----------+------+

Bare strings and quoted strings are polymorphic sub-types of the generic
string type.  Bare strings are implicitly interned.
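
For illustration only, the five data types and the string sub-typing could be
modeled in Python roughly as follows.  Every name here (`ZString`,
`BareString`, `QuotedString`, `Rune`, `Pair`, `Nil`, `NIL`) is hypothetical.
A class-level dictionary stands in for whatever interning mechanism Zisp
actually uses.

```python
from dataclasses import dataclass


class ZString(str):
    """Hypothetical generic string type; both string kinds derive from it."""


class BareString(ZString):
    """Bare string, e.g. foo or |foo bar|; implicitly interned."""

    _interned = {}  # assumed interning table: text -> the one shared object

    def __new__(cls, text):
        # Interning: the same text always yields the very same object.
        obj = cls._interned.get(text)
        if obj is None:
            obj = super().__new__(cls, text)
            cls._interned[text] = obj
        return obj


class QuotedString(ZString):
    """Quoted string, e.g. "foo bar"; not interned in this sketch."""


@dataclass(frozen=True)
class Rune:
    """Rune, e.g. #name."""

    name: str


@dataclass(frozen=True)
class Pair:
    """Pair, e.g. (X . Y)."""

    car: object
    cdr: object


class Nil:
    """Nil, i.e. (); only the single instance NIL is ever used."""

    def __repr__(self):
        return "()"


NIL = Nil()
```

With this model, `BareString("foo") is BareString("foo")` holds because of
interning, while two equal `QuotedString` values are merely equal.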

The parser can also output non-negative integers, but this is only used for
datum labels; number literals are handled by the decoder (see next section).

The parser recognizes various "syntax sugar" and transforms it into uses of
the above data types.  The most ubiquitous example is of course the list:

    (datum1 datum2 ...)  ->  (datum1 . (datum2 . (... . ())))
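
The list transformation amounts to folding the elements from right to left
onto `()`.  The following is a minimal sketch with hypothetical `Pair` and `NIL`
stand-ins, redefined here so the block runs on its own.  It includes a
printer that writes every pair out explicitly in dotted form:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object


class Nil:
    def __repr__(self):
        return "()"


NIL = Nil()  # hypothetical stand-in for the empty list


def make_list(items):
    """(d1 d2 ...) -> (d1 . (d2 . (... . ()))), built from the last item back."""
    result = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result


def dotted(x):
    """Write every pair explicitly as (car . cdr)."""
    if isinstance(x, Pair):
        return f"({dotted(x.car)} . {dotted(x.cdr)})"
    return str(x)  # str(NIL) falls back to repr, giving "()"


print(dotted(make_list(["datum1", "datum2", "datum3"])))
# prints (datum1 . (datum2 . (datum3 . ())))
```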

The following table summarizes the other supported transformations:

    #datum  -> (#HASH . datum)        #rune(...)   -> (#rune ...)

    [...]   -> (#SQUARE ...)          dat1dat2     -> (#JOIN dat1 . dat2)

    {...}   -> (#BRACE ...)           dat1.dat2    -> (#DOT dat1 . dat2)

    'datum  -> (#QUOTE . datum)       dat1:dat2    -> (#COLON dat1 . dat2)

    `datum  -> (#GRAVE . datum)       #%hex%       -> (#LABEL . hex)

    ,datum  -> (#COMMA . datum)       #%hex=datum  -> (#LABEL hex . datum)
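
Every row of the table produces a pair whose car is an upper-case rune.  The
sketch below writes the rows as small constructor functions that operate on
already-parsed data.  The names `prefix`, `bracket`, `infix`, `label_ref`,
`label_def`, and `show` are hypothetical.  `show` prints in the usual
abbreviated list notation, so its output can be compared with the table
directly:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rune:
    name: str


@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object


class Nil:
    def __repr__(self):
        return "()"


NIL = Nil()
PREFIX = {"#": "HASH", "'": "QUOTE", "`": "GRAVE", ",": "COMMA"}
BRACKET = {"[": "SQUARE", "{": "BRACE"}
INFIX = {"": "JOIN", ".": "DOT", ":": "COLON"}  # "" = plain juxtaposition


def prefix(char, datum):
    """'datum -> (#QUOTE . datum): the datum becomes the entire cdr."""
    return Pair(Rune(PREFIX[char]), datum)


def bracket(opener, contents):
    """[...] -> (#SQUARE ...), given the contents as an already-built list."""
    return Pair(Rune(BRACKET[opener]), contents)


def infix(sep, left, right):
    """dat1.dat2 -> (#DOT dat1 . dat2)."""
    return Pair(Rune(INFIX[sep]), Pair(left, right))


def label_ref(n):
    """#%hex% -> (#LABEL . hex)."""
    return Pair(Rune("LABEL"), n)


def label_def(n, datum):
    """#%hex=datum -> (#LABEL hex . datum)."""
    return Pair(Rune("LABEL"), Pair(n, datum))


def show(x):
    """Print in abbreviated list notation, e.g. (a b . c)."""
    if isinstance(x, Rune):
        return "#" + x.name
    if not isinstance(x, Pair):
        return str(x)
    parts = []
    while isinstance(x, Pair):
        parts.append(show(x.car))
        x = x.cdr
    tail = "" if x is NIL else " . " + show(x)
    return "(" + " ".join(parts) + tail + ")"


xyz = Pair("x", Pair("y", Pair("z", NIL)))
print(show(prefix("#", xyz)))                             # (#HASH x y z)
print(show(infix(".", infix(".", "foo", "bar"), "baz")))  # (#DOT (#DOT foo . bar) . baz)
```

Nesting `infix` calls to the left like this matches how `foo.bar.baz` is
grouped in the examples later in this section.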

A separate process called "decoding" can transform these objects into other
data types.  For example, `(#HASH x y z)` could become a vector, so that the
expression `#(x y z)` works just like in Scheme.  See the next section for
details about the decoder.

Decoding also resolves datum labels and scans bare strings for those that are
actually number literals.  This offloads the complexity of number parsing to
the decoder, so the parser itself remains extremely simple.
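
The decoding step can be pictured as a tree walk over the parser's output.
In the Python sketch below, the rules are assumptions chosen for
illustration.  Only plain decimal integers are recognized as number literals,
and datum labels are not handled at all.  A `(#HASH ...)` form whose cdr is
a proper list not starting with a rune becomes a Python list, standing in
for a vector.  `Bare`, `Quoted`, `proper_list`, and `decode` are
hypothetical names:

```python
import re
from dataclasses import dataclass


class Bare(str):
    """A bare string as produced by the parser."""


class Quoted(str):
    """A double-quoted string; never treated as a number."""


@dataclass(frozen=True)
class Rune:
    name: str


@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object


class Nil:
    def __repr__(self):
        return "()"


NIL = Nil()

# Assumption: only plain decimal integers count as number literals here.
INTEGER = re.compile(r"[+-]?[0-9]+")


def proper_list(x):
    """Return the elements of a proper list as a Python list, else None."""
    items = []
    while isinstance(x, Pair):
        items.append(x.car)
        x = x.cdr
    return items if x is NIL else None


def decode(x):
    """Hypothetical decoder: numbers from bare strings, #(...) to a vector."""
    if isinstance(x, Bare) and INTEGER.fullmatch(x):
        return int(x)
    if isinstance(x, Pair):
        if x.car == Rune("HASH"):
            items = proper_list(x.cdr)
            # Only (#HASH x y ...) with a non-rune first element becomes a
            # vector; forms like #'foo -> (#HASH #QUOTE . foo) are left alone.
            if items is not None and not (items and isinstance(items[0], Rune)):
                return [decode(item) for item in items]
        return Pair(decode(x.car), decode(x.cdr))
    return x
```

In this sketch, a quoted `"42"` stays a string while a bare `42` becomes a
number, because only `Bare` values are ever inspected for number syntax.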

Further notes about the syntax sugar table and examples above:

* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
  means zero or more data; hex is a hexadecimal number of up to 12 digits.

* The `#datum` form only applies when the datum following the hash sign is a
  list, quoted string, quote expression, another expression starting with a
  hash sign, a bare string starting with a backslash escape, or a pipe-quoted
  bare string (both of the latter are described in the next item).

* A backslash causes the immediately following character to lose any special
  meaning it would otherwise have and to be treated as part of a bare string
  instead.  (This does not apply to space or control characters.)  For
  example, the following three character sequences are each a valid bare
  string:

      foo\(bar\)  \]blah  \#\'xyz

  Bare strings can also be "quoted" with pipes, as in Scheme; note that this
  still produces a bare string in terms of data type:

      |foo bar baz|

* Though not represented in the table due to notational difficulty, the form
  `#rune(...)` doesn't require a list in the second position; any datum that
  works with the `#datum` syntax also works with `#rune<DATUM>`.

      #rune1#rune2  -> (#rune1 . #rune2)

      #rune"text"   -> (#rune . "text")

      #rune\string  -> (#rune . string)

      #rune'string  -> (#rune #QUOTE . string)

  As a counter-example, following a rune immediately with a bare string isn't
  possible, since it's ambiguous:

      #abcdefgh  ;Could be (#abcdef . gh) or (#abcde . fgh) or ...

* Syntax sugar can combine arbitrarily; some examples follow:

      #{...}            -> (#HASH #BRACE ...)

      #'foo             -> (#HASH #QUOTE . foo)

      ##'[...]          -> (#HASH #HASH #QUOTE #SQUARE ...)

      {x y}[i j]        -> (#JOIN (#BRACE x y) #SQUARE i j)

      foo.bar.baz{x y}  -> (#JOIN (#DOT (#DOT foo . bar) . baz) #BRACE x y)

* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses
  as `(#QUOTE . foo)` instead; the operand of `#QUOTE` is the entire cdr.

  The same principle is used when parsing other sugar; some examples follow:

      Incorrect                              Correct

      #(x y z) -> (#HASH (x y z))            #(x y z) -> (#HASH x y z)

      [x y z]  -> (#SQUARE (x y z))          [x y z]  -> (#SQUARE x y z)

      #{x}     -> (#HASH (#BRACE (x)))       #{x}     -> (#HASH #BRACE x)

      foo(x y) -> (#JOIN foo (x y))          foo(x y) -> (#JOIN foo x y)

* Runes are case-sensitive, and the parser only emits runes using upper-case
  letters when expressing syntax sugar.  This way, there can be no accidental
  clash with runes that appear verbatim in code, as long as only lower-case
  letters are used for rune literals in code.
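
The rules in the list above can be tied together in a small reader.  What
follows is a rough Python sketch, not the Zisp parser.  It covers only these
forms:

* lists
* the `'`, `` ` `` and `,` prefixes
* `#datum` and `#rune<DATUM>`
* quoted strings
* pipe-quoted bare strings
* backslash escapes

It omits brackets, braces, `#JOIN`/`#DOT`/`#COLON`, dotted pairs, datum
labels, and escapes inside quoted strings.  The names `Reader`,
`HASH_FOLLOW`, `DELIMITERS`, and the stand-in types are hypothetical.  Rune
names are assumed to be runs of alphanumeric characters, and the characters
allowed right after `#` mirror the first item of the list:

```python
from dataclasses import dataclass

class Bare(str):
    """Bare string: plain, backslash-escaped, or |pipe-quoted|."""

class Quoted(str):
    """Double-quoted string (escapes inside it are not handled here)."""

@dataclass(frozen=True)
class Rune:
    name: str

@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object

class Nil:
    def __repr__(self):
        return "()"

NIL = Nil()
PREFIX = {"'": "QUOTE", "`": "GRAVE", ",": "COMMA"}
HASH_FOLLOW = set("(\"'#\\|")      # may directly follow '#' or '#rune' (assumed)
DELIMITERS = set("()[]{}\"|#'`,")  # characters that end a bare string (assumed)

class Reader:
    def __init__(self, text):
        self.s, self.i = text, 0

    def peek(self):
        return self.s[self.i] if self.i < len(self.s) else ""

    def datum(self):
        c = self.peek()
        if c in PREFIX:  # 'x -> (#QUOTE . x): the operand is the entire cdr
            self.i += 1
            return Pair(Rune(PREFIX[c]), self.datum())
        if c in ('"', "|"):  # "..." gives a Quoted string, |...| a Bare one
            end = self.s.find(c, self.i + 1)
            if end < 0:
                raise SyntaxError(f"unterminated {c} at {self.i}")
            text, self.i = self.s[self.i + 1:end], end + 1
            return Quoted(text) if c == '"' else Bare(text)
        if c == "(":
            self.i += 1
            return self.read_list()
        if c == "#":
            self.i += 1
            return self.read_hash()
        return self.read_bare()

    def read_list(self):
        items = []
        while True:
            while self.peek().isspace():
                self.i += 1
            if self.peek() == ")":
                self.i += 1
                break
            if not self.peek():
                raise SyntaxError("unterminated list")
            items.append(self.datum())
        result = NIL
        for item in reversed(items):  # (a b) -> (a . (b . ()))
            result = Pair(item, result)
        return result

    def read_hash(self):
        start = self.i
        while self.peek().isalnum():  # rune names assumed alphanumeric
            self.i += 1
        name = self.s[start:self.i]
        if self.peek() in HASH_FOLLOW:  # (#HASH . datum) or (#rune . datum)
            return Pair(Rune(name or "HASH"), self.datum())
        if not name:
            raise SyntaxError(f"unsupported character after '#' at {self.i}")
        return Rune(name)

    def read_bare(self):
        chars = []
        while True:
            c = self.peek()
            if c == "\\":  # escape: the next char loses any special meaning
                nxt = self.s[self.i + 1:self.i + 2]
                if not (nxt.strip() and nxt.isprintable()):
                    raise SyntaxError(f"invalid escape at {self.i}")
                chars.append(nxt)
                self.i += 2
            elif not c or c.isspace() or c in DELIMITERS:
                break
            else:
                chars.append(c)
                self.i += 1
        if not chars:
            raise SyntaxError(f"unexpected {c!r} at {self.i}")
        return Bare("".join(chars))
```

For example, `Reader("#rune'string").datum()` yields
`Pair(Rune("rune"), Pair(Rune("QUOTE"), Bare("string")))`, which is
`(#rune #QUOTE . string)` in the notation used above.  Under this sketch's
assumption of unbounded alphanumeric rune names, `#abcdefgh` is read as a
single rune, so the ambiguity noted above never arises here.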

<!--
;; Local Variables:
;; fill-column: 77
;; End:
-->