
# Parser for Code & Data

*For an exact specification of the grammar, see [grammar](grammar.html).*

Zisp S-Expressions represent an extremely minimal set of data types, only those
necessary to construct more complex code and data:

    +--------+-----------------+--------+----------+------+
    | TYPE   | String          | Rune   | Pair     | Nil  |
    +--------+-----------------+--------+----------+------+
    | E.G.   | foo, |foo bar|  | #name  | (X & Y)  | ()   |
    +--------+-----------------+--------+----------+------+

The parser can also output non-negative integers, but this is only used for
datum labels; number literals are handled by the *decoder* (see next).

The parser recognizes various "syntax sugar" and transforms it into uses of the
above data types.  The most ubiquitous example is of course the list:

    (datum1 datum2 ...)  ->  (datum1 & (datum2 & (... & ())))
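
As a rough illustration of the list expansion above, the following Python sketch
builds the nested pairs by hand.  The `Pair` class, the `NIL` stand-in, and the
`make_list` and `show` helpers are hypothetical names chosen for this example;
they are not part of Zisp or its parser.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    """Hypothetical stand-in for a Zisp pair, written (car & cdr)."""
    car: Any
    cdr: Any


NIL = ()  # hypothetical stand-in for Zisp's nil, written ()


def make_list(*data, tail=NIL):
    """Desugar (datum1 datum2 ...) into nested pairs ending in nil."""
    result = tail
    for datum in reversed(data):
        result = Pair(datum, result)
    return result


def show(datum):
    """Print a datum with every pair written out explicitly."""
    if datum == NIL:
        return "()"
    if isinstance(datum, Pair):
        return f"({show(datum.car)} & {show(datum.cdr)})"
    return str(datum)


print(show(make_list("a", "b", "c")))  # (a & (b & (c & ())))
```

Building the list from the last element backwards keeps the helper iterative,
and the empty list falls out as plain nil.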

The following table summarizes the other supported transformations:

    "xyz"   -> (#QUOTE & |xyz|)       #datum       -> (#HASH & datum)

    [...]   -> (#SQUARE ...)          #rune(...)   -> (#rune ...)

    {...}   -> (#BRACE ...)           dat1dat2     -> (#JOIN dat1 & dat2)

    'datum  -> (#QUOTE & datum)       dat1.dat2    -> (#DOT dat1 & dat2)

    `datum  -> (#GRAVE & datum)       dat1:dat2    -> (#COLON dat1 & dat2)

    ,datum  -> (#COMMA & datum)       #%hex%       -> (#LABEL & hex)

                                      #%hex=datum  -> (#LABEL hex & datum)

A separate process called *decoding* can transform such data into more complex
types.  For example, `(#HASH x y z)` could be decoded into a vector, so the
expression `#(x y z)` works just like in Scheme.

Decoding also resolves datum labels, scans strings for those that are actually
number literals, and takes care of several other transformations.
This offloads complexity, allowing the parser to remain extremely simple.  See
the dedicated documentation of the decoder for more.
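
To make the division of labor concrete, here is a minimal Python sketch of what
a decoding pass might look like.  Everything in it is an assumption made for
illustration: the `Pair` and `Rune` classes, the `decode` function, the use of a
Python list as the "vector", and the rule that only strings of ASCII decimal
digits count as numbers.  The actual decoder's rules are described in its own
documentation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    """Hypothetical stand-in for a Zisp pair."""
    car: Any
    cdr: Any


@dataclass(frozen=True)
class Rune:
    """Hypothetical stand-in for a Zisp rune such as #HASH."""
    name: str


NIL = ()  # hypothetical stand-in for Zisp's nil


def list_items(datum):
    """Collect the elements of a proper list; return None if it is not one."""
    items = []
    while isinstance(datum, Pair):
        items.append(datum.car)
        datum = datum.cdr
    return items if datum == NIL else None


def decode(datum):
    """Illustrative decoding pass over parser output (assumed rules only)."""
    if isinstance(datum, str):
        # Assumed rule: a string of ASCII decimal digits is a number literal.
        if datum.isascii() and datum.isdigit():
            return int(datum)
        return datum
    if isinstance(datum, Pair):
        # Assumed rule: (#HASH x y z) with a proper list as cdr is a vector.
        if datum.car == Rune("HASH"):
            items = list_items(datum.cdr)
            if items is not None:
                return [decode(item) for item in items]
        return Pair(decode(datum.car), decode(datum.cdr))
    return datum


# Parser output for #(x 42), i.e. (#HASH x 42) with "42" still a string.
parsed = Pair(Rune("HASH"), Pair("x", Pair("42", NIL)))
print(decode(parsed))  # ['x', 42]
```

The parser side of this sketch never needs to know what a number or a vector
is; it only hands over strings, runes, pairs, and nil.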

Further notes about the syntax sugar table and examples above:

* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
  means zero or more data; hex is a hexadecimal number of up to 12 digits.

* The `#datum` form only applies when the datum following the hash sign is a
  list, quoted string, quote expression, another expression starting with the
  hash sign, or a pipe-quoted string (see next).  A bare string can follow the
  hash sign by separating the two with a backslash: `#\string`

* Strings can be quoted with pipes, like symbols in Scheme.  This is the "real"
  string literal syntax, whereas using double quotes is syntax sugar for a
  quoted string literal.

      |foo bar baz|  -> |foo bar baz|

      "foo bar baz"  -> (#QUOTE & |foo bar baz|)

* Though not represented in the table due to notational difficulty, the form
  `#rune(...)` doesn't require a list in the second position; any datum that
  works with the `#datum` syntax also works with `#rune<DATUM>`.

      #rune1#rune2  -> (#rune1 & #rune2)

      #rune"text"   -> (#rune & "text")

      #rune\string  -> (#rune & string)

      #rune'string  -> (#rune #QUOTE & string)

  As a counter-example, following a rune immediately with a bare string isn't
  possible without the delimiting backslash, since that would be ambiguous:

      #abcdefgh  ;Could be (#abcdef & gh) or (#abcde & fgh) or ...

* Syntax sugar can combine arbitrarily.  Some examples follow.  Any of these may
  or may not actually have a meaning in code; many could simply end up producing
  a syntax error at the macro-expand stage.

      #{...}            -> (#HASH #BRACE ...)

      #'foo             -> (#HASH #QUOTE & foo)

      ##'[...]          -> (#HASH #HASH #QUOTE #SQUARE ...)

      {x y}[i j]        -> (#JOIN (#BRACE x y) #SQUARE i j)

      foo.bar.baz{x y}  -> (#JOIN (#DOT (#DOT foo & bar) & baz) #BRACE x y)

* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses
  as `(#QUOTE & foo)` instead; the operand of `#QUOTE` is the entire cdr.

  The same principle is used when parsing other sugar; some examples follow:

      Incorrect                              Correct

      #(x y z) -> (#HASH (x y z))            #(x y z) -> (#HASH x y z)

      [x y z]  -> (#SQUARE (x y z))          [x y z]  -> (#SQUARE x y z)

      #{x}     -> (#HASH (#BRACE (x)))       #{x}     -> (#HASH #BRACE x)

      foo(x y) -> (#JOIN foo (x y))          foo(x y) -> (#JOIN foo x y)

* Runes are case-sensitive, and the parser always emits runes using uppercase
  letters when expressing syntax sugar.  Uppercase rune names are reserved for
  Zisp's internal use and standard library; users can use lowercase runes with
  custom meaning without worrying about clashes.
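
The "entire cdr" principle and the way sugar combines can be sketched in a few
lines of Python.  This is only a model of the resulting data, not the parser
itself: the `Pair` and `Rune` classes and the `prefix`, `join`, `make_list`, and
`show` helpers are hypothetical names introduced for this example.  `show`
prints in the compact notation used throughout this page.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Pair:
    """Hypothetical stand-in for a Zisp pair."""
    car: Any
    cdr: Any


@dataclass(frozen=True)
class Rune:
    """Hypothetical stand-in for a Zisp rune such as #QUOTE."""
    name: str


NIL = ()  # hypothetical stand-in for Zisp's nil


def make_list(*data):
    """Build a proper list out of nested pairs."""
    result = NIL
    for datum in reversed(data):
        result = Pair(datum, result)
    return result


def prefix(rune_name, datum):
    """Prefix sugar: cons the rune onto the datum, which becomes the cdr."""
    return Pair(Rune(rune_name), datum)


def join(dat1, dat2):
    """Adjacent data: dat1dat2 -> (#JOIN dat1 & dat2)."""
    return Pair(Rune("JOIN"), Pair(dat1, dat2))


def show(datum):
    """Print a datum in compact list notation, with & before improper tails."""
    if datum == NIL:
        return "()"
    if isinstance(datum, Rune):
        return "#" + datum.name
    if not isinstance(datum, Pair):
        return str(datum)
    parts = []
    while isinstance(datum, Pair):
        parts.append(show(datum.car))
        datum = datum.cdr
    if datum != NIL:
        parts += ["&", show(datum)]
    return "(" + " ".join(parts) + ")"


xyz = make_list("x", "y", "z")
print(show(prefix("QUOTE", "foo")))                  # (#QUOTE & foo)
print(show(prefix("HASH", xyz)))                     # (#HASH x y z)
print(show(make_list(Rune("HASH"), xyz)))            # (#HASH (x y z)), the incorrect form
print(show(prefix("HASH", prefix("QUOTE", "foo"))))  # (#HASH #QUOTE & foo)
print(show(join(prefix("BRACE", make_list("x", "y")),
                prefix("SQUARE", make_list("i", "j")))))
# (#JOIN (#BRACE x y) #SQUARE i j)
```

Because each piece of sugar adds exactly one pair, nesting sugar never adds
extra layers of list structure, which is why `#{x}` comes out as
`(#HASH #BRACE x)`.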

<!--
;; Local Variables:
;; fill-column: 80
;; End:
-->