# Zisp S-Expression Syntax
We use a BNF notation with the following rules:
* Concatenation of expressions is implicit: `foo bar` means `foo`
followed by `bar`.
* Expressions may be followed by `?`, `*`, `+`, `{N}`, or `{N,M}`,
which have meanings analogous to regular expressions.
* The syntax `[foo]` is shorthand for `(foo)?`.
* The syntax is defined in terms of bytes, not characters. Terminals
`'c'` and `"c"` refer to the ASCII value of the given character `c`.
Numbers are in decimal and refer to a byte with the given value.
* The `~` prefix means NOT. It only applies to rules that match one
byte, and negates them. For example, `~( 'a' | 'b' )` matches any
byte other than 97 and 98.
* Ranges of terminal values are expressed as `x...y` (inclusive).
* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported, with the
addition of EOF to explicitly demarcate the end of the byte stream.
* There is no ambiguity, backtracking, or look-ahead beyond one byte.
Rules match left to right, depth-first, and greedy. As soon as the
input matches the first terminal of a rule, it must match that rule
to the end or it is considered a syntax error.
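
As a rough illustration of the notation (not part of the Zisp specification), single-byte rules can be modelled as predicates over integer byte values. The helper names `byte_range`, `any_of`, and `negate` are hypothetical, chosen only for this sketch:

```python
# Hypothetical helpers; the grammar itself defines no API.

def byte_range(lo: int, hi: int):
    """Predicate for the inclusive range `lo...hi` of byte values."""
    return lambda b: lo <= b <= hi

def any_of(*chars: str):
    """Predicate for an alternation of single-character terminals.

    Each terminal stands for the ASCII value of its character.
    """
    values = frozenset(ord(c) for c in chars)
    return lambda b: b in values

def negate(pred):
    """The `~` prefix: accept exactly the single bytes that `pred` rejects."""
    return lambda b: not pred(b)

not_a_or_b = negate(any_of("a", "b"))   # ~( 'a' | 'b' )
nine_to_thirteen = byte_range(9, 13)    # 9...13
```

With these definitions, `not_a_or_b(97)` is false and `not_a_or_b(99)` is true, which matches the `~( 'a' | 'b' )` example in the list above.
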

Because of the final rule above (no backtracking, and at most one byte
of look-ahead), the BNF translates to code almost mechanically.

The parser consumes one `Unit` from the input stream each time it is
called; it returns the `Datum` therein, or EOF.
```
Unit : Blank* ( Datum [Blank] | EOF )
Blank : 9...13 | Comment
Datum : OneDatum ( [JoinChar] OneDatum )*
JoinChar : '.' | ':'
Comment : ';' ( SkipUnit | SkipLine [LF] )
SkipUnit : '~' Unit
SkipLine : ( ~LF )*
OneDatum : BareString | CladDatum
BareString : BareChar+
CladDatum : '|' ( PipeStrChar | '\' StringEsc )* '|'
| '"' ( QuotStrChar | '\' StringEsc )* '"'
| '#' HashExpr
| '(' List ')' | '[' List ']' | '{' List '}'
| "'" Datum | '`' Datum | ',' Datum
BareChar : ALPHA | DIGIT
| '!' | '$' | '%' | '*' | '+' | '-' | '.' | '/'
| '<' | '=' | '>' | '?' | '@' | '^' | '_' | '~'
PipeStrChar : ~( '|' | '\' )
QuotStrChar : ~( '"' | '\' )
HashExpr : Rune [ '\' BareString | CladDatum ]
| '\' BareString
| '%' Label ( '%' | '=' Datum )
| CladDatum
List : Unit* [ '&' Unit ] Blank*
StringEsc : '\' | '|' | '"' | ( HTAB | SP )* LF ( HTAB | SP )*
| 'a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | 'e'
| 'x' ( HEXDIG{2} )+ ';'
| 'u' HEXDIG{1,6} ';'
Rune : ALPHA ( ALPHA | DIGIT ){0,5}
Label : HEXDIG{1,12}
```
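
To give a feel for how the one-byte look-ahead rule turns into code, here is a minimal sketch of a recognizer for the double-quoted alternative of `CladDatum` and for `StringEsc`. It only validates and reports where the match ends; it does not decode escapes, since the grammar does not specify their values. The function names, the index-based interface, the use of `-1` for EOF, and the acceptance of lowercase hex digits are all assumptions of this sketch, not part of the specification.

```python
# Assumption: HEXDIG accepts both upper- and lowercase digits.
HEXDIG = frozenset(b"0123456789ABCDEFabcdef")
SIMPLE = frozenset(b'\\|"abtnvfre')   # single-byte StringEsc alternatives
HTAB_SP = frozenset(b"\t ")
LF = 10

def at(buf: bytes, i: int) -> int:
    """One byte of look-ahead; -1 stands in for EOF."""
    return buf[i] if i < len(buf) else -1

def skip_string_esc(buf: bytes, i: int) -> int:
    """Match StringEsc at buf[i]; return the index just past it."""
    c = at(buf, i)
    if c in SIMPLE:
        return i + 1
    if c == ord("x"):                     # 'x' ( HEXDIG{2} )+ ';'
        i += 1
        pairs = 0
        while at(buf, i) in HEXDIG:
            # Committed: a first hex digit must be followed by a second.
            if at(buf, i + 1) not in HEXDIG:
                raise SyntaxError("odd number of hex digits in x escape")
            i += 2
            pairs += 1
        if pairs == 0 or at(buf, i) != ord(";"):
            raise SyntaxError("malformed x escape")
        return i + 1
    if c == ord("u"):                     # 'u' HEXDIG{1,6} ';'
        i += 1
        digits = 0
        while digits < 6 and at(buf, i) in HEXDIG:
            i += 1
            digits += 1
        if digits == 0 or at(buf, i) != ord(";"):
            raise SyntaxError("malformed u escape")
        return i + 1
    if c in HTAB_SP or c == LF:           # ( HTAB | SP )* LF ( HTAB | SP )*
        while at(buf, i) in HTAB_SP:
            i += 1
        if at(buf, i) != LF:
            raise SyntaxError("expected LF in line continuation")
        i += 1
        while at(buf, i) in HTAB_SP:
            i += 1
        return i
    raise SyntaxError("unknown escape")

def skip_quoted_string(buf: bytes, i: int) -> int:
    """Match the double-quoted CladDatum alternative starting at buf[i]."""
    if at(buf, i) != ord('"'):
        raise SyntaxError("expected opening quote")
    i += 1
    while True:
        c = at(buf, i)
        if c == -1:
            raise SyntaxError("EOF inside string")
        if c == ord('"'):
            return i + 1
        if c == ord("\\"):
            i = skip_string_esc(buf, i + 1)
        else:
            i += 1                        # QuotStrChar
```

Each branch is chosen by inspecting a single byte, and once a branch is entered, any mismatch is reported as a syntax error rather than retried with another alternative. For example, `skip_string_esc(b"x414;", 0)` raises `SyntaxError` because the third hex digit commits the parser to a full pair.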