# Zisp S-Expression Syntax

We use a BNF notation with the following rules:

* Concatenation of expressions is implicit: `foo bar` means `foo`
  followed by `bar`.

* Expressions may be followed by `?`, `*`, `+`, `{N}`, or `{N,M}`,
  which have meanings analogous to regular expressions.

* The syntax `[foo]` is shorthand for `(foo)?`.

* The syntax is defined in terms of bytes, not characters.  Terminals
  `'c'` and `"c"` refer to the ASCII value of the given character `c`.
  Numbers are in decimal and refer to a byte with the given value.

* The `~` prefix means NOT.  It only applies to rules that match one
  byte, and negates them.  For example, `~( 'a' | 'b' )` matches any
  byte other than 97 and 98.

* Ranges of terminal values are expressed as `x...y` (inclusive).

* ABNF "core rules" (RFC 5234, Appendix B.1) such as `ALPHA`, `HEXDIG`,
  `LF`, `HTAB`, and `SP` are supported, with the addition of `EOF` to
  explicitly mark the end of the byte stream.

* There is no ambiguity, backtracking, or look-ahead beyond one byte.
  Rules match left to right, depth-first, and greedy.  As soon as the
  input matches the first terminal of a rule, it must match that rule
  to the end or it is considered a syntax error.

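Because of the last rule, the BNF is very simple to translate to code:
a parser only ever needs to inspect the next byte to decide what to do,
and it never has to undo a decision.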
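As a rough illustration of that translation, the following Python sketch hand-codes two rules from the grammar further down: `BareString : BareChar+` (greedy repetition) and the `'u' HEXDIG{1,6} ';'` alternative of `StringEsc` (committing once the first terminal has matched). The function names, the `(data, pos)` calling convention, the use of `ValueError` for syntax errors, and the acceptance of lower-case hex digits are assumptions made for this example, not part of the specification.

```python
import string

# BareChar, transcribed from the grammar: ALPHA, DIGIT, and sixteen punctuation bytes.
BARE_CHARS = frozenset(
    (string.ascii_letters + string.digits + "!$%*+-./<=>?@^_~").encode("ascii")
)
# HEXDIG; accepting lower-case digits is an assumption of this sketch.
HEX_DIGITS = frozenset(b"0123456789ABCDEFabcdef")


def parse_bare_string(data: bytes, pos: int) -> tuple[bytes, int]:
    """BareString : BareChar+   Returns (matched bytes, new position)."""
    start = pos
    # Greedy: keep consuming while the single byte under the cursor is a BareChar.
    while pos < len(data) and data[pos] in BARE_CHARS:
        pos += 1
    if pos == start:
        raise ValueError(f"expected BareChar at offset {start}")
    return data[start:pos], pos


def parse_unicode_escape(data: bytes, pos: int) -> tuple[int, int]:
    """'u' HEXDIG{1,6} ';'   Returns (numeric value, new position)."""
    if pos >= len(data) or data[pos] != ord("u"):
        raise ValueError(f"expected 'u' at offset {pos}")
    pos += 1  # First terminal matched: from here on, any mismatch is a syntax error.
    start = pos
    while pos < len(data) and pos - start < 6 and data[pos] in HEX_DIGITS:
        pos += 1
    if pos == start:
        raise ValueError(f"expected HEXDIG at offset {pos}")
    if pos >= len(data) or data[pos] != ord(";"):
        raise ValueError(f"expected ';' at offset {pos}")
    return int(data[start:pos], 16), pos + 1


print(parse_bare_string(b"foo-bar\nbaz", 0))  # (b'foo-bar', 7)
print(parse_unicode_escape(b"u3bb;", 0))      # (955, 5)
```

Neither function ever looks further than `data[pos]`, and neither rewinds `pos`; an input such as `u3bb` with no closing `;` is rejected outright rather than retried under another rule.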

The parser consumes one `Unit` from an input stream every time it's
called; it returns the `Datum` therein, or EOF once the stream is
exhausted.
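The calling convention can be sketched in Python as follows. This is a deliberately reduced illustration, not a full parser: it only handles `Blank` bytes in the range 9...13, line comments, and data that consist of a single `BareString`, and it raises `NotImplementedError` for everything else (`;~` comments, joined data, and all `CladDatum` forms). The names `read_unit`, `skip_blank`, and the `EOF` sentinel object are hypothetical.

```python
import string

BARE_CHARS = frozenset(
    (string.ascii_letters + string.digits + "!$%*+-./<=>?@^_~").encode("ascii")
)
LF = 10
EOF = object()  # hypothetical sentinel returned at end of input


def skip_blank(data: bytes, pos: int) -> int:
    """Consume at most one Blank; return `pos` unchanged if none starts here."""
    if pos >= len(data):
        return pos
    byte = data[pos]
    if 9 <= byte <= 13:  # Blank : 9...13
        return pos + 1
    if byte != ord(";"):
        return pos
    pos += 1  # Comment : ';' ( SkipUnit | SkipLine [LF] )
    if pos < len(data) and data[pos] == ord("~"):
        raise NotImplementedError("SkipUnit (';~') is outside this sketch")
    while pos < len(data) and data[pos] != LF:  # SkipLine : ( ~LF )*
        pos += 1
    return pos + 1 if pos < len(data) else pos  # optional LF


def read_unit(data: bytes, pos: int):
    """Unit : Blank* ( Datum [Blank] | EOF ), restricted to BareString data."""
    while (after := skip_blank(data, pos)) != pos:  # Blank*
        pos = after
    if pos == len(data):
        return EOF, pos
    start = pos
    while pos < len(data) and data[pos] in BARE_CHARS:
        pos += 1
    if pos == start:
        raise NotImplementedError(f"only BareString data are handled (offset {pos})")
    datum = data[start:pos]
    after = skip_blank(data, pos)  # [Blank]
    if after == pos and pos < len(data):
        raise NotImplementedError(f"joined or clad data are not handled (offset {pos})")
    return datum, after


data = b"; greeting\nhello\tworld\n"
pos, seen = 0, []
while True:
    datum, pos = read_unit(data, pos)
    if datum is EOF:
        break
    seen.append(datum)
print(seen)  # [b'hello', b'world']
```

Each call returns the datum together with the position at which the next call should resume, so the caller drives the loop and decides when to stop.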

```
Unit          : Blank* ( Datum [Blank] | EOF )


Blank         : 9...13 | Comment

Datum         : OneDatum ( [JoinChar] OneDatum )*

JoinChar      : '.' | ':'


Comment       : ';' ( SkipUnit | SkipLine [LF] )

SkipUnit      : '~' Unit

SkipLine      : ( ~LF )*


OneDatum      : BareString | CladDatum

BareString    : BareChar+

CladDatum     : '|' ( PipeStrChar | '\' StringEsc )* '|'
              | '"' ( QuotStrChar | '\' StringEsc )* '"'
              | '#' HashExpr
              | '(' List ')' | '[' List ']' | '{' List '}'
              | "'" Datum | '`' Datum | ',' Datum


BareChar      : ALPHA | DIGIT
              | '!' | '$' | '%' | '*' | '+' | '-' | '.' | '/'
              | '<' | '=' | '>' | '?' | '@' | '^' | '_' | '~'


PipeStrChar   : ~( '|' | '\' )

QuotStrChar   : ~( '"' | '\' )

HashExpr      : Rune [ '\' BareString | CladDatum ]
              | '\' BareString
              | '%' Label ( '%' | '=' Datum )
              | CladDatum

List          : Unit* [ '&' Unit ] Blank*


StringEsc     : '\' | '|' | '"' | ( HTAB | SP )* LF ( HTAB | SP )*
              | 'a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | 'e'
              | 'x' ( HEXDIG{2} )+ ';'
              | 'u' HEXDIG{1,6} ';'


Rune          : ALPHA ( ALPHA | DIGIT ){0,5}

Label         : HEXDIG{1,12}
```
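The bounded repetitions in `Rune` and `Label` can be read as "consume greedily, but stop at the upper bound". A minimal Python sketch of that reading follows; the helper `take`, the `(data, pos)` convention, the use of `ValueError`, and the acceptance of lower-case hex digits are assumptions of the example rather than anything the grammar prescribes.

```python
import string

ALPHA = frozenset(string.ascii_letters.encode("ascii"))
DIGIT = frozenset(string.digits.encode("ascii"))
HEXDIG = frozenset(b"0123456789ABCDEFabcdef")  # lower case accepted as an assumption


def take(data: bytes, pos: int, allowed: frozenset, lo: int, hi: int) -> int:
    """Greedily consume between `lo` and `hi` bytes from `allowed`; return the new position."""
    end = pos
    while end < len(data) and end - pos < hi and data[end] in allowed:
        end += 1
    if end - pos < lo:
        raise ValueError(f"expected at least {lo} matching byte(s) at offset {pos}")
    return end


def parse_rune(data: bytes, pos: int) -> tuple[bytes, int]:
    """Rune : ALPHA ( ALPHA | DIGIT ){0,5}"""
    end = take(data, pos, ALPHA, 1, 1)
    end = take(data, end, ALPHA | DIGIT, 0, 5)
    return data[pos:end], end


def parse_label(data: bytes, pos: int) -> tuple[bytes, int]:
    """Label : HEXDIG{1,12}"""
    end = take(data, pos, HEXDIG, 1, 12)
    return data[pos:end], end


print(parse_rune(b"foo123xyz", 0))  # (b'foo123', 6)
print(parse_label(b"1f%", 0))       # (b'1f', 2)
```

In the first call the rune stops after six bytes even though more alphanumeric bytes follow; what happens to the remaining input is decided by whichever rule invoked `Rune`.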