author Taylan Kammer <taylan.kammer@gmail.com> 2026-01-06 01:17:18 +0100
committer Taylan Kammer <taylan.kammer@gmail.com> 2026-01-06 01:17:18 +0100
commit 94521a2cf4dfe82bc67a2998013cf6bed7c86869
tree 4cca190a365d8083814814253ee68014dc81944d /spec/syntax.md
parent 8b7ead9404281379558927e30bc3241708b31523
Update reader note and spec/syntax.md.
Diffstat (limited to 'spec/syntax.md')
-rw-r--r-- spec/syntax.md 68
1 files changed, 63 insertions, 5 deletions
diff --git a/spec/syntax.md b/spec/syntax.md
index b85ed78..91e5495 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -6,7 +6,9 @@ We use a BNF notation with the following rules:
followed by `bar`.
* Expressions may be followed by `?`, `*`, `+`, `{N}`, or `{N,M}`,
- which have the meanings they have in regular expressions.
+ which have meanings analogous to regular expressions.
+
+* The syntax `[foo]` is shorthand for `(foo)?`.
* The syntax is defined in terms of bytes, not characters. Terminals
`'c'` and `"c"` refer to the ASCII value of the given character `c`.
@@ -18,10 +20,13 @@ We use a BNF notation with the following rules:
* Ranges of terminal values are expressed as `x...y` (inclusive).
-* There is no ambiguity, backtracking, or look-ahead beyond the byte
- currently being matched. Rules match left to right, depth-first,
- and greedy. As soon as the input matches the first terminal of a
- rule, it must match that rule to the end.
+* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported, with the
+ addition of EOF to explicitly demarcate the end of the byte stream.
+
+* There is no ambiguity, backtracking, or look-ahead beyond one byte.
+ Rules match left to right, depth-first, and greedy. As soon as the
+ input matches the first terminal of a rule, it must match that rule
+ to the end or it is considered a syntax error.
The last rule means that the BNF is very simple to translate to code.
@@ -29,6 +34,59 @@ The parser consumes one `unit` from an input stream every time it's
called; it returns the `datum` therein, or EOF.
```
+Unit : Blank* ( Datum [Blank] | EOF )
+
+
+Blank : 9...13 | Comment
+
+Datum : OneDatum ( [JoinChar] OneDatum )*
+
+JoinChar : '.' | ':'
+
+
+Comment : ';' ( SkipUnit | SkipLine )
+
+SkipUnit : '~' Unit
+
+SkipLine : ( ~LF )* [LF]
+
+
+OneDatum : BareString | CladDatum
+
+BareString : ( '.' | '+' | '-' | DIGIT ) ( BareChar | '.' )*
+ | BareChar+
+
+CladDatum : '|' PipeStrElt* '|'
+ | '"' QuotStrElt* '"'
+ | '#' HashExpr
+ | '(' List ')' | '[' List ']' | '{' List '}'
+ | "'" Datum | '`' Datum | ',' Datum
+
+
+BareChar : ALPHA | DIGIT
+ | '!' | '$' | '%' | '&' | '*' | '+' | '-' | '/'
+ | '<' | '=' | '>' | '?' | '@' | '^' | '_' | '~'
+
+
+PipeStrElt : ~( '|' | '\' ) | '\' StringEsc
+
+QuotStrElt : ~( '"' | '\' ) | '\' StringEsc
+
+HashExpr : Rune [ '\' BareString | CladDatum ]
+ | '\' BareString
+ | '%' Label ( '%' | '=' Datum )
+ | CladDatum
+
+List : Unit* [ '.' Unit ] Blank*
+
+
+StringEsc : '\' | '|' | '"' | ( HTAB | SP )* LF ( HTAB | SP )*
+ | 'a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | 'e'
+ | 'x' ( HEXDIG{2} )+ ';'
+ | 'u' HEXDIG{1,6} ';'
+
+Rune : ALPHA ( ALPHA | DIGIT ){0,5}
+Label : HEXDIG{1,12}
```
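
The claim that the grammar translates directly to code can be illustrated with a minimal sketch of two of the rules above, `BareString` and `Rune`. This is not the project's parser: the function names `match_bare_string` and `match_rune`, and the convention of returning `-1` on failure, are assumptions made for illustration. Each function decides which alternative applies from the first byte alone, then consumes greedily without backtracking.

```python
# Byte sets for the terminals used by BareString and Rune.
# Iterating over a bytes literal yields integers, so these are sets of byte values.
ALPHA = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
DIGIT = frozenset(b"0123456789")
BARE_CHAR = ALPHA | DIGIT | frozenset(b"!$%&*+-/<=>?@^_~")
NUMERIC_START = DIGIT | frozenset(b".+-")


def match_bare_string(data: bytes, pos: int = 0) -> int:
    """Return the index just past a BareString starting at pos, or -1.

    BareString : ( '.' | '+' | '-' | DIGIT ) ( BareChar | '.' )*
               | BareChar+
    """
    if pos >= len(data):
        return -1
    first = data[pos]
    if first in NUMERIC_START:
        # First alternative: '.' is allowed in the continuation.
        allowed = BARE_CHAR | {ord(".")}
    elif first in BARE_CHAR:
        # Second alternative: only BareChar bytes, so '.' ends the match.
        allowed = BARE_CHAR
    else:
        return -1
    end = pos + 1
    while end < len(data) and data[end] in allowed:
        end += 1  # greedy: consume as long as the next byte fits
    return end


def match_rune(data: bytes, pos: int = 0) -> int:
    """Return the index just past a Rune starting at pos, or -1.

    Rune : ALPHA ( ALPHA | DIGIT ){0,5}
    """
    if pos >= len(data) or data[pos] not in ALPHA:
        return -1
    end = pos + 1
    limit = min(len(data), pos + 6)  # one ALPHA plus at most five more bytes
    while end < limit and data[end] in ALPHA | DIGIT:
        end += 1
    return end
```

With these definitions, `match_bare_string(b"foo.bar")` returns `3`, because `'.'` is not a `BareChar` and is left for the `JoinChar` in `Datum`, while `match_bare_string(b"-1.5e3)")` returns `6`, because the first alternative admits `'.'` in the continuation.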