| author | Taylan Kammer <taylan.kammer@gmail.com> | 2026-01-06 01:17:18 +0100 |
|---|---|---|
| committer | Taylan Kammer <taylan.kammer@gmail.com> | 2026-01-06 01:17:18 +0100 |
| commit | 94521a2cf4dfe82bc67a2998013cf6bed7c86869 (patch) | |
| tree | 4cca190a365d8083814814253ee68014dc81944d /spec | |
| parent | 8b7ead9404281379558927e30bc3241708b31523 (diff) | |
Update reader note and spec/syntax.md.
Diffstat (limited to 'spec')
| -rw-r--r-- | spec/syntax.md | 68 |
1 files changed, 63 insertions, 5 deletions
diff --git a/spec/syntax.md b/spec/syntax.md
index b85ed78..91e5495 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -6,7 +6,9 @@ We use a BNF notation with the following rules:
   followed by `bar`.
 
 * Expressions may be followed by `?`, `*`, `+`, `{N}`, or `{N,M}`,
-  which have the meanings they have in regular expressions.
+  which have meanings analogous to regular expressions.
+
+* The syntax `[foo]` is shorthand for `(foo)?`.
 
 * The syntax is defined in terms of bytes, not characters. Terminals
   `'c'` and `"c"` refer to the ASCII value of the given character `c`.
@@ -18,10 +20,13 @@ We use a BNF notation with the following rules:
 
 * Ranges of terminal values are expressed as `x...y` (inclusive).
 
-* There is no ambiguity, backtracking, or look-ahead beyond the byte
-  currently being matched. Rules match left to right, depth-first,
-  and greedy. As soon as the input matches the first terminal of a
-  rule, it must match that rule to the end.
+* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported, with the
+  addition of EOF to explicitly demarcate the end of the byte stream.
+
+* There is no ambiguity, backtracking, or look-ahead beyond one byte.
+  Rules match left to right, depth-first, and greedy. As soon as the
+  input matches the first terminal of a rule, it must match that rule
+  to the end or it is considered a syntax error.
 
 The last rule means that the BNF is very simple to translate to code.
 
@@ -29,6 +34,59 @@ The parser consumes one `unit` from an input stream every time it's
 called; it returns the `datum` therein, or EOF.
 
 ```
+Unit : Blank* ( Datum [Blank] | EOF )
+
+
+Blank : 9...13 | Comment
+
+Datum : OneDatum ( [JoinChar] OneDatum )*
+
+JoinChar : '.' | ':'
+
+
+Comment : ';' ( SkipUnit | SkipLine )
+
+SkipUnit : '~' Unit
+
+SkipLine : ( ~LF )* [LF]
+
+
+OneDatum : BareString | CladDatum
+
+BareString : ( '.' | '+' | '-' | DIGIT ) ( BareChar | '.' )*
+           | BareChar+
+
+CladDatum : '|' PipeStrElt* '|'
+          | '"' QuotStrElt* '"'
+          | '#' HashExpr
+          | '(' List ')' | '[' List ']' | '{' List '}'
+          | "'" Datum | '`' Datum | ',' Datum
+
+
+BareChar : ALPHA | DIGIT
+         | '!' | '$' | '%' | '&' | '*' | '+' | '-' | '/'
+         | '<' | '=' | '>' | '?' | '@' | '^' | '_' | '~'
+
+
+PipeStrElt : ~( '|' | '\' ) | '\' StringEsc
+
+QuotStrElt : ~( '"' | '\' ) | '\' StringEsc
+
+HashExpr : Rune [ '\' BareString | CladDatum ]
+         | '\' BareString
+         | '%' Label ( '%' | '=' Datum )
+         | CladDatum
+
+List : Unit* [ '.' Unit ] Blank*
+
+
+StringEsc : '\' | '|' | '"' | ( HTAB | SP )* LF ( HTAB | SP )*
+          | 'a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | 'e'
+          | 'x' ( HEXDIG{2} )+ ';'
+          | 'u' HEXDIG{1,6} ';'
+
+Rune : ALPHA ( ALPHA | DIGIT ){0,5}
+Label : HEXDIG{1,12}
 ```
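The patched spec says the one-byte look-ahead rule makes the BNF simple to translate to code. As a rough illustration of that claim, here is a minimal Python sketch of two of the new rules, `BareString` and the `'x' ( HEXDIG{2} )+ ';'` alternative of `StringEsc`. It is not taken from the project: the `Reader` class and the `take_while`, `parse_bare_string`, and `parse_hex_escape` functions are hypothetical names invented for this example, and the functions return the matched bytes without interpreting them.

```python
import string

# Byte classes from the grammar, as sets of integer byte values.
ALPHA = frozenset(string.ascii_letters.encode())
DIGIT = frozenset(string.digits.encode())
HEXDIG = DIGIT | frozenset(b"ABCDEFabcdef")
BARE_CHAR = ALPHA | DIGIT | frozenset(b"!$%&*+-/<=>?@^_~")
BARE_START = DIGIT | frozenset(b".+-")


class Reader:
    """Byte stream with one byte of look-ahead (hypothetical helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def peek(self):
        # Next byte as an int, or None at end of input.
        return self.data[self.pos] if self.pos < len(self.data) else None

    def next(self):
        byte = self.peek()
        self.pos += 1
        return byte


def take_while(reader, allowed):
    """Greedily consume bytes for as long as they are in `allowed`."""
    out = bytearray()
    while reader.peek() in allowed:
        out.append(reader.next())
    return bytes(out)


def parse_bare_string(reader):
    # BareString : ( '.' | '+' | '-' | DIGIT ) ( BareChar | '.' )*
    #            | BareChar+
    first = reader.peek()
    if first in BARE_START:
        # The first byte selects the first alternative; no backtracking.
        reader.next()
        return bytes([first]) + take_while(reader, BARE_CHAR | frozenset(b"."))
    if first in BARE_CHAR:
        return take_while(reader, BARE_CHAR)
    raise SyntaxError(f"expected BareString at byte {reader.pos}")


def parse_hex_escape(reader):
    # StringEsc alternative: 'x' ( HEXDIG{2} )+ ';'
    if reader.next() != ord("x"):
        raise SyntaxError("expected 'x'")
    digits = bytearray()
    while True:
        for _ in range(2):
            if reader.peek() not in HEXDIG:
                raise SyntaxError(f"expected hex digit at byte {reader.pos}")
            digits.append(reader.next())
        if reader.peek() not in HEXDIG:
            break
    # Once the rule has started, it must match to the end.
    if reader.next() != ord(";"):
        raise SyntaxError("expected ';'")
    return bytes(digits)
```

In this sketch each alternative is chosen by peeking at a single byte, and any mismatch after that point raises `SyntaxError` rather than trying another alternative. For example, `parse_bare_string(Reader(b"a.b"))` stops after `a`, because `'.'` is not a `BareChar`; under the grammar the `Datum` rule would then treat the dot as a `JoinChar`.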
