From 94521a2cf4dfe82bc67a2998013cf6bed7c86869 Mon Sep 17 00:00:00 2001
From: Taylan Kammer
Date: Tue, 6 Jan 2026 01:17:18 +0100
Subject: Update reader note and spec/syntax.md.

---
 spec/syntax.md | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 63 insertions(+), 5 deletions(-)

(limited to 'spec')

diff --git a/spec/syntax.md b/spec/syntax.md
index b85ed78..91e5495 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -6,7 +6,9 @@ We use a BNF notation with the following rules:
   followed by `bar`.
 
 * Expressions may be followed by `?`, `*`, `+`, `{N}`, or `{N,M}`,
-  which have the meanings they have in regular expressions.
+  which have meanings analogous to regular expressions.
+
+* The syntax `[foo]` is shorthand for `(foo)?`.
 
 * The syntax is defined in terms of bytes, not characters. Terminals
   `'c'` and `"c"` refer to the ASCII value of the given character `c`.
@@ -18,10 +20,13 @@ We use a BNF notation with the following rules:
 
 * Ranges of terminal values are expressed as `x...y` (inclusive).
 
-* There is no ambiguity, backtracking, or look-ahead beyond the byte
-  currently being matched. Rules match left to right, depth-first,
-  and greedy. As soon as the input matches the first terminal of a
-  rule, it must match that rule to the end.
+* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported, with the
+  addition of EOF to explicitly demarcate the end of the byte stream.
+
+* There is no ambiguity, backtracking, or look-ahead beyond one byte.
+  Rules match left to right, depth-first, and greedy. As soon as the
+  input matches the first terminal of a rule, it must match that rule
+  to the end or it is considered a syntax error.
 
 The last rule means that the BNF is very simple to translate to code.
 
@@ -29,6 +34,59 @@ The parser consumes one `unit` from an input stream every time it's
 called; it returns the `datum` therein, or EOF.
 
 ```
+Unit : Blank* ( Datum [Blank] | EOF )
+
+
+Blank : 9...13 | Comment
+
+Datum : OneDatum ( [JoinChar] OneDatum )*
+
+JoinChar : '.' | ':'
+
+
+Comment : ';' ( SkipUnit | SkipLine )
+
+SkipUnit : '~' Unit
+
+SkipLine : ( ~LF )* [LF]
+
+
+OneDatum : BareString | CladDatum
+
+BareString : ( '.' | '+' | '-' | DIGIT ) ( BareChar | '.' )*
+           | BareChar+
+
+CladDatum : '|' PipeStrElt* '|'
+          | '"' QuotStrElt* '"'
+          | '#' HashExpr
+          | '(' List ')' | '[' List ']' | '{' List '}'
+          | "'" Datum | '`' Datum | ',' Datum
+
+
+BareChar : ALPHA | DIGIT
+         | '!' | '$' | '%' | '&' | '*' | '+' | '-' | '/'
+         | '<' | '=' | '>' | '?' | '@' | '^' | '_' | '~'
+
+
+PipeStrElt : ~( '|' | '\' ) | '\' StringEsc
+
+QuotStrElt : ~( '"' | '\' ) | '\' StringEsc
+
+HashExpr : Rune [ '\' BareString | CladDatum ]
+         | '\' BareString
+         | '%' Label ( '%' | '=' Datum )
+         | CladDatum
+
+List : Unit* [ '.' Unit ] Blank*
+
+
+StringEsc : '\' | '|' | '"' | ( HTAB | SP )* LF ( HTAB | SP )*
+          | 'a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | 'e'
+          | 'x' ( HEXDIG{2} )+ ';'
+          | 'u' HEXDIG{1,6} ';'
+
+Rune : ALPHA ( ALPHA | DIGIT ){0,5}
+Label : HEXDIG{1,12}
 ```
-- 
cgit v1.2.3
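
Note on the "very simple to translate to code" claim: the following is a minimal Python sketch of how the `BareString` rule could be hand-translated using only one byte of look-ahead. It is not taken from the project's reader. The names `Reader`, `peek`, `advance`, and `read_bare_string` are hypothetical, and `ALPHA` and `DIGIT` are assumed to be the ASCII ranges of the ABNF core rules.

```python
# Illustrative sketch only; all names here are hypothetical and not part
# of the project. ALPHA and DIGIT follow the ABNF core rules (ASCII).
ALPHA = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
DIGIT = frozenset(b"0123456789")
BARE_CHAR = ALPHA | DIGIT | frozenset(b"!$%&*+-/<=>?@^_~")
# First bytes that select the first alternative of BareString.
FIRST_ALT_START = DIGIT | frozenset(b".+-")


class Reader:
    """Byte cursor that exposes exactly one byte of look-ahead."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def peek(self):
        """Return the next byte as an int without consuming it, or None at EOF."""
        return self.data[self.pos] if self.pos < len(self.data) else None

    def advance(self) -> int:
        """Consume and return the next byte."""
        byte = self.data[self.pos]
        self.pos += 1
        return byte


def read_bare_string(r: Reader) -> bytes:
    """BareString : ( '.' | '+' | '-' | DIGIT ) ( BareChar | '.' )* | BareChar+"""
    first = r.peek()
    if first in FIRST_ALT_START:
        # Alternatives are tried left to right, so this one wins even
        # though '+', '-' and DIGIT are also BareChars.
        allowed = BARE_CHAR | {ord(".")}
    elif first in BARE_CHAR:
        # Second alternative: '.' is not allowed and ends the string.
        allowed = BARE_CHAR
    else:
        raise SyntaxError(f"BareString cannot start at byte offset {r.pos}")
    out = bytearray([r.advance()])
    # Greedy repetition: keep consuming while the look-ahead byte fits.
    while r.peek() in allowed:
        out.append(r.advance())
    return bytes(out)
```

For example, `read_bare_string(Reader(b"foo.bar"))` stops before the `.` and returns `b"foo"`, leaving the `.` to be matched as a `JoinChar` by the caller, whereas `read_bare_string(Reader(b"1.5)"))` takes the first alternative and returns `b"1.5"`. The choice between alternatives is made from the first byte alone, so no backtracking is needed.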