| author | Taylan Kammer <taylan.kammer@gmail.com> | 2026-05-31 20:58:42 +0200 |
|---|---|---|
| committer | Taylan Kammer <taylan.kammer@gmail.com> | 2026-05-31 20:58:42 +0200 |
| commit | 37ff7af18cd2e896506e6d228058204525b4a6eb | |
| tree | b45e29afac99b8e6eb21f5eaf040f640221220e8 /docs/c1/grammar/abnf.txt | |
| parent | 6794e27eac3e866aa2b24999e2027b301a52ebf2 | |
More proper shebang line parsing.
Diffstat (limited to 'docs/c1/grammar/abnf.txt')
| -rw-r--r-- | docs/c1/grammar/abnf.txt | 32 |
1 files changed, 26 insertions, 6 deletions
diff --git a/docs/c1/grammar/abnf.txt b/docs/c1/grammar/abnf.txt
index a5b9eca..aa67646 100644
--- a/docs/c1/grammar/abnf.txt
+++ b/docs/c1/grammar/abnf.txt
@@ -2,11 +2,27 @@
 
 ; Compatible with: https://www.quut.com/abnfgen/
 
-; It's unclear whether this grammar is truly complete. It has been
-; verified not to produce text that is rejected by the Zisp parser
-; --except for Unicode escape sequences for surrogate code points--
-; but there may be some text that is accepted by the parser despite
-; not being grammatical according to these rules.
+; Unlike PEG, grammar rules in BNF are non-deterministic, which makes
+; it much more challenging to express our naive parse logic. Whether
+; this ABNF file is truly accurate is difficult to assess.
+
+; The abnfgen(1) tool linked above can be used to generate arbitrary
+; strings matching the grammar in this file. These can be fed into
+; the Zisp parser to reveal some potential bugs; either in the parser
+; itself, or this ABNF grammar.
+
+; Note that the tool may generate Zisp string literals with Unicode
+; escape sequences corresponding to surrogate code points; the parser
+; may reject these. This is expected; it's difficult to rewrite this
+; ABNF grammar to exclude those Unicode values.
+
+; Other minor inaccuracies that aren't important include: This ABNF
+; forces line comments to be terminated with an LF character, when in
+; fact the end-of-file may also terminate them; the same applies to
+; hash-bang parsing which doesn't actually have to end in LF. These
+; discrepancies won't make abnfgen(1) generate invalid strings; they
+; only make this ABNF more strict than the Zisp parser, so it won't
+; generate some strings that the parser would actually accept.
 
 Stream = [ Unit *( Blank Unit ) ] *Blank [Trail]
 
@@ -52,7 +68,7 @@ RuneDotStr = "#" RuneName "\" SpecialStr
 
 RuneClad = "#" RuneName CladDatum
 
-HashBang = "#" "!" *( SP / HTAB ) BareString
+HashBang = "#" "!" *( SP / HTAB ) HBLine LF
 
 LabelRef = "#" "%" Label "%"
 
@@ -101,6 +117,10 @@ RuneName = ALPHA *5( ALPHA / DIGIT )
 
 Label = 1*12( HEXDIG )
 
+HBLine = 1*HBChar [ 1*( SP / HTAB ) *HBChar ]
+
+HBChar = %x00-08 / %x0b-1f / %x21-ff ; any but HT, LF, SP
+
 RJoinDatum = CladDatum / Rune / RuneStr / RuneDotStr / RuneClad / LabelRef / LabelDef / HashStr / HashDotStr / HashClad
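The new `HashBang`, `HBLine`, and `HBChar` rules describe a regular language, so they can be transcribed into a byte-level regular expression for quick experiments. The Python sketch below is a hypothetical illustration and is not part of Zisp. The names `is_hashbang` and `allow_eof` are introduced here. With the default setting it follows the ABNF exactly, so a trailing LF is required. `allow_eof=True` approximates the parser's more lenient behavior described in the new comment block, on the assumption that the only difference is an optional final LF.

```python
import re

# Transcription of the ABNF rules added in this commit:
#   HashBang = "#" "!" *( SP / HTAB ) HBLine LF
#   HBLine   = 1*HBChar [ 1*( SP / HTAB ) *HBChar ]
#   HBChar   = %x00-08 / %x0b-1f / %x21-ff ; any but HT, LF, SP
HBCHAR = rb"[\x00-\x08\x0b-\x1f\x21-\xff]"
HBLINE = rb"(?:" + HBCHAR + rb"+(?:[ \t]+" + HBCHAR + rb"*)?)"

# Strict form: exactly what the ABNF describes (LF is mandatory).
STRICT = re.compile(rb"#![ \t]*" + HBLINE + rb"\n")

# Assumed lenient form: models the note that the real parser also lets
# end-of-file terminate the hash-bang line (LF becomes optional).
LENIENT = re.compile(rb"#![ \t]*" + HBLINE + rb"\n?")


def is_hashbang(line: bytes, allow_eof: bool = False) -> bool:
    """Return True if `line` matches the HashBang rule as a whole."""
    pattern = LENIENT if allow_eof else STRICT
    return pattern.fullmatch(line) is not None


if __name__ == "__main__":
    for sample in (b"#!/usr/bin/env zisp\n", b"#!/usr/bin/zisp", b"#!/bin/foo a b\n"):
        print(sample, is_hashbang(sample), is_hashbang(sample, allow_eof=True))
```

Because `HBLine` permits at most one run of blanks after the first word, a line such as `#!/bin/foo a b` falls outside the grammar even though it contains no control characters. Every line accepted by the strict pattern is also accepted by the lenient one, which matches the comment's point that the ABNF only ever generates a subset of what the parser accepts.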

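The note about surrogate code points can be checked directly: U+D800 through U+DFFF are reserved for UTF-16 surrogate pairs and are not Unicode scalar values, so a strict UTF-8 encoder refuses them. The Python sketch below uses hypothetical helpers (`is_surrogate`, `encodes_as_utf8`) that work on integer code points. It does not model Zisp's actual escape syntax, and it only suggests, rather than confirms, why the parser might reject such escapes in strings generated by abnfgen(1).

```python
# Surrogate code points are reserved for UTF-16 pairs; they are not
# Unicode scalar values, so UTF-8 has no valid encoding for them.
SURROGATE_MIN, SURROGATE_MAX = 0xD800, 0xDFFF


def is_surrogate(cp: int) -> bool:
    """Return True if `cp` lies in the surrogate range U+D800..U+DFFF."""
    return SURROGATE_MIN <= cp <= SURROGATE_MAX


def encodes_as_utf8(cp: int) -> bool:
    """Return True if Python's strict UTF-8 codec can encode `cp`."""
    try:
        chr(cp).encode("utf-8")
    except UnicodeEncodeError:
        # Raised for lone surrogates ("surrogates not allowed").
        return False
    return True


if __name__ == "__main__":
    for cp in (0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000):
        print(f"U+{cp:04X}", is_surrogate(cp), encodes_as_utf8(cp))
```

Excluding this range from a hex-escape rule in ABNF would mean splitting one numeric range into several alternatives around D800 to DFFF, which is the kind of rewrite the comment describes as difficult; filtering such samples when triaging parser rejections is the simpler workaround.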