From 37ff7af18cd2e896506e6d228058204525b4a6eb Mon Sep 17 00:00:00 2001
From: Taylan Kammer
Date: Sun, 31 May 2026 20:58:42 +0200
Subject: More proper shebang line parsing.

---
 docs/c1/grammar/abnf.txt | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

(limited to 'docs/c1/grammar/abnf.txt')

diff --git a/docs/c1/grammar/abnf.txt b/docs/c1/grammar/abnf.txt
index a5b9eca..aa67646 100644
--- a/docs/c1/grammar/abnf.txt
+++ b/docs/c1/grammar/abnf.txt
@@ -2,11 +2,27 @@
 
 ; Compatible with: https://www.quut.com/abnfgen/
 
-; It's unclear whether this grammar is truly complete. It has been
-; verified not to produce text that is rejected by the Zisp parser
-; --except for Unicode escape sequences for surrogate code points--
-; but there may be some text that is accepted by the parser despite
-; not being grammatical according to these rules.
+; Unlike PEG, grammar rules in BNF are non-deterministic, which makes
+; it much more challenging to express our naive parse logic. Whether
+; this ABNF file is truly accurate is difficult to assess.
+
+; The abnfgen(1) tool linked above can be used to generate arbitrary
+; strings matching the grammar in this file. These can be fed into
+; the Zisp parser to reveal some potential bugs; either in the parser
+; itself, or this ABNF grammar.
+
+; Note that the tool may generate Zisp string literals with Unicode
+; escape sequences corresponding to surrogate code points; the parser
+; may reject these. This is expected; it's difficult to rewrite this
+; ABNF grammar to exclude those Unicode values.
+
+; Other minor inaccuracies that aren't important include: This ABNF
+; forces line comments to be terminated with an LF character, when in
+; fact the end-of-file may also terminate them; the same applies to
+; hash-bang parsing which doesn't actually have to end in LF. These
+; discrepancies won't make abnfgen(1) generate invalid strings; they
+; only make this ABNF more strict than the Zisp parser, so it won't
+; generate some strings that the parser would actually accept.
 
 Stream = [ Unit *( Blank Unit ) ] *Blank [Trail]
 
@@ -52,7 +68,7 @@ RuneDotStr = "#" RuneName "\" SpecialStr
 
 RuneClad = "#" RuneName CladDatum
 
-HashBang = "#" "!" *( SP / HTAB ) BareString
+HashBang = "#" "!" *( SP / HTAB ) HBLine LF
 
 LabelRef = "#" "%" Label "%"
 
@@ -101,6 +117,10 @@ RuneName = ALPHA *5( ALPHA / DIGIT )
 
 Label = 1*12( HEXDIG )
 
+HBLine = 1*HBChar [ 1*( SP / HTAB ) *HBChar ]
+
+HBChar = %x00-08 / %x0b-1f / %x21-ff ; any but HT, LF, SP
+
 RJoinDatum = CladDatum / Rune / RuneStr / RuneDotStr / RuneClad
            / LabelRef / LabelDef / HashStr / HashDotStr / HashClad
 
-- 
cgit v1.2.3
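
As a rough illustration of what the new HashBang, HBLine and HBChar
rules accept when read literally, the three rules can be transcribed
into a byte-level regular expression. This is only a sketch of the
ABNF as written in the patch, not the Zisp parser's actual logic; the
names HBCHAR, HASHBANG and matches_hashbang are hypothetical and
exist only for this example.

```python
import re

# HBChar = %x00-08 / %x0b-1f / %x21-ff  (any byte but HT, LF, SP)
HBCHAR = rb"[\x00-\x08\x0b-\x1f\x21-\xff]"

# HashBang = "#" "!" *( SP / HTAB ) HBLine LF
# HBLine   = 1*HBChar [ 1*( SP / HTAB ) *HBChar ]
HASHBANG = re.compile(
    rb"#![ \t]*"                         # "#!" plus optional blanks
    + HBCHAR + rb"+"                     # 1*HBChar
    + rb"(?:[ \t]+" + HBCHAR + rb"*)?"   # optional blanks, then *HBChar
    + rb"\n"                             # the ABNF requires a closing LF
)


def matches_hashbang(data: bytes) -> bool:
    """Return True if the whole byte string matches the HashBang rule."""
    return HASHBANG.fullmatch(data) is not None


if __name__ == "__main__":
    samples = [
        b"#!/usr/bin/env zisp\n",     # one blank run inside the line
        b"#! /bin/zisp\n",            # blanks right after "#!"
        b"#!/bin/zisp",               # no LF: rejected by the ABNF
        b"#!/usr/bin/env zisp -x\n",  # second blank run: rejected
    ]
    for sample in samples:
        print(sample, matches_hashbang(sample))
```

Two things follow from the rules as written. A hash-bang line that
ends at end-of-file without LF does not match, which is the strictness
the comment in the patch describes. HBLine also allows at most one run
of SP/HTAB inside the line, so a line with two separate blank runs
does not match this transcription.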