More proper shebang line parsing.

author: Taylan Kammer <taylan.kammer@gmail.com> 2026-05-31 20:58:42 +0200
committer: Taylan Kammer <taylan.kammer@gmail.com> 2026-05-31 20:58:42 +0200
commit: 37ff7af18cd2e896506e6d228058204525b4a6eb (patch)
tree: b45e29afac99b8e6eb21f5eaf040f640221220e8 /docs/c1/grammar/abnf.txt
parent: 6794e27eac3e866aa2b24999e2027b301a52ebf2 (diff)
1 files changed, 26 insertions, 6 deletions
diff --git a/docs/c1/grammar/abnf.txt b/docs/c1/grammar/abnf.txt
index a5b9eca..aa67646 100644
--- a/docs/c1/grammar/abnf.txt
+++ b/docs/c1/grammar/abnf.txt
@@ -2,11 +2,27 @@
 
 ; Compatible with: https://www.quut.com/abnfgen/
 
-; It's unclear whether this grammar is truly complete.  It has been
-; verified not to produce text that is rejected by the Zisp parser
-; --except for Unicode escape sequences for surrogate code points--
-; but there may be some text that is accepted by the parser despite
-; not being grammatical according to these rules.
+; Unlike PEG, with its ordered choice and greedy repetition, BNF is
+; non-deterministic, which makes our naive parse logic much harder to
+; express.  How accurate this ABNF file truly is remains hard to say.
+
+; The abnfgen(1) tool linked above can generate arbitrary strings
+; matching the grammar in this file.  Feeding these into the Zisp
+; parser can reveal bugs, either in the parser itself or in this
+; ABNF grammar.
+
+; Note that the tool may generate Zisp string literals with Unicode
+; escape sequences corresponding to surrogate code points; the parser
+; may reject these.  This is expected; it's difficult to rewrite this
+; ABNF grammar to exclude those Unicode values.
+
+; There are also some minor, harmless inaccuracies.  This ABNF forces
+; line comments to be terminated by an LF character, when in fact the
+; end of file may also terminate them; the same applies to hash-bang
+; lines, which don't actually have to end in LF.  These discrepancies
+; won't make abnfgen(1) generate invalid strings; they only make this
+; ABNF stricter than the Zisp parser, so some strings the parser would
+; accept will never be generated.
 
 
 Stream        = [ Unit *( Blank Unit ) ] *Blank [Trail]
@@ -52,7 +68,7 @@ RuneDotStr    = "#" RuneName "\" SpecialStr
 
 RuneClad      = "#" RuneName CladDatum
 
-HashBang      = "#" "!" *( SP / HTAB ) BareString
+HashBang      = "#" "!" *( SP / HTAB ) HBLine LF
 
 LabelRef      = "#" "%" Label "%"
 
@@ -101,6 +117,10 @@ RuneName      = ALPHA *5( ALPHA / DIGIT )
 
 Label         = 1*12( HEXDIG )
 
+HBLine        = 1*HBChar [ 1*( SP / HTAB ) *HBChar ]
+
+HBChar        = %x00-08 / %x0b-1f / %x21-ff ; any but HT, LF, SP
+
 
 RJoinDatum    = CladDatum / Rune / RuneStr / RuneDotStr / RuneClad
               / LabelRef / LabelDef / HashStr / HashDotStr / HashClad
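The new HashBang, HBLine and HBChar rules can be read as a small deterministic recognizer. The Python sketch below is a hypothetical illustration only. The name `match_hashbang` and the `require_lf` flag are made up here and are not part of Zisp or its parser. It matches the rules greedily over raw octets. That is safe for these particular rules because each repetition is always followed by a character class it cannot itself consume. The `require_lf=False` mode models the grammar comment's statement that the real parser also lets end of input terminate a hash-bang line.

```python
# Hypothetical recognizer for the HashBang rule in abnf.txt; NOT the Zisp parser.
#   HashBang = "#" "!" *( SP / HTAB ) HBLine LF
#   HBLine   = 1*HBChar [ 1*( SP / HTAB ) *HBChar ]
#   HBChar   = %x00-08 / %x0b-1f / %x21-ff   ; any octet but HT, LF, SP
from typing import Optional, Tuple

SP, HTAB, LF = 0x20, 0x09, 0x0A


def is_hbchar(b: int) -> bool:
    # Octets are 0x00-0xff, so excluding HT, LF and SP equals the three ranges.
    return b not in (HTAB, LF, SP)


def is_blank(b: int) -> bool:
    return b in (SP, HTAB)


def match_hashbang(
    data: bytes, require_lf: bool = True
) -> Optional[Tuple[bytes, Optional[bytes], int]]:
    """Return (interpreter, argument, end_index) if data starts with a
    HashBang, else None.  argument is None when the optional group is absent.
    With require_lf=False (assumed parser behavior), end of input may also
    terminate the line instead of LF."""
    if not data.startswith(b"#!"):
        return None
    n = len(data)

    def skip(pred, i: int) -> int:
        # Greedily consume octets satisfying pred, starting at index i.
        while i < n and pred(data[i]):
            i += 1
        return i

    i = skip(is_blank, 2)                 # *( SP / HTAB )
    start = i
    i = skip(is_hbchar, i)                # 1*HBChar: the interpreter
    if i == start:
        return None
    interpreter = data[start:i]

    argument = None
    j = skip(is_blank, i)                 # [ 1*( SP / HTAB ) *HBChar ]
    if j > i:
        k = skip(is_hbchar, j)
        argument = data[j:k]              # may be empty (trailing blanks)
        i = k

    if i < n and data[i] == LF:           # LF terminator
        return interpreter, argument, i + 1
    if i == n and not require_lf:         # EOF terminator (parser only)
        return interpreter, argument, i
    return None                           # e.g. blanks after the argument
```

Following the rules literally, `#!/usr/bin/env zisp -x` followed by LF is rejected, because at most one blank-separated field may follow the interpreter. Since CR (%x0d) counts as an HBChar, a CRLF-terminated line keeps the CR at the end of its last field. For example, `match_hashbang(b"#!/bin/zisp\r\n")` yields the interpreter `b"/bin/zisp\r"`.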