From 2f77b3ceaa2989d944296c572a07b2caee39d9d4 Mon Sep 17 00:00:00 2001
From: Taylan Kammer
Date: Mon, 12 Jan 2026 08:03:38 +0100
Subject: Update HTML stuff.

---
 docs/c1/grammar.md | 101 -----------------------------------------------------
 1 file changed, 101 deletions(-)
 delete mode 100644 docs/c1/grammar.md

(limited to 'docs/c1/grammar.md')

diff --git a/docs/c1/grammar.md b/docs/c1/grammar.md
deleted file mode 100644
index 3364150..0000000
--- a/docs/c1/grammar.md
+++ /dev/null
@@ -1,101 +0,0 @@
-# Zisp S-Expression Grammar
-
-The grammar is available in several different formats:
-
-* [ZBNF](grammar.zbnf.txt): See below for the rules of this notation
-* [ABNF](grammar.abnf.txt): Compatible with the `abnfgen` tool
-* [PEG](grammar.peg.txt): Compatible with `peg/leg` tool
-
-
-## ZBNF notation
-
-The ZBNF grammar specification uses a BNF-like notation with PEG-like
-semantics:
-
-* Concatenation of expressions is implicit: `foo bar` means `foo`
-  followed by `bar`.
-
-* Parentheses are used for grouping, and the pipe symbol `|` is used
-  for alternatives.
-
-* The suffixes `?`, `*`, and `+` have the same meaning as in regular
-  expressions, although `[foo]` is used in place of `(foo)?`.
-
-* The syntax is defined in terms of bytes, not characters. Terminals
-  `'c'` and `"c"` refer to the ASCII value of the given character `c`.
-  Standard C escape sequences are supported.
-
-* The prefix `~` means NOT. It only applies to rules that match one
-  byte, and negates them. For example, `~( 'a' | 'b' )` matches any
-  byte other than 'a' and 'b'.
-
-* Ranges of terminal values are expressed as `x...y` (inclusive).
-
-* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported.
-
-* There is no ambiguity, or look-ahead / backtracking beyond one byte.
-  Rules match left to right, depth-first, and greedy. As soon as the
-  input matches the first terminal of a rule --explicit or implied by
-  recursively descending into the first non-terminal-- it must match
-  that rule to the end or a syntax error is reported.
-
-The last point makes the notation simple to translate to code.
-
-
-## Limitations outside the grammar
-
-The following limits are not represented in the grammar:
-
-* A `UnicodeSV` is the hexadecimal representation of a Unicode scalar
-  value; it must represent a value in the range 0 to D7FF, or E000 to
-  10FFFF, inclusive. Any other value signals an error. Valid values
-  are converted into a UTF-8 byte sequence encoding the value.
-
-* A `Rune` longer than 6 bytes is grammatical, but signals an error.
-  This is important because runes are not self-terminating; defining
-  their grammar as ending after a maximum of 6 bytes would allow
-  another datum beginning with an alphabetic character to follow a
-  rune immediately without any visual delineation, which would be
-  terribly confusing for a human reader. Consider: `#foobarbaz`.
-  This would parse as a `Datum` joining `#foobar` and `baz`.
-
-* A `Label` is the hexadecimal representation of a 48-bit integer,
-  meaning it allows for a maximum of 12 hexadecimal digits. Longer
-  values are grammatical, but signal an out-of-range error, so as to
-  avoid signaling a confusing "invalid character" error on input that
-  appears grammatical. Consider: `#%123456789abcd=foo`. This would
-  signal an invalid character error at the letter `d` if the grammar
-  limited a `Label` to 12 hexadecimal digits.
-
-
-## Stream-parsing strategy
-
-The parser consumes one `Unit` from the input stream every time it's
-called; it returns the `Datum` therein if found, or else it returns
-the Zisp EOF token.
-
-Since a `Datum` is not self-terminating, the parser must read beyond
-it to realize that it has ended (if not followed by the EOF). Thus,
-it will consume one more `Blank` following the `Unit` that it parsed.
-If this `Blank` is a comment, it will be consumed entirely, ensuring
-that parsing resumes properly on a subsequent parser call on the same
-input stream, without needing to store any state in between.
-
-Since comments of type `SkipUnit` are likewise not self-terminating,
-an arbitrary number of chained `SkipUnit` comments may need to be
-consumed before the parser is finally allowed to return.
-
-The following illustration shows the positions at which the parser
-will stop consuming input when called repeatedly on the same input
-stream. The dots represent the extent of each `Unit` being parsed,
-while the caret points at the last byte the parser will consume in
-that parse cycle.
-
-```
-foo (bar)[baz] foo;~bar foo;~bar;~baz;~bat foobar
-...^..........^...     ^...               ^......^
-```
-
-Notice how, in the fourth cycle, the parser is forced to consume all
-commented-out units before it can return, since it would otherwise
-leave the stream in an inappropriate state.
--
cgit v1.2.3
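
The two numeric checks listed under "Limitations outside the grammar" in the deleted document could be sketched roughly as follows. This is a minimal Python illustration, not Zisp's actual implementation: the function names `decode_unicode_sv` and `parse_label` are hypothetical, the input is assumed to be the hexadecimal digits the grammar has already matched, and treating the `Label` limit as a digit count (rather than a comparison against 2^48) is an assumption based on the wording "a maximum of 12 hexadecimal digits".

```python
def decode_unicode_sv(hex_digits: str) -> bytes:
    """Turn the hex text of a UnicodeSV into its UTF-8 byte sequence.

    Hypothetical helper; assumes hex_digits contains only hex digits.
    """
    value = int(hex_digits, 16)
    # Scalar values are 0..D7FF and E000..10FFFF; the gap is the
    # surrogate range, which must be rejected.
    if not (0 <= value <= 0xD7FF or 0xE000 <= value <= 0x10FFFF):
        raise ValueError(f"not a Unicode scalar value: {hex_digits}")
    return chr(value).encode("utf-8")


def parse_label(hex_digits: str) -> int:
    """Turn the hex text of a Label into a 48-bit integer.

    Hypothetical helper; assumes hex_digits contains only hex digits.
    """
    # More than 12 digits is grammatical, so report an out-of-range
    # error instead of an "invalid character" error.
    if len(hex_digits) > 12:
        raise OverflowError(f"label out of range: {hex_digits}")
    return int(hex_digits, 16)
```

With this sketch, the document's own example `#%123456789abcd=foo` would reach `parse_label("123456789abcd")` and fail with an out-of-range error covering the whole label, rather than a character-level error at the letter `d`.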