Diffstat (limited to 'docs/c1/grammar.md')
-rw-r--r-- docs/c1/grammar.md 101
1 files changed, 0 insertions, 101 deletions
diff --git a/docs/c1/grammar.md b/docs/c1/grammar.md
deleted file mode 100644
index 3364150..0000000
--- a/docs/c1/grammar.md
+++ /dev/null
@@ -1,101 +0,0 @@
-# Zisp S-Expression Grammar
-
-The grammar is available in several different formats:
-
-* [ZBNF](grammar.zbnf.txt): See below for the rules of this notation
-* [ABNF](grammar.abnf.txt): Compatible with the `abnfgen` tool
-* [PEG](grammar.peg.txt): Compatible with the `peg/leg` tool
-
-
-## ZBNF notation
-
-The ZBNF grammar specification uses a BNF-like notation with PEG-like
-semantics:
-
-* Concatenation of expressions is implicit: `foo bar` means `foo`
- followed by `bar`.
-
-* Parentheses are used for grouping, and the pipe symbol `|` is used
- for alternatives.
-
-* The suffixes `?`, `*`, and `+` have the same meaning as in regular
- expressions, although `[foo]` is used in place of `(foo)?`.
-
-* The syntax is defined in terms of bytes, not characters. Terminals
- `'c'` and `"c"` refer to the ASCII value of the given character `c`.
- Standard C escape sequences are supported.
-
-* The prefix `~` means NOT. It only applies to rules that match one
- byte, and negates them. For example, `~( 'a' | 'b' )` matches any
- byte other than 'a' and 'b'.
-
-* Ranges of terminal values are expressed as `x...y` (inclusive).
-
-* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported.
-
-* There is no ambiguity, and no look-ahead or backtracking beyond one
- byte. Rules match left to right, depth-first, and greedily. As soon
- as the input matches the first terminal of a rule (whether explicit
- or implied by recursively descending into the first non-terminal),
- it must match that rule to the end, or a syntax error is reported.
-
-The last point makes the notation simple to translate to code.
-
-
-## Limitations outside the grammar
-
-The following limits are not represented in the grammar:
-
-* A `UnicodeSV` is the hexadecimal representation of a Unicode scalar
- value; it must represent a value in the range 0 to D7FF, or E000 to
- 10FFFF, inclusive. Any other value signals an error. Valid values
- are converted into a UTF-8 byte sequence encoding the value.
-
-* A `Rune` longer than 6 bytes is grammatical, but signals an error.
- This is important because runes are not self-terminating; defining
- their grammar as ending after a maximum of 6 bytes would allow
- another datum beginning with an alphabetic character to follow a
- rune immediately without any visual delineation, which would be
- terribly confusing for a human reader. Consider `#foobarbaz`: with
- such a limit, it would parse as a `Datum` joining `#foobar` and `baz`.
-
-* A `Label` is the hexadecimal representation of a 48-bit integer,
- meaning it allows for a maximum of 12 hexadecimal digits. Longer
- values are grammatical, but signal an out-of-range error, so as to
- avoid signaling a confusing "invalid character" error on input that
- appears grammatical. Consider: `#%123456789abcd=foo`. This would
- signal an invalid character error at the letter `d` if the grammar
- limited a `Label` to 12 hexadecimal digits.
-
-
-## Stream-parsing strategy
-
-The parser consumes one `Unit` from the input stream every time it's
-called; it returns the `Datum` therein if found, or else it returns
-the Zisp EOF token.
-
-Since a `Datum` is not self-terminating, the parser must read beyond
-it to realize that it has ended (if not followed by the EOF). Thus,
-it will consume one more `Blank` following the `Unit` that it parsed.
-If this `Blank` is a comment, it will be consumed entirely, ensuring
-that parsing resumes properly on a subsequent parser call on the same
-input stream, without needing to store any state in between.
-
-Since comments of type `SkipUnit` are likewise not self-terminating,
-an arbitrary number of chained `SkipUnit` comments may need to be
-consumed before the parser is finally allowed to return.
-
-The following illustration shows the positions at which the parser
-will stop consuming input when called repeatedly on the same input
-stream. The dots represent the extent of each `Unit` being parsed,
-while the caret points at the last byte the parser will consume in
-that parse cycle.
-
-```
-foo (bar)[baz] foo;~bar foo;~bar;~baz;~bat foobar
-...^..........^...     ^...               ^......^
-```
-
-Notice how, in the fourth cycle, the parser is forced to consume all
-commented-out units before it can return, since it would otherwise
-leave the stream in an inappropriate state.
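
The stream-parsing strategy described in the removed document can be sketched in Python. This is a minimal illustration under stated assumptions, not the Zisp parser: the toy grammar here (alphanumeric words, `(...)` and `[...]` groups joined without whitespace, and `;~` comments) is a hypothetical simplification, the names `read_unit`, `read_datum`, `read_group`, `skip_comment`, and `stop_positions` are invented for this example, and `None` stands in for the Zisp EOF token. Comments inside groups are not modelled.

```python
import io

WHITESPACE = b" \t\n"
OPENERS = {b"(": b")", b"[": b"]"}  # hypothetical subset of the real grammar


def read_group(stream, opener):
    """Copy bytes up to the closer matching `opener`, recursing into nested groups."""
    closer = OPENERS[opener]
    out = bytearray(opener)
    while True:
        c = stream.read(1)
        if c == b"":
            raise SyntaxError("unterminated group")
        if c in OPENERS:
            out += read_group(stream, c)
        else:
            out += c
            if c == closer:
                return bytes(out)


def read_datum(stream, c):
    """Read a datum whose first byte is `c`.

    A datum is not self-terminating, so this necessarily consumes the byte
    that follows it; that byte is returned alongside the datum.
    """
    out = bytearray()
    while c.isalnum() or c in OPENERS:
        out += read_group(stream, c) if c in OPENERS else c
        c = stream.read(1)
    if not out:
        raise SyntaxError("expected a datum")
    return bytes(out), c


def skip_comment(stream):
    """Consume a ';~' comment (the ';' is already read); return its terminator byte."""
    if stream.read(1) != b"~":
        raise SyntaxError("only ';~' comments are modelled in this sketch")
    _, terminator = read_datum(stream, stream.read(1))
    return terminator


def read_unit(stream):
    """Consume one unit plus the blank after it; return the datum, or None at EOF."""
    c = stream.read(1)
    while True:  # skip leading whitespace and leading comments
        if c and c in WHITESPACE:
            c = stream.read(1)
        elif c == b";":
            c = skip_comment(stream)
        else:
            break
    if c == b"":
        return None
    datum, c = read_datum(stream, c)
    while c == b";":  # chained comments must be consumed before returning
        c = skip_comment(stream)
    if c and c not in WHITESPACE:
        raise SyntaxError(f"unexpected byte {c!r}")
    return datum


def stop_positions(data):
    """Return (datum, offset of last consumed byte) for each parse cycle."""
    stream = io.BytesIO(data)
    cycles = []
    while (datum := read_unit(stream)) is not None:
        cycles.append((datum, stream.tell() - 1))
    return cycles
```

Called on the illustration's input with a trailing newline appended (an assumption, so that the last unit has a byte after it), `stop_positions` reports the last consumed offsets 3, 14, 23, 42, and 49. No state is kept between calls to `read_unit` other than the stream position itself, which is the property the strategy is designed to preserve.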