From f2b18d64448ab09dd5e5e6a180d38d90d5aaf367 Mon Sep 17 00:00:00 2001 From: Taylan Kammer Date: Thu, 27 Mar 2025 21:18:09 +0100 Subject: new parser --- spec/syntax.md | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) create mode 100644 spec/syntax.md (limited to 'spec/syntax.md') diff --git a/spec/syntax.md b/spec/syntax.md new file mode 100644 index 0000000..b85ed78 --- /dev/null +++ b/spec/syntax.md @@ -0,0 +1,34 @@ +# Zisp S-Expression Syntax + +We use a BNF notation with the following rules: + +* Concatenation of expressions is implicit: `foo bar` means `foo` + followed by `bar`. + +* Expressions may be followed by `?`, `*`, `+`, `{N}`, or `{N,M}`, + which have the meanings they have in regular expressions. + +* The syntax is defined in terms of bytes, not characters. Terminals + `'c'` and `"c"` refer to the ASCII value of the given character `c`. + Numbers are in decimal and refer to a byte with the given value. + +* The `~` prefix means NOT. It only applies to rules that match one + byte, and negates them. For example, `~( 'a' | 'b' )` matches any + byte other than 97 and 98. + +* Ranges of terminal values are expressed as `x...y` (inclusive). + +* There is no ambiguity, backtracking, or look-ahead beyond the byte + currently being matched. Rules match left to right, depth-first, + and greedy. As soon as the input matches the first terminal of a + rule, it must match that rule to the end. + +The last rule means that the BNF is very simple to translate to code. + +The parser consumes one `unit` from an input stream every time it's +called; it returns the `datum` therein, or EOF. + +``` + + +``` -- cgit v1.2.3