diff options
Diffstat (limited to 'spec/syntax.md')
| -rw-r--r-- | spec/syntax.md | 117 |
1 files changed, 0 insertions, 117 deletions
diff --git a/spec/syntax.md b/spec/syntax.md deleted file mode 100644 index d1a17ad..0000000 --- a/spec/syntax.md +++ /dev/null @@ -1,117 +0,0 @@ -# Zisp S-Expression Syntax - -We use a BNF-like grammar notation with the following rules: - -* Concatenation of expressions is implicit: `foo bar` means `foo` - followed by `bar`. - -* The suffixes `?`, `*`, and `+` have the same meaning as in regular - expressions, although `[foo]` is used in place of `(foo)?`. - -* The syntax is defined in terms of bytes, not characters. Terminals - `'c'` and `"c"` refer to the ASCII value of the given character `c`. - Numbers are in decimal and refer to a byte with the given value. - -* The prefix `~` means NOT. It only applies to rules that match one - byte, and negates them. For example, `~( 'a' | 'b' )` matches any - byte other than 97 and 98. - -* Ranges of terminal values are expressed as `x...y` (inclusive). - -* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported. - -* There is no ambiguity, or look-ahead / backtracking beyond one byte. - Rules match left to right, depth-first, and greedy. As soon as the - input matches the first terminal of a rule (explicit or implied by - recursively descending into the first non-terminal), it must match - that rule to the end, or it is considered a syntax error. - -The last rule means that the notation is simple to translate to code. -It ostensibly makes the notation equivalent to PEG in expression. - -The parser consumes one `Unit` from an input stream every time it's -called; it returns the `Datum` therein, or EOF. The final optional -`Blank` represents the fact that the parser will consume one more -blank at the end if it finds one; this is because `Datum` is not -self-closing so the parser has to check if it goes on. - -The following limits are not represented in the grammar: - -* A `UnicodeSV` is the hexadecimal representation of a Unicode scalar - value; it must represent a value in the range 0 to D7FF, or E000 to - 10FFFF, inclusive. Any other value signals an error. Valid values - are converted into a UTF-8 byte sequence encoding the value. - -* A `Rune` longer than 6 bytes is grammatical, but signals an error. - This is important because runes are not self-terminating; defining - their grammar as ending after a maximum of 6 bytes would allow - another datum beginning with an alphabetic character to follow a - rune immediately without any visual delineation, which would be - terribly confusing for a human reader. Consider: `#foo123bar`. - This would parse as a concatenation of `#foo123` and `bar`. - -* A `Label` is the hexadecimal representation of a 48-bit integer, - meaning it allows for a maximum of 12 hexadecimal digits. Longer - values are grammatical, but signal an out-of-range error. - -``` -Unit : Blank* [ Datum [Blank] ] - - -Blank : 9...13 | SP | Comment - -Datum : OneDatum ( [JoinChar] OneDatum )* - -JoinChar : '.' | ':' - - -Comment : ';' ( SkipUnit | SkipLine ) - -SkipUnit : '~' Unit - -SkipLine : ( ~LF )* [LF] - - -OneDatum : BareString | CladDatum - - -BareString : ( '.' | '+' | '-' | DIGIT ) ( BareChar | '.' )* - | BareChar+ - -CladDatum : PipeStr | QuoteStr | HashExpr | QuoteExpr | List - -PipeStr : '|' ( PipeStrChar | '\' StringEsc )* '|' -QuoteStr : '"' ( QuotStrChar | '\' StringEsc )* '"' -HashExpr : '#' ( RuneExpr | LabelExpr | HashDatum ) -QuoteExpr : "'" Datum | '`' Datum | ',' Datum -List : ParenList | SquareList | BraceList - -BareChar : ALPHA | DIGIT - | '!' | '$' | '%' | '*' | '+' - | '-' | '/' | '<' | '=' | '>' - | '?' | '@' | '^' | '_' | '~' - -PipeStrChar : ~( '|' | '\' ) -QuotStrChar : ~( '"' | '\' ) - -StringEsc : '\' | '|' | '"' | ( HTAB | SP )* LF ( HTAB | SP )* - | 'a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | 'e' - | 'x' HexByte+ ';' - | 'u' UnicodeSV ';' - -HexByte : HEXDIG HEXDIG -UnicodeSV : HEXDIG+ - -RuneExpr : Rune [ '\' BareString | CladDatum ] -LabelExpr : '%' Label ( '%' | '=' Datum ) -HashDatum : '\' BareString | CladDatum - -Rune : ALPHA ( ALPHA | DIGIT )* -Label : HEXDIG+ - -ParenList : '(' ListBody ')' -SquareList : '[' ListBody ']' -BraceList : '{' ListBody '}' - -ListBody : Unit* [ Blank* '&' Unit ] Blank* -``` |
