-rw-r--r-- notes/250219-reader.md 53
-rw-r--r-- spec/syntax.md 68
2 files changed, 115 insertions, 6 deletions
diff --git a/notes/250219-reader.md b/notes/250219-reader.md
index de71b4e..503d402 100644
--- a/notes/250219-reader.md
+++ b/notes/250219-reader.md
@@ -7,6 +7,10 @@ article:*
[Symbols are strings are symbols](250210-symbols.html)
+*This whole article is me rambling, and the actual implementation of
+the parser that I settled on is slightly different from all the ideas
+that are wildly explored here. See late addition at the bottom.*
+
OK but hear me out... What if there were different reader modes, for
code and (pure) data?
@@ -463,10 +467,57 @@ from the apostrophe if needed.)
Also, all those would work without a rune as well, to allow a file to
change the meaning of some of the default syntax sugar if desired:
- "foo" -> (#string . foo)
+ "foo" -> (#string . foo)
[foo bar] -> (#square foo bar)
{foo bar} -> (#braces foo bar)
Or something like that. I'm making this all up as I go.
+
+## Actual implementation
+
+_2026 January_
+
+Just to summarize what I actually ended up implementing in the end:
+
+- There is only one parser, not separate data and code parsers.
+
+- It simply desugars `"foo bar"` into `(#QUOTE . |foo bar|)`, i.e.,
+ these expressions are equivalent, and indistinguishable once they
+ have been parsed into data. (The syntax `|foo bar|` represents a
+ string literal in its purest form.) Another equivalent expression
+ would be `'|foo bar|` that also parses into `(#QUOTE . |foo bar|)`.
+ All three parse into the exact same data in memory.
+
+- If you want to use Zisp expressions for something like config files
+ and want to type `"foo bar"` instead of `|foo bar|` but don't want
+ to deal with `(#QUOTE . |foo bar|)` then just run a decoder on the
+ data before using it. You'll need to run a decoder on it anyway if
+ you want to support vectors, mappings, and other such data types in
+ your config file that don't have a *direct* data representation.
+
+- The decoder is not implemented yet, but it will be configurable and
+ may have default configurations for "code" and "data" where the data
+ configuration would presumably just strip `(#QUOTE . foo)` down to
+ `foo` just to make `"foo"` and `|foo|` totally equivalent in data
+ contexts like config files. In the code configuration, it would
+ decode `(#QUOTE . foo)` into a macro call expression object which,
+ when evaluated, results in `foo`.
+
+- If you wanted to have a config file with code snippets in it, and
+ don't want e.g. `(code (string-append "foo" x))` to be decoded into
+ `(code (string-append foo x))` thus changing the meaning of the
+ embedded code, you have two options:
+
+ 1. Make your entire config file be Zisp code written in a DSL.
+
+ 2. Wrap code snippets in one layer of quoting like `'(...)` which
+ will effectively protect nested uses of `#QUOTE` from the data
+ decoder, since decoding is a breadth-first operation.
+
+See here for full documentation of Zisp expressions as implemented:
+
+- [Informal docs](https://git.tkammer.de/zisp/tree/docs/parser.md)
+- [Formal spec](https://git.tkammer.de/zisp/tree/spec/syntax.md)
+- [ABNF](https://git.tkammer.de/zisp/tree/spec/syntax.abnf)
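
Reviewer's sketch (not part of the commit): the note above says the decoder is not implemented yet, so the following is only a guess at how the "data" configuration could behave. It is written in Python rather than Zisp, and every name in it (`Rune`, `Pair`, `zlist`, `decode_data`) is hypothetical. It assumes that stripping `(#QUOTE . x)` returns `x` without descending into it, which is one way to get the protection described for `'(...)` wrappers.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Rune:
    """Hypothetical stand-in for a rune such as #QUOTE."""
    name: str


@dataclass(frozen=True)
class Pair:
    """Hypothetical stand-in for a parsed pair (car . cdr)."""
    car: Any
    cdr: Any


QUOTE = Rune("QUOTE")


def zlist(*items: Any) -> Any:
    """Build a proper list out of pairs; None plays the role of nil."""
    result = None
    for item in reversed(items):
        result = Pair(item, result)
    return result


def decode_data(x: Any) -> Any:
    """Strip (#QUOTE . x) down to x, assuming the payload is left undecoded."""
    if isinstance(x, Pair):
        if x.car == QUOTE:
            # One layer of quoting is removed; anything nested inside it,
            # including further #QUOTE pairs, is returned untouched.
            return x.cdr
        return Pair(decode_data(x.car), decode_data(x.cdr))
    return x


# (name "foo bar") parses to (name (#QUOTE . |foo bar|)).
config = zlist("name", Pair(QUOTE, "foo bar"))
decoded = decode_data(config)  # (name |foo bar|)
```

Under that assumption, `(code '(string-append "foo" x))` would decode to `(code (string-append (#QUOTE . foo) x))`, so the embedded snippet keeps its string literal, whereas the unwrapped form loses it.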
diff --git a/spec/syntax.md b/spec/syntax.md
index b85ed78..91e5495 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -6,7 +6,9 @@ We use a BNF notation with the following rules:
followed by `bar`.
* Expressions may be followed by `?`, `*`, `+`, `{N}`, or `{N,M}`,
- which have the meanings they have in regular expressions.
+ which have meanings analogous to regular expressions.
+
+* The syntax `[foo]` is shorthand for `(foo)?`.
* The syntax is defined in terms of bytes, not characters. Terminals
`'c'` and `"c"` refer to the ASCII value of the given character `c`.
@@ -18,10 +20,13 @@ We use a BNF notation with the following rules:
* Ranges of terminal values are expressed as `x...y` (inclusive).
-* There is no ambiguity, backtracking, or look-ahead beyond the byte
- currently being matched. Rules match left to right, depth-first,
- and greedy. As soon as the input matches the first terminal of a
- rule, it must match that rule to the end.
+* ABNF "core rules" like `ALPHA` and `HEXDIG` are supported, with the
+ addition of EOF to explicitly demarcate the end of the byte stream.
+
+* There is no ambiguity, backtracking, or look-ahead beyond one byte.
+ Rules match left to right, depth-first, and greedy. As soon as the
+ input matches the first terminal of a rule, it must match that rule
+ to the end or it is considered a syntax error.
The last rule means that the BNF is very simple to translate to code.
@@ -29,6 +34,59 @@ The parser consumes one `unit` from an input stream every time it's
called; it returns the `datum` therein, or EOF.
```
+Unit : Blank* ( Datum [Blank] | EOF )
+
+
+Blank : 9...13 | Comment
+
+Datum : OneDatum ( [JoinChar] OneDatum )*
+
+JoinChar : '.' | ':'
+
+
+Comment : ';' ( SkipUnit | SkipLine )
+
+SkipUnit : '~' Unit
+
+SkipLine : ( ~LF )* [LF]
+
+
+OneDatum : BareString | CladDatum
+
+BareString : ( '.' | '+' | '-' | DIGIT ) ( BareChar | '.' )*
+ | BareChar+
+
+CladDatum : '|' PipeStrElt* '|'
+ | '"' QuotStrElt* '"'
+ | '#' HashExpr
+ | '(' List ')' | '[' List ']' | '{' List '}'
+ | "'" Datum | '`' Datum | ',' Datum
+
+
+BareChar : ALPHA | DIGIT
+ | '!' | '$' | '%' | '&' | '*' | '+' | '-' | '/'
+ | '<' | '=' | '>' | '?' | '@' | '^' | '_' | '~'
+
+
+PipeStrElt : ~( '|' | '\' ) | '\' StringEsc
+
+QuotStrElt : ~( '"' | '\' ) | '\' StringEsc
+
+HashExpr : Rune [ '\' BareString | CladDatum ]
+ | '\' BareString
+ | '%' Label ( '%' | '=' Datum )
+ | CladDatum
+
+List : Unit* [ '.' Unit ] Blank*
+
+
+StringEsc : '\' | '|' | '"' | ( HTAB | SP )* LF ( HTAB | SP )*
+ | 'a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | 'e'
+ | 'x' ( HEXDIG{2} )+ ';'
+ | 'u' HEXDIG{1,6} ';'
+
+Rune : ALPHA ( ALPHA | DIGIT ){0,5}
+Label : HEXDIG{1,12}
```
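
Reviewer's sketch (not part of the commit): a few of the terminal-level rules above translate directly into regular expressions over bytes, which can be handy for sanity-checking examples against the grammar. The helper names below are hypothetical, and the sketch assumes `HEXDIG` accepts lowercase digits as well. It tests whole tokens only; it does not model the one-byte look-ahead streaming behaviour the spec describes.

```python
import re

# BareChar: ALPHA | DIGIT | one of the listed punctuation bytes.
_BARE_CHAR = rb"A-Za-z0-9!$%&*+\-/<=>?@^_~"

# BareString : ( '.' | '+' | '-' | DIGIT ) ( BareChar | '.' )*
#            | BareChar+
_BARE_STRING = re.compile(
    rb"(?:[.+\-0-9][" + _BARE_CHAR + rb".]*|[" + _BARE_CHAR + rb"]+)"
)

# Rune : ALPHA ( ALPHA | DIGIT ){0,5}
_RUNE = re.compile(rb"[A-Za-z][A-Za-z0-9]{0,5}")

# Label : HEXDIG{1,12}  (lowercase accepted here by assumption)
_LABEL = re.compile(rb"[0-9A-Fa-f]{1,12}")


def is_bare_string(token: bytes) -> bool:
    """True if the whole token matches the BareString rule."""
    return _BARE_STRING.fullmatch(token) is not None


def is_rune(token: bytes) -> bool:
    """True if the whole token matches the Rune rule (1 to 6 bytes)."""
    return _RUNE.fullmatch(token) is not None


def is_label(token: bytes) -> bool:
    """True if the whole token matches the Label rule (1 to 12 hex digits)."""
    return _LABEL.fullmatch(token) is not None
```

Note that `a.b` is not a single `BareString` under these rules: a dot is only allowed inside a bare string that starts with `.`, `+`, `-`, or a digit, so `a.b` would instead be two `OneDatum` items joined by a `JoinChar`.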