author Taylan Kammer <taylan.kammer@gmail.com> 2025-03-29 23:56:22 +0100
committer Taylan Kammer <taylan.kammer@gmail.com> 2025-03-29 23:56:22 +0100
commit d6e50e7a631d0dfe8d41438be89f8b00dfc9a4df
tree 741717b08aafac370ce416f901c4698c62b39bfa /docs
parent fc23b42c6e2183c8ca8b6c42dc4e90d8061f835d
add some unfinished notes and docs
Diffstat (limited to 'docs')
-rw-r--r-- docs/decoder.md | 44
-rw-r--r-- docs/parser.md | 122
2 files changed, 166 insertions, 0 deletions
diff --git a/docs/decoder.md b/docs/decoder.md
new file mode 100644
index 0000000..0b34204
--- /dev/null
+++ b/docs/decoder.md
@@ -0,0 +1,44 @@
+# Decoding
+
+A separate process called "decoding" can transform simple data structures,
+consisting of only the datum types, into a richer set of Zisp types.
+
+For example, the decoder may turn `(#HASH ...)` into a vector, so that a
+vector literal like `#(...)` works as it does in Scheme. Bytevector syntax
+could use a custom rune as a list prefix, such as `#u8(...)`.
+
+Runes may be decoded in isolation as well, rather than transforming a list
+whose head they appear in. This can implement Boolean constants as `#true`
+and `#false` or `#t` and `#f`.
+
+The decoder recognizes `(#QUOTE ...)` to aid in implementing the traditional
+quoting mechanism of Lisp/Scheme, but with a significant difference:
+
+Traditional quote is "unhygienic" in Scheme terms. An expression such as
+`'(foo bar)` will always be read as `(quote (foo bar))` regardless of what
+lexical context it appears in, so the semantics will depend on whatever the
+identifier `quote` is bound to, meaning that the expression may end up
+evaluating to something other than the list `(foo bar)`.
+
+The Zisp decoder, which transforms not datum to datum, but object to object,
+can turn `#QUOTE` into an object which encapsulates the notion of quoting,
+which the Zisp evaluator can recognize and act upon, ensuring that an
+expression like `'(foo bar)` always turns into the list `(foo bar)`.
+
+One way to think about this, in Scheme (R6RS / syntax-case) terms, is that
+expressions like `'(foo bar)` turn directly into a syntax object when read,
+and the created syntax object begins with an identifier bound to `quote` in
+the standard library.
+
+The decoder is, of course, configurable and extensible. The transformations
+mentioned above would be performed only when it's told to decode data which
+represents Zisp code. The decoder may be given a different configuration,
+telling it to decode, for example, data which represents a different kind of
+domain-specific data, such as application settings, build system commands,
+complex data records with non-standard data types, and so on.
+
+<!--
+;; Local Variables:
+;; fill-column: 77
+;; End:
+-->
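The decoding step described in docs/decoder.md can be illustrated with a small sketch. This is not Zisp's implementation: the `Rune` and `Pair` classes, the `NIL` marker, and the `decode` function are hypothetical names chosen for illustration, bare strings are modelled as plain Python strings, and a Python list stands in for a vector. It only shows the two ideas from the notes: a list headed by `#HASH` becoming a vector, and runes such as `#true` being decoded in isolation.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for parser output; none of these names come from Zisp.
@dataclass(frozen=True)
class Rune:
    name: str

@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object

NIL = None  # stands in for the empty list ()

def make_list(*items):
    """Build a proper list out of pairs, terminated by NIL."""
    result = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result

def list_items(datum):
    """Collect the elements of a proper list into a Python list."""
    items = []
    while isinstance(datum, Pair):
        items.append(datum.car)
        datum = datum.cdr
    return items

# Runes decoded in isolation; the spellings follow the notes above.
CONSTANTS = {"true": True, "t": True, "false": False, "f": False}

def decode(datum):
    """Decode a datum under an assumed 'Zisp code' configuration."""
    if isinstance(datum, Rune) and datum.name in CONSTANTS:
        return CONSTANTS[datum.name]
    if isinstance(datum, Pair):
        if datum.car == Rune("HASH"):
            # (#HASH x y z) becomes a vector; a Python list stands in for it.
            return [decode(item) for item in list_items(datum.cdr)]
        return Pair(decode(datum.car), decode(datum.cdr))
    return datum  # bare strings, NIL, and unknown runes pass through

vector = decode(make_list(Rune("HASH"), "x", Rune("true"), Rune("f")))
```

A different configuration, as the notes suggest, would amount to passing a different table of rune handlers instead of hard-coding `HASH` and the Boolean constants.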
diff --git a/docs/parser.md b/docs/parser.md
new file mode 100644
index 0000000..a4e5d78
--- /dev/null
+++ b/docs/parser.md
@@ -0,0 +1,122 @@
+# Parser for Code & Data
+
+Zisp s-expressions are defined in terms of an extremely minimal set of data
+types; only that which is necessary to build representations of more complex
+expressions and data types:
+
+ +--------+-----------------+---------------+--------+----------+------+
+ | TYPE | Bare String | Quoted String | Rune | Pair | Nil |
+ +--------+-----------------+---------------+--------+----------+------+
+ | E.G. | foo, |foo bar| | "foo bar" | #name | (X . Y) | () |
+ +--------+-----------------+---------------+--------+----------+------+
+
+Bare strings and quoted strings are polymorphic sub-types of the generic
+string type. Bare strings are implicitly interned.
+
+The parser can also output non-negative integers, but this is only used for
+datum labels; number literals are handled by the decoder (see next section).
+
+The parser recognizes various "syntax sugar" and transforms it into uses of
+the above data types. The most ubiquitous example is of course the list:
+
+ (datum1 datum2 ...) -> (datum1 . (datum2 . (... . ())))
+
+The following table summarizes the other supported transformations:
+
+ #datum -> (#HASH . datum) #rune(...) -> (#rune ...)
+
+ [...] -> (#SQUARE ...) dat1dat2 -> (#JOIN dat1 . dat2)
+
+ {...} -> (#BRACE ...) dat1.dat2 -> (#DOT dat1 . dat2)
+
+ 'datum -> (#QUOTE . datum) dat1:dat2 -> (#COLON dat1 . dat2)
+
+ `datum -> (#GRAVE . datum) #%hex% -> (#LABEL . hex)
+
+ ,datum -> (#COMMA . datum) #%hex=datum -> (#LABEL hex . datum)
+
+A separate process called "decoding" can transform these objects into other
+data types. For example, `(#HASH x y z)` could become a vector, so that the
+expression `#(x y z)` works just like in Scheme. See the next section for
+details about the decoder.
+
+Decoding also resolves datum labels, and scans bare strings to find those
+that are actually number literals. This offloads the complexity of number
+parsing to the decoder, so the parser remains extremely simple.
+
+Further notes about the syntax sugar table and examples above:
+
+* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
+ means zero or more data; hex is a hexadecimal number of up to 12 digits.
+
+* The `#datum` form only applies when the datum following the hash sign is a
+ list, quoted string, quote expression, another expression starting with a
+ hash sign, a bare string starting with a backslash escape (see next), or a
+ pipe-quoted bare string (see next).
+
+* A backslash causes the immediately following character to lose any special
+ meaning it would have, and be considered as part of a bare string instead.
+ (This does not apply to space or control characters.) For example, the
+ following three character sequences are each a valid bare string:
+
+ foo\(bar\) \]blah \#\'xyz
+
+ Bare strings can also be "quoted" with pipes as in Scheme; it should be
+ noted that this still produces a "bare string" in terms of data type:
+
+ |foo bar baz|
+
+* Though not represented in the table due to notational difficulty, the form
+ `#rune(...)` doesn't require a list in the second position; any datum that
+ works with the `#datum` syntax also works with `#rune<DATUM>`.
+
+ #rune1#rune2 -> (#rune1 . #rune2)
+
+ #rune"text" -> (#rune . "text")
+
+ #rune\string -> (#rune . string)
+
+ #rune'string -> (#rune #QUOTE . string)
+
+ As a counter-example, following a rune immediately with a bare string isn't
+ possible, since it's ambiguous:
+
+ #abcdefgh ;Could be (#abcdef . gh) or (#abcde . fgh) or ...
+
+* Syntax sugar can combine arbitrarily; some examples follow:
+
+ #{...} -> (#HASH #BRACE ...)
+
+ #'foo -> (#HASH #QUOTE . foo)
+
+ ##'[...] -> (#HASH #HASH #QUOTE #SQUARE ...)
+
+ {x y}[i j] -> (#JOIN (#BRACE x y) #SQUARE i j)
+
+ foo.bar.baz{x y} -> (#JOIN (#DOT (#DOT foo . bar) . baz) #BRACE x y)
+
+* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses
+ as `(#QUOTE . foo)` instead; the operand of `#QUOTE` is the entire cdr.
+
+ The same principle is used when parsing other sugar; some examples follow:
+
+ Incorrect Correct
+
+ #(x y z) -> (#HASH (x y z)) #(x y z) -> (#HASH x y z)
+
+ [x y z] -> (#SQUARE (x y z)) [x y z] -> (#SQUARE x y z)
+
+ #{x} -> (#HASH (#BRACE (x))) #{x} -> (#HASH #BRACE x)
+
+ foo(x y) -> (#JOIN foo (x y)) foo(x y) -> (#JOIN foo x y)
+
+* Runes are case-sensitive, and the parser only emits runes using upper-case
+ letters when expressing syntax sugar. This way, there can be no accidental
+ clash with runes that appear verbatim in code, as long as only lower-case
+ letters are used for rune literals in code.
+
+<!--
+;; Local Variables:
+;; fill-column: 77
+;; End:
+-->
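The "operand is the entire cdr" principle from docs/parser.md can be checked with a short sketch. Everything here is an illustrative assumption and not the Zisp parser: `Rune`, `Pair`, `prefix`, `join`, and `show` are hypothetical names, and bare strings are modelled as plain Python strings. The helpers build the desugared structures directly, and `show` prints them back in the notation used by the tables.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the parser's data types; not Zisp's own names.
@dataclass(frozen=True)
class Rune:
    name: str

@dataclass(frozen=True)
class Pair:
    car: object
    cdr: object

NIL = None  # stands in for the empty list ()

def make_list(*items):
    """(a b c) -> (a . (b . (c . ())))"""
    result = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result

def prefix(rune_name, datum):
    """Prefix or bracket sugar: the datum becomes the entire cdr."""
    return Pair(Rune(rune_name), datum)

def join(dat1, dat2):
    """dat1dat2 -> (#JOIN dat1 . dat2)"""
    return Pair(Rune("JOIN"), Pair(dat1, dat2))

def show(datum):
    """Render a datum in list notation, using a dot only for improper tails."""
    if datum is NIL:
        return "()"
    if isinstance(datum, Rune):
        return "#" + datum.name
    if isinstance(datum, Pair):
        parts = []
        while isinstance(datum, Pair):
            parts.append(show(datum.car))
            datum = datum.cdr
        if datum is not NIL:
            parts += [".", show(datum)]
        return "(" + " ".join(parts) + ")"
    return str(datum)

quoted = show(prefix("QUOTE", "foo"))                        # 'foo
hashed = show(prefix("HASH", prefix("BRACE", make_list("x"))))  # #{x}
called = show(join("foo", make_list("x", "y")))              # foo(x y)
```

Because each sugar form conses its rune onto the operand rather than wrapping the operand in a new list, nested sugar flattens into a single list, which is why `#{x}` yields `(#HASH #BRACE x)` and not `(#HASH (#BRACE (x)))`.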