author Taylan Kammer <taylan.kammer@gmail.com> 2026-01-07 13:26:51 +0100
committer Taylan Kammer <taylan.kammer@gmail.com> 2026-01-07 13:26:51 +0100
commit cf2697d24c13cdc7ea5f93ce0ff5143f41a85a83 (patch)
tree 7658ff8d6e758b1f63c6cbae342c87db1cfde045
parent b49af311220090c126be917993ba547cbf48bbaa (diff)
New note.
-rw-r--r-- notes/260107-decoder.md 218
-rw-r--r-- notes/index.md 1
2 files changed, 219 insertions, 0 deletions
diff --git a/notes/260107-decoder.md b/notes/260107-decoder.md
new file mode 100644
index 0000000..a1118b7
--- /dev/null
+++ b/notes/260107-decoder.md
@@ -0,0 +1,218 @@
+# Decoder
+
+_2026 January_
+
+I've mulled over this quite a bit now, and I believe I've figured out
+what kind of design I want for the "decoder" component.
+
+To recap: Zisp has a "parser" that implements an extremely bare-bones
+s-expression format (though with some interesting syntax sugar baked
+in), with a lot of the features you would expect of a typical "reader"
+being offloaded into a second pass over the data.
+
+That second pass is done by the *decoder* and will handle, among other
+things:
+
+- Number literals (the parser only knows about strings)
+
+- Boolean literals (the parser only knows about "runes")
+
+- Literals for various compound objects like vectors
+
+- Datum labels/references, for cyclic data
+
+- Emitting direct references to macros like quote, unquote, and those
+ implementing some of the more exotic syntax features like `foo.bar`
+ for field and method access, `foo:bar` for type declarations, etc.
+
+(To be clear, `foo.bar` actually becomes `(#DOT foo & bar)` at the
+parse stage, and `foo:bar` becomes `(#COLON foo & bar)` and so on.
+The decoder then replaces `#DOT`, `#COLON`, and the like with
+references to macros that actually implement the feature.)
+
+The decoder is also going to be extensible, to allow for something
+similar to reader macros in Common Lisp, but closer to regular macros
+because this extensibility will be based on runes: A list beginning
+with a rune can invoke a decoder procedure for that rune, and these
+can be user-defined.
+
+I've previously agonized over whether this means that the decoder is
+essentially the same thing as a macro expander, or rather, whether it
+would make sense to merge the functionality of the two. But I've come
+to the conclusion that this would be wrong.
+
+Key differences between the decoder and a macro expander include:
+
+- The macro expander is fully aware of bindings and lexical scope;
+ it's influenced by import statements, operates on syntax objects
+ that carry scope context, and so on. The decoder is completely
+ oblivious to identifier bindings and doesn't understand scoping.
+ For example, there's nothing like `let-syntax` for the decoder.
+
+- The macro expander only calls a macro when the head of a list is an
+ identifier bound to a syntax transformer. The decoder walks through
+ lists and checks for runes everywhere; otherwise the following would
+ not work as expected:
+
+ ```scheme
+ ;; Alist with vectors as the values.
+ ((x & #(a b))
+ (y & #(c d)))
+ ```
+
+ The parser will turn the entries into `(x #HASH a b)` and the like,
+ since `#(a b)` is sugar for `(#HASH a b)` and `(x & (#HASH a b))` is
+ equivalent to `(x #HASH a b)`. So, to make this work, the decoder
+ checks every single pair in a list and invokes a transformer if the
+ `car` of that pair is a rune bound to a decoder rule.
+
+These differences not only mean that the implementation will be quite
+different, but also that the decoder is conceptually a very different
+thing. No doubt there will be some similarity in their algorithms,
+but the conceptual simplicity of the decoder (no notion of scope or
+identifier bindings) means that you can reason about what it will do
+to source files much more easily.
+
+Macros in Scheme have a completely different "feel" to them. They're
+really part of the program logic. The whole point of hygienic macros
+is that they fit in seamlessly with the rest of your program, rather
+than being a disjoint pre-processor operating outside program logic.
+That's valuable in a different way. (Like Scheme, Zisp will also
+support hygienic macros.)
+
+Although the decoder is not as smart as a macro expander, I still
+intend to make it fairly powerful, supporting:
+
+- `(#IMPORT ...)` to import additional decoder rules dynamically, so
+ you could have something akin to a library of decoder extensions.
+ Yes, I know: It's ironic to list the decoder's lack of awareness of
+ imports as a key difference from the expander, and then make it
+ support its own import mechanism. But it's not the same. Regular
+ imports will be allowed within lexical scopes; decoder imports are
+ top-level only.
+
+- `(#DEFINE ...)` to dynamically add a decoder rule on the spot.
+ Again, not like a regular define: Top-level only, and unaware of
+ surrounding bindings. The decoder procedures defined in this way
+ will run in a pristine standard environment, though they can use
+ regular imports within their body to call external code.
+
+- `(#STRING ...)` to embed the contents of a file as a string literal,
+ similar to `@embedFile()` in Zig.
+
+- `(#PARSE ...)` to parse a single expression from a file and put it
+ into this position. (Error if file contains more expressions.)
+
+- `(#SPLICE ...)` to parse all expressions in a file and splice them
+ into this position. (Essentially, `#include` from C, but obviously
+ not meant to be used like in C.)
+
+These will be turned off by default, so a decoded file cannot run
+arbitrary code, or maliciously embed `/dev/random`! The standard
+"Zisp code decoder" configuration used to read program and library
+files will then enable these features.
+
+Splicing could be used for the same effect as an import, but import
+makes it explicit that no expressions are being inserted. Files with
+decoder rules could also be compiled into a binary, which the import
+mechanism could locate and use, instead of parsing the source file
+again every time.
+
+Here are some imaginary Zisp source files demonstrating decoder use:
+
+```scheme
+;; a.zisp
+
+(#IMPORT "ht.zisp") ;may load compiled code of ht.zisp from a cache
+
+(define my-hash-table #ht((a 1) (b 2))) ;#ht imported from ht.zisp
+
+(#DEFINE (#foo x y)
+ (import (de tkammer my-helper-module))
+ (let ((blah (frobnicate x))
+ (blub (quiblify y)))
+ `(foo bar ,(generate blah blub))))
+
+(#foo x y z) ;decoder error
+
+(#foo x y) ;proper use
+
+(a b #foo x y) ;also works, but don't do it please
+
+(#DEFINE #bar '(+ 1 2))
+
+(import (zisp io)) ;imports the print function
+
+(print #bar) ;will print 3
+
+(#SPLICE "b.zisp")
+```
+
+```scheme
+;; ht.zisp
+
+(#DEFINE (#ht & entries)
+ (define ht (make-hash-table))
+ (loop ((key value) entries)
+ (ht.set key value))
+ ht)
+```
+
+```scheme
+;; b.zisp
+
+(define (foobar)
+ (let ((data (#STRING "example.data")))
+ (data.do-something)))
+
+(define cycle #%0=(1 2 3 & #%0%))
+```
+
+If you find the use of uppercase to be ugly, consider that a feature,
+because messing with the decoder this much would be discouraged. The
+only example above that actually makes some sense is the one defining
+hash table syntax.
+
+Actually, since I want to make it possible to serialize absolutely
+anything in Zisp, a regular macro could also be used to construct
+hash-table literals. See the [serialization](250210-serialize.html)
+note on that.
+
+However, this causes such custom object literals to not stand out:
+
+```scheme
+(define (foobar)
+ (let ((my-ht (ht (a 1) (b 2)))) ;doesn't look like a literal
+ (use my-ht here)))
+
+(define (foobar)
+ (let ((my-ht #ht((a 1) (b 2)))) ;more obvious that it's a literal
+ (use my-ht here)))
+```
+
+For this reason, it would be a convention that decoder rules are used
+to implement new object literal syntax, and macros are used when you
+want to output code with hygienic bindings.
+
+```scheme
+;; Can't do this with decoder rules
+
+(import (zisp base))
+(import (de tkammer my-module))
+
+(define-syntax (my-macro x y <body>)
+ (let ((x (call something from my-module))
+ (y (also bind this one to something))
+ (foo (this is a new local identifier)))
+ (do-something-with foo)
+ ;; this could also contain a `foo` without clashing:
+ <body>))
+```
+
+Decoder rules would probably be equivalent in power to Common Lisp
+macros, but it will only be possible to bind them to runes, not to
+regular identifiers, so they will be demarcated very clearly. They
+aren't intended for the same purposes as Common Lisp macros, so the
+equal power is merely incidental. Use hygienic macros if you want
+"real" Lisp macros; decoder rules are only for superficial syntax
+enrichment, not meant to be intertwined with program logic.
diff --git a/notes/index.md b/notes/index.md
index dd5a946..5d02a60 100644
--- a/notes/index.md
+++ b/notes/index.md
@@ -23,3 +23,4 @@
* [Goals](250920-goals.html)
* [A full-stack programming language](260102-full-stack.html)
* [Simplifying S-Expression Grammar](260106-simpler-grammar.html)
+* [Decoder](260107-decoder.html)