From cf2697d24c13cdc7ea5f93ce0ff5143f41a85a83 Mon Sep 17 00:00:00 2001
From: Taylan Kammer
Date: Wed, 7 Jan 2026 13:26:51 +0100
Subject: New note.

---
 notes/260107-decoder.md | 218 ++++++++++++++++++++++++++++++++++++++++++++++++
 notes/index.md          |   1 +
 2 files changed, 219 insertions(+)
 create mode 100644 notes/260107-decoder.md

diff --git a/notes/260107-decoder.md b/notes/260107-decoder.md
new file mode 100644
index 0000000..a1118b7
--- /dev/null
+++ b/notes/260107-decoder.md
@@ -0,0 +1,218 @@
+# Decoder
+
+_2026 January_
+
+I've mulled over this quite a bit now, and I believe I've figured out
+what kind of design I want for the "decoder" component.
+
+To recap: Zisp has a "parser" that implements an extremely bare-bones
+s-expression format (though with some interesting syntax sugar baked
+in), with a lot of the features you would expect of a typical "reader"
+being offloaded into a second pass over the data.
+
+That second pass is done by the *decoder* and will handle, among other
+things:
+
+- Number literals (the parser only knows about strings)
+
+- Boolean literals (the parser only knows about "runes")
+
+- Literals for various compound objects like vectors
+
+- Datum labels/references, for cyclic data
+
+- Emitting direct references to macros like quote, unquote, and those
+  implementing some of the more exotic syntax features like `foo.bar`
+  for field and method access, `foo:bar` for type declarations, etc.
+
+(To be clear, `foo.bar` actually becomes `(#DOT foo & bar)` at the
+parse stage, and `foo:bar` becomes `(#COLON foo & bar)` and so on.
+The decoder then replaces `#DOT` and `#COLON` and the like with
+references to macros that actually implement the feature.)
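The substitution step described above can be sketched roughly as follows. This is a minimal illustration in Python rather than Zisp, and it is not the actual implementation: the `Rune`, `Pair`, and `MacroRef` classes, the `RUNE_MACROS` table, and the macro names in it are all invented here. The sketch also assumes the decoder looks at the head of every pair it walks over.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Rune:
    """A rune as the parser might emit it, e.g. Rune("DOT") for #DOT."""
    name: str


@dataclass(frozen=True)
class Pair:
    """A cons cell; (a & b) in the note's notation is Pair(a, b)."""
    car: Any
    cdr: Any


@dataclass(frozen=True)
class MacroRef:
    """Hypothetical stand-in for a direct reference to a macro."""
    name: str


# Assumed mapping from built-in runes to macros; the macro names are invented.
RUNE_MACROS = {
    "DOT": MacroRef("field-access"),
    "COLON": MacroRef("type-declaration"),
}


def decode(datum: Any) -> Any:
    """Return a copy of `datum` where known runes heading a pair become macro references."""
    if isinstance(datum, Pair):
        head = datum.car
        if isinstance(head, Rune) and head.name in RUNE_MACROS:
            head = RUNE_MACROS[head.name]
        else:
            head = decode(head)
        return Pair(head, decode(datum.cdr))
    return datum  # strings, unknown runes, and other atoms pass through unchanged


# foo.bar is parsed as (#DOT foo & bar): a pair whose cdr is the pair (foo & bar).
parsed = Pair(Rune("DOT"), Pair("foo", "bar"))
decoded = decode(parsed)
```

Here `decoded` has the same shape as `parsed`, except that the `#DOT` rune in head position has been swapped for the hypothetical `field-access` macro reference. A real decoder would also handle numbers, booleans, compound literals, and datum labels, which this sketch leaves out.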
+
+The decoder is also going to be extensible, to allow for something
+similar to reader macros in Common Lisp, but closer to regular macros
+because this extensibility will be based on runes: A list beginning
+with a rune can invoke a decoder procedure for that rune, and these
+can be user-defined.
+
+I've previously agonized over whether this means that the decoder is
+essentially the same thing as a macro expander, or rather, whether it
+would make sense to merge the functionality of the two. But I've come
+to the conclusion that this would be wrong.
+
+Key differences between the decoder and a macro expander include:
+
+- The macro expander is fully aware of bindings and lexical scope;
+  it's influenced by import statements, operates on syntax objects
+  that carry scope context, and so on. The decoder is completely
+  oblivious to identifier bindings and doesn't understand scoping.
+  For example, there's nothing like `let-syntax` for the decoder.
+
+- The macro expander only calls a macro when the head of a list is an
+  identifier bound to a syntax transformer. The decoder walks through
+  lists and checks for runes everywhere; otherwise the following would
+  not work as expected:
+
+  ```scheme
+  ;; Alist with vectors as the values.
+  ((x & #(a b))
+   (y & #(c d)))
+  ```
+
+  The parser will turn the entries into `(x #HASH a b)` and the like,
+  since `#(a b)` is sugar for `(#HASH a b)` and `(x & (#HASH a b))` is
+  equivalent to `(x #HASH a b)`. So, to make this work, the decoder
+  checks every single pair in a list and invokes a transformer if the
+  `car` of that pair is a rune bound to a decoder rule.
+
+These differences not only mean that the implementation will be quite
+different, but also that the decoder is conceptually a very different
+thing. No doubt there will be some similarity in their algorithms,
+but the conceptual simplicity of the decoder (no notion of scope or
+identifier bindings) means that you can reason about what it will do
+to source files much more easily.
+
+Macros in Scheme have a completely different "feel" to them. They're
+really part of the program logic. The whole point of hygienic macros
+is that they fit in seamlessly with the rest of your program, rather
+than being a disjoint pre-processor operating outside program logic.
+That's valuable in a different way. (Zisp will also support hygienic
+macros like Scheme.)
+
+Although the decoder is not as smart as a macro expander, I still
+intend to make it fairly powerful, supporting:
+
+- `(#IMPORT ...)` to import additional decoder rules dynamically, so
+  you could have something akin to a library of decoder extensions.
+  Yes, I know: It's ironic to list the decoder's lack of awareness of
+  imports as a key difference from the expander, and then make it
+  support its own import mechanism. But it's not the same. Regular
+  imports will be allowed within lexical scopes; decoder imports are
+  top-level only.
+
+- `(#DEFINE ...)` to dynamically add a decoder rule on the spot.
+  Again, not like a regular define: Top-level only, and unaware of
+  surrounding bindings. The decoder procedures defined in this way
+  will run in a pristine standard environment, though they can use
+  regular imports within their body to call to external code.
+
+- `(#STRING ...)` to embed the contents of a file as a string literal,
+  similar to `@embedFile()` in Zig.
+
+- `(#PARSE ...)` to parse a single expression from a file and put it
+  into this position. (Error if file contains more expressions.)
+
+- `(#SPLICE ...)` to parse all expressions in a file and splice them
+  into this position. (Essentially, `#include` from C, but obviously
+  not meant to be used like in C.)
+
+These will be turned off by default, so a decoded file cannot run
+arbitrary code, or maliciously embed `/dev/random`! The standard
+"Zisp code decoder" configuration used to read program and library
+files will then enable these features.
+
+Splicing could be used for the same effect as an import, but import
+makes it explicit that no expressions are being inserted. Files with
+decoder rules could also be compiled into a binary, which the import
+mechanism could locate and use, instead of parsing the source file
+again every time.
+
+Here are some imaginary Zisp source files demonstrating decoder use:
+
+```scheme
+;; a.zisp
+
+(#IMPORT "ht.zisp")  ;may load compiled code of ht.zisp from a cache
+
+(define my-hash-table #ht((a 1) (b 2)))  ;#ht imported from ht.zisp
+
+(#DEFINE (#foo x y)
+  (import (de tkammer my-helper-module))
+  (let ((blah (frobnicate x))
+        (blub (quiblify y)))
+    `(foo bar ,(generate blah blub))))
+
+(#foo x y z)  ;decoder error
+
+(#foo x y)  ;proper use
+
+(a b #foo x y)  ;also works, but don't do it please
+
+(#DEFINE #bar '(+ 1 2))
+
+(import (zisp io))  ;imports the print function
+
+(print #bar)  ;will print 3
+
+(#SPLICE "b.zisp")
+```
+
+```scheme
+;; ht.zisp
+
+(#DEFINE (#ht & entries)
+  (define ht (make-hash-table))
+  (loop ((key value) entries)
+    (ht.set key value))
+  ht)
+```
+
+```scheme
+;; b.zisp
+
+(define (foobar)
+  (let ((data (#STRING "example.data")))
+    (data.do-something)))
+
+(define cycle #%0=(1 2 3 & #%0%))
+```
+
+If you find the use of uppercase to be ugly, consider that a feature,
+because messing with the decoder this much would be discouraged. The
+only example above that actually makes some sense is the one defining
+hash table syntax.
+
+Actually, since I want to make it possible to serialize absolutely
+anything in Zisp, a regular macro could also be used to construct
+hash-table literals. See the [serialization](250210-serialize.html)
+note on that.
+
+However, this causes such custom object literals to not stand out:
+
+```scheme
+(define (foobar)
+  (let ((my-ht (ht (a 1) (b 2))))  ;doesn't look like a literal
+    (use my-ht here)))
+
+(define (foobar)
+  (let ((my-ht #ht((a 1) (b 2))))  ;more obvious that it's a literal
+    (use my-ht here)))
+```
+
+For this reason, it would be a convention that decoder rules are used
+to implement new object literal syntax, and macros are used when you
+want to output code, with hygienic bindings.
+
+```scheme
+;; Can't do this with decoder rules
+
+(import (zisp base))
+(import (de tkammer my-module))
+
+(define-syntax (my-macro x y )
+  (let ((x (call something from my-module))
+        (y (also bind this one to something))
+        (foo (this is a new local identifier)))
+    (do-something-with foo)
+    ;; this could also contain a `foo` without clashing:
+    ))
+```
+
+Decoder rules would probably be equivalent in power to Common Lisp
+macros, but it will only be possible to bind them to runes, not to
+regular identifiers, so they will be demarcated very clearly. They
+aren't intended for the same purposes as Common Lisp macros, so the
+equal power is merely incidental. Use hygienic macros if you want
+"real" Lisp macros; decoder rules are only for superficial syntax
+enrichment, not meant to be intertwined with program logic.
diff --git a/notes/index.md b/notes/index.md
index dd5a946..5d02a60 100644
--- a/notes/index.md
+++ b/notes/index.md
@@ -23,3 +23,4 @@
 * [Goals](250920-goals.html)
 * [A full-stack programming language](260102-full-stack.html)
 * [Simplifying S-Expression Grammar](260106-simpler-grammar.html)
+* [Decoder](260107-decoder.html)
-- 
cgit v1.2.3