# Decoder

_2026 January_

I've mulled over this quite a bit now, and I believe I've figured out
what kind of design I want for the "decoder" component.

To recap: Zisp has a "parser" that implements an extremely bare-bones
s-expression format (though with some interesting syntax sugar baked
in), with a lot of the features you would expect of a typical "reader"
being offloaded into a second pass over the data.

That second pass is done by the *decoder* and will handle, among other
things:

- Number literals (the parser only knows about strings)

- Boolean literals (the parser only knows about "runes")

- Literals for various compound objects like vectors

- Datum labels/references, for cyclic data

- Emitting direct references to macros like quote, unquote, and those
  implementing some of the more exotic syntax features like `foo.bar`
  for field and method access, `foo:bar` for type declarations, etc.

(To be clear, `foo.bar` actually becomes `(#DOT foo & bar)` at the
parse stage, `foo:bar` becomes `(#COLON foo & bar)`, and so on.
The decoder then replaces `#DOT`, `#COLON`, and the like with
references to macros that actually implement the feature.)

The decoder is also going to be extensible, to allow for something
similar to reader macros in Common Lisp, but closer to regular macros
because this extensibility will be based on runes: A list beginning
with a rune can invoke a decoder procedure for that rune, and these
can be user-defined.

I've previously agonized over whether this means that the decoder is
essentially the same thing as a macro expander, or rather, whether it
would make sense to merge the functionality of the two.  But I've come
to the conclusion that this would be wrong.

Key differences between the decoder and a macro expander include:

- The macro expander is fully aware of bindings and lexical scope;
  it's influenced by import statements, operates on syntax objects
  that carry scope context, and so on.  The decoder is completely
  oblivious to identifier bindings and doesn't understand scoping.
  For example, there's nothing like `let-syntax` for the decoder.

- The macro expander only calls a macro when the head of a list is an
  identifier bound to a syntax transformer.  The decoder walks through
  lists and checks for runes everywhere; otherwise the following would
  not work as expected:

  ```scheme
  ;; Alist with vectors as the values.
  ((x & #(a b))
   (y & #(c d)))
  ```

  The parser will turn the entries into `(x #HASH a b)` and the like,
  since `#(a b)` is sugar for `(#HASH a b)` and `(x & (#HASH a b))` is
  equivalent to `(x #HASH a b)`.  So, to make this work, the decoder
  checks every single pair in a list and invokes a transformer if the
  `car` of that pair is a rune bound to a decoder rule.
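That walk can be sketched in Python as follows (an assumed toy model: flat Python lists stand in for proper lists, so `(x #HASH a b)` is `["x", "#HASH", "a", "b"]`, and the `#HASH` rule merely tags its tail as a vector):

```python
# Toy sketch of the decoder walk.  The rule table is an
# illustrative assumption, not Zisp's actual API.

RULES = {"#HASH": lambda tail: ("vector", tail)}

def decode(form):
    if not isinstance(form, list):
        return form
    for i, item in enumerate(form):
        # Check the car of *every* pair, not just the head of the
        # list; a matching rune's rule consumes the rest of the list.
        if isinstance(item, str) and item in RULES:
            head = [decode(x) for x in form[:i]]
            tail = [decode(x) for x in form[i + 1:]]
            return head + [RULES[item](tail)]
    return [decode(x) for x in form]
```

Decoding the alist entry `["x", "#HASH", "a", "b"]` yields `["x", ("vector", ["a", "b"])]`: the vector ends up as the value of the pair, which is exactly the behavior the alist example above depends on.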

These differences not only mean that the implementation will be quite
different, but also that the decoder is conceptually a very different
thing.  No doubt there will be some similarity in their algorithms,
but the conceptual simplicity of the decoder (no notion of scope or
identifier bindings) means that you can reason about what it will do
to source files much more easily.

Macros in Scheme have a completely different "feel" to them.  They're
really part of the program logic.  The whole point of hygienic macros
is that they fit in seamlessly with the rest of your program, rather
than being a disjoint pre-processor operating outside program logic.
That's valuable in a different way.  (Zisp will also support hygienic
macros like Scheme.)

Although the decoder is not as smart as a macro expander, I still
intend to make it fairly powerful, supporting:

- `(#IMPORT ...)` to import additional decoder rules dynamically, so
  you could have something akin to a library of decoder extensions.
  Yes, I know: It's ironic to list the decoder's lack of awareness of
  imports as a key difference from the expander, and then make it
  support its own import mechanism.  But it's not the same.  Regular
  imports will be allowed within lexical scopes; decoder imports are
  top-level only.

- `(#DEFINE ...)` to dynamically add a decoder rule on the spot.
  Again, not like a regular define: Top-level only, and unaware of
  surrounding bindings.  The decoder procedures defined in this way
  will run in a pristine standard environment, though they can use
  regular imports within their body to call external code.

- `(#STRING ...)` to embed the contents of a file as a string literal,
  similar to `@embedFile()` in Zig.

- `(#PARSE ...)` to parse a single expression from a file and put it
  into this position.  (Error if file contains more expressions.)

- `(#SPLICE ...)` to parse all expressions in a file and splice them
  into this position.  (Essentially, `#include` from C, but obviously
  not meant to be used like in C.)

These will be turned off by default, so a decoded file cannot run
arbitrary code, or maliciously embed `/dev/random`!  The standard
"Zisp code decoder" configuration used to read program and library
files will then enable these features.

Splicing could be used for the same effect as an import, but import
makes it explicit that no expressions are being inserted.  Files with
decoder rules could also be compiled into a binary, which the import
mechanism could locate and use, instead of parsing the source file
again every time.

Here are some imaginary Zisp source files demonstrating decoder use:

```scheme
;; a.zisp

(#IMPORT "ht.zisp")  ;may load compiled code of ht.zisp from a cache

(define my-hash-table #ht((a 1) (b 2)))  ;#ht imported from ht.zisp

(#DEFINE (#foo x y)
  (import (de tkammer my-helper-module))
  (let ((blah (frobnicate x))
        (blub (quiblify y)))
    `(foo bar ,(generate blah blub))))

(#foo x y z)   ;decoder error

(#foo x y)     ;proper use

(a b #foo x y) ;also works, but don't do it please

(#DEFINE #bar '(+ 1 2))

(import (zisp io))  ;imports the print function

(print #bar)   ;will print 3

(#SPLICE "b.zisp")
```

```scheme
;; ht.zisp

(#DEFINE (#ht & entries)
  (define ht (make-hash-table))
  (loop ((key value) entries)
    (ht.set key value))
  ht)
```

```scheme
;; b.zisp

(define (foobar)
  (let ((data (#STRING "example.data")))
    (data.do-something)))

(define cycle #%0=(1 2 3 & #%0%))
```
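The `#%0=` line relies on the decoder tying the knot after parsing. A rough Python sketch of datum-label resolution (the tagged-tuple representation is my assumption: the parser is taken to yield `("label", n, body)` for `#%n=(...)` and `("ref", n)` for `#%n%`; flat Python lists approximate the pair chain, so the cycle shows up as a self-referential element rather than a shared tail):

```python
# Toy sketch of datum-label resolution for cyclic data.
# A mutable Python list models the structure so the cycle can be tied.

def resolve(form, labels=None):
    if labels is None:
        labels = {}
    if isinstance(form, tuple) and form[0] == "label":
        _tag, n, body = form
        out = []
        labels[n] = out           # register before descending,
        for x in body:            # so inner refs can see it
            out.append(resolve(x, labels))
        return out
    if isinstance(form, tuple) and form[0] == "ref":
        return labels[form[1]]
    if isinstance(form, list):
        return [resolve(x, labels) for x in form]
    return form
```

Here `resolve(("label", 0, [1, 2, 3, ("ref", 0)]))` returns a list whose fourth element is the list itself, which is the kind of cyclic structure `#%0=(1 2 3 & #%0%)` denotes.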

If you find the use of uppercase to be ugly, consider that a feature,
because messing with the decoder this much would be discouraged.  The
only example above that actually makes some sense is the one defining
hash table syntax.

Actually, since I want to make it possible to serialize absolutely
anything in Zisp, a regular macro could also be used to construct
hash-table literals.  See the [serialization](250210-serialize.html)
note on that.

However, with a macro, such custom object literals don't stand out:

```scheme
(define (foobar)
  (let ((my-ht (ht (a 1) (b 2))))  ;doesn't look like a literal
    (use my-ht here)))

(define (foobar)
  (let ((my-ht #ht((a 1) (b 2))))  ;more obvious that it's a literal
    (use my-ht here)))
```

For this reason, it would be a convention that decoder rules are used
to implement new object literal syntax, while macros are used when
you want to output code, with hygienic bindings.

```scheme
;; Can't do this with decoder rules

(import (zisp base))
(import (de tkammer my-module))

(define-syntax (my-macro x y <body>)
  (let ((x (call something from my-module))
        (y (also bind this one to something))
        (foo (this is a new local identifier)))
    (do-something-with foo)
    ;; this could also contain a `foo` without clashing:
    <body>))
```

Decoder rules would probably be equivalent in power to Common Lisp
macros, but it will only be possible to bind them to runes, not to
regular identifiers, so they will be demarcated very clearly.  They
aren't intended for the same purposes as Common Lisp macros, so the
equal power is merely incidental.  Use hygienic macros if you want
"real" Lisp macros; decoder rules are only for superficial syntax
enrichment, not meant to be intertwined with program logic.