notes/260522-interpreter.md



# The interpreter and the compiler

_2026 May_

Last December, I wrote the following in the context of how one might
[bootstrap](250329-boot.html) Zisp even if it uses a self-hosting
compiler:

* There will be a Zisp interpreter written in Zig, which is fairly
  simple and naive in its implementation and, for example, ignores
  static type declarations.  It should support the full Zisp language
  including hygienic macros, but be as easy as possible to maintain.

* The Zisp compiler will be written in Zisp.  The interpreter can run
  the compiler (since it can run any Zisp program) and will be used to
  compile the compiler.

After some pondering on a variety of topics, I've decided to stick
with this, just with one significant added insight:

The interpreter will not be some bootstrapping hack and then put in
the dustbin until someone needs to bootstrap from scratch again.
Rather, the interpreter will be a first-class citizen of the Zisp
implementation.

This is because a simple interpreter without any compilation overhead
is useful for an entire class of applications: Small to medium size
scripts that you simply plop into `~/bin` with a shebang line at the
top, or other similarly small programs that are simply distributed as
monolithic source files, or at most a small collection of files.

The interpreter may be slow, but these are the kinds of programs one
might otherwise write in GNU Bash or the like, which is also quite
slow.  Bash doesn't even have proper data structures, so it becomes a
terrible choice very quickly.  The next consideration after
Bash would typically be a language like Python, and although even the
CPython interpreter might beat the naive Zisp interpreter (because the
former at least uses bytecode and had a ton of engineering poured into
it) this shouldn't really matter, since the kind of tiny application
we're talking about typically wouldn't involve heavy computation.

(Besides, a Zisp script could choose to compile parts of itself; more
on this later.)

Build scripts are another example.  One of the first ideas I had when
pondering Zisp's design is how [compilation](250210-compile.html)
should automatically evaluate the top-level of a program, simply
because this feels most natural to me.  Furthermore, I've pondered
how it should be possible to [serialize](250210-serialize.html)
everything in the language, so compiling a program would be a matter
of calling something like `(write main)` after the main function is
defined.  Both of these fit naturally with the idea that a build
script for a Zisp program would essentially just be a Zisp script
which imports all the files in the codebase, compiles everything, and
writes out the result.  Such a build script would be interpreted, with
the compiler being a shared library it loads.

The compiler itself would typically still be shipped in compiled form,
as well as the rest of the standard library, though it's conceivable
that there might be benefits to having stdlib sources available; the
compiler may be able to do better whole-program analysis, achieving
better results than what you might get from LTO.

## The programmer is in control of compilation

Shipping an interpreter with a compiler as a library, one that can
compile things on the fly as instructed by the interpreted source
itself, enables some novel strategies in development and deployment.

### Manual JIT

First, imagine you started developing a program as a fairly small
script but at some point begin to realize that it does, after all,
involve some heavy computations that could benefit from improved
performance.

Maybe it takes 10-20 minutes to run, with the majority of that time
spent on one or two functions sifting through massive amounts of data
and doing some heavy computation, involving some tight loops.  Well,
your interpreter includes a compiler, so why not simply call the
compiler on those functions right after defining them?

Note that we're not talking about compiling *files* but simply some
functions that are sitting in memory as AST and would otherwise be
interpreted naively and slowly.
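
Since Zisp doesn't exist yet, here is a rough Python sketch of the same idea.  The tuple-based mini-language and the names `interpret` and `compile_expr` are made up for illustration, and Python's built-in `compile()` stands in for a real native compiler.  The point is only that an AST sitting in memory can be translated once and then called many times.

```python
import operator

# Hypothetical mini-language: an expression is a number, a variable
# name (str), or a tuple (op, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def interpret(expr, env):
    """Naive AST walk: re-inspects the tree on every evaluation."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    return OPS[op](interpret(left, env), interpret(right, env))

def compile_expr(expr):
    """Translate the in-memory AST to Python source once, then let the
    built-in compile() turn it into a callable."""
    def emit(e):
        if isinstance(e, (int, float)):
            return repr(e)
        if isinstance(e, str):
            return f"env[{e!r}]"
        op, left, right = e
        if op not in OPS:
            raise ValueError(f"unknown operator: {op!r}")
        return f"({emit(left)} {op} {emit(right)})"
    code = compile(f"lambda env: {emit(expr)}", "<manual-jit>", "eval")
    return eval(code, {})

# x*x + 3*x - 7, defined as data in memory, never written to a file.
poly = ("-", ("+", ("*", "x", "x"), ("*", 3, "x")), 7)
fast_poly = compile_expr(poly)  # the "manual JIT" step, done once
```

Both `interpret(poly, {"x": 5})` and `fast_poly({"x": 5})` compute the same value; the second skips the tree walk on every call.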

It's said that compiled native code can be 5-20x faster than a naive
AST interpreter.  If the whole 20-minute run sped up by that much, it
would finish in 1-4 minutes; in practice only the compiled functions
get faster, so the gain depends on how much of the run time they
account for.  A little extra computation is added up-front to compile
a function or two, then they run blazing fast.
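
How much of the 20 minutes actually disappears depends on the share of run time spent in the compiled functions.  Here is a rough estimate using Amdahl's law; the 90% hot-path share is purely an assumption for illustration, and `estimated_runtime` is a made-up helper name.

```python
def estimated_runtime(total_minutes, hot_fraction, speedup):
    """Amdahl's law: only the hot fraction of the run gets faster."""
    cold = total_minutes * (1.0 - hot_fraction)      # still interpreted
    hot = total_minutes * hot_fraction / speedup     # now compiled
    return cold + hot

for hot_fraction in (0.9, 1.0):
    for speedup in (5, 20):
        t = estimated_runtime(20, hot_fraction, speedup)
        print(f"hot={hot_fraction:.0%} speedup={speedup}x -> {t:.1f} min")
```

With 90% of the time in the hot functions, a 20x speedup of just those functions brings a 20-minute run to roughly 2.9 minutes; the remaining interpreted 10% sets the floor at 2 minutes.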

### Native targeting, and user data/code specialization

The fact that you have a compiler in your runtime, and that it has a
well-designed, easy-to-use API, opens the door to a somewhat unusual
software deployment strategy:

Despite the fact that your application is rather sophisticated and
needs to run at peak performance, you distribute it as source code,
with a "boot" process that compiles all the sources every time
it's started up on the end user's machine.  (Well, the compilation
result could be cached into files on disk too, but that's a detail.)
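
The caching detail could look something like this Python sketch, where Python bytecode stands in for native code.  The function name `load_or_compile` and the cache layout are assumptions for illustration.  The cache key covers both the source text and the interpreter version, because compiled output from one version is not necessarily valid for another.

```python
import hashlib
import marshal
import sys
from pathlib import Path

def load_or_compile(source: str, cache_dir: Path):
    """Return (code_object, was_cache_hit) for `source`."""
    # Key on source text plus interpreter version: marshal data is not
    # portable across Python versions.
    key = hashlib.sha256(
        (sys.version + "\0" + source).encode("utf-8")
    ).hexdigest()
    cache_file = cache_dir / f"{key}.bin"
    if cache_file.exists():
        return marshal.loads(cache_file.read_bytes()), True
    code = compile(source, "<app>", "exec")
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(marshal.dumps(code))
    return code, False
```

The first start-up pays for compilation; later start-ups with unchanged sources only read the cached file.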

This has two advantages.  For one, the code is always compiled for the
exact native architecture, not just an ISA family.  This can improve
performance a little, sometimes.

Secondly, and more interestingly, data *and even code* read from a
configuration file can be compiled straight into the native code
that's being generated.

If you know Nginx's configuration format, you may know that it has
some limitations that appear a bit strange, typically because the
directives need to be "compiled" into something efficient if they
declare some logic that has to be executed on every single request.
Since Nginx doesn't want to implement a sophisticated compiled DSL
like Varnish, it ends up being somewhat limited.  Varnish does make
that jump and implements a whole DSL for per-request decisions, which
is transpiled to C, compiled into a dynamic lib and loaded.

Imagine Nginx was written in Zisp, and distributed in source format.
You could have arbitrary code in your configuration, for per-request
decisions, which would be compiled into native code and potentially
inlined straight into Nginx's request handler.  Imagine Varnish was
written in Zisp.  It wouldn't need to invent a whole new language!
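
As a rough illustration in Python (again standing in for Zisp), here is a request handler generated at boot from a configuration that contains both data and a snippet of code.  The config keys `blocked_prefixes` and `rewrite` and the function `build_handler` are invented for this sketch; nothing here reflects how Nginx or Varnish actually work internally.

```python
def build_handler(config: dict):
    """Generate and compile a request handler specialized to `config`."""
    # Data from the config becomes a constant in the generated source.
    blocked = tuple(config["blocked_prefixes"])
    # `rewrite` is user-supplied *code*; a real system would have to
    # trust or sandbox it.
    rewrite_expr = config.get("rewrite", "path")
    source = (
        "def handle(path):\n"
        f"    if path.startswith({blocked!r}):\n"
        "        return (403, path)\n"
        f"    return (200, {rewrite_expr})\n"
    )
    namespace = {}
    exec(compile(source, "<generated-handler>", "exec"), namespace)
    return namespace["handle"]

config = {
    "blocked_prefixes": ["/admin", "/private"],
    "rewrite": "path.rstrip('/') or '/'",
}
handle = build_handler(config)
```

The per-request path never consults the config again: the prefixes and the rewrite rule are part of the compiled function body.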

(I just realized Varnish has been renamed to Vinyl Cache, but I
suspect most people still know it as Varnish, like me just now.)

Just as an aside, I think this "compile at startup and cache it"
strategy is used by Elixir.  Or maybe I just got that impression
because I've installed Pleroma (an Elixir application) from Git.
Either way, I doubt my idea is entirely new; this is definitely a
strategy that can already be used by any application written in a
language with a compiler built into the runtime, like many Lisp or
Scheme implementations.

## Why not automatic JIT?

Although a more "proper" JIT has some advantages, like being able to
specialize on arbitrary run-time data (not just config files or other
such "boot-time" data), such JITs typically produce significantly
worse code than a "full AOT compiler in a JIT-shaped trench coat"
because the AOT compiler simply spends a *lot* more time on analysis
upfront.  Don't quote me on this, but it appears to be the current
consensus.

Traditional JIT, as opposed to what LLVM and GCC offer (i.e., AOT in a
JIT-shaped trench coat), needs to be low latency, since it's done on
the fly, transparently, and concurrently.  Imagine your browser ran
GCC or LLVM for every JS file it received.  That would be ridiculous.
Note that JS is special in that it's basically the only programming
language where arbitrary new code is loaded *all the time* during the
normal course of operations.  Other languages just don't need this.
It's just JS where high upfront latency is unacceptable.

Why do Java, Lua (with LuaJIT), and a bunch of other languages use JIT?
Partly, it may be cultural: Native AOT compilation feels yucky,
invoking associations such as long compile times multiplied by the
number of target architectures, needing to ship binary blobs, and the
primitive C ABI.  Java can have its own rich ABI, and languages like
Lua don't have an ABI at all because everything is source code.  If
programmers can simply ship source files, or at worst cross-platform
byte code like for the JVM, and then the JIT magically makes things
faster, there's less headache I guess.  (There is AOT for Java, but
it's a niche.)

Another reason, probably, is that many high-level languages are very
dynamic and lack a serious static type system that would be needed to
generate peak performance AOT compiled code.

Zisp is all about breaking norms, and giving the programmer maximum
freedom.  The interpreter might one day incorporate some lightweight
JIT, but my aim is to ensure that a Zisp programmer always has the
ability to generate peak-performance native compiled binaries, through
a combination of features such as: An optional but serious static type
system, the ability to completely take control over memory management
rather than relying on GC, and integrating with a high-end AOT native
compiler like GCC.

Tall claims, I know.  Stop looking at me like that.  Yes I know, all I
have so far is a fucking s-expression parser, a NaN packing strategy
for dynamic typing, and dreams.  But if I keep dreaming and planning,
I'm sure the implementation will spontaneously pop into existence any
day now.

## Summary of planned implementation architecture

Just to recap, here's the plan so far:

1. A code base in a low-level language (probably Zig but not married
   to it) implements the Zisp core, meaning interpreter, basic data
   types, and a slim standard library.  Comparable to R7RS-small in
   complexity, give or take.  The interpreter accepts but ignores
   advanced code constructs intended to help the compiler, such as
   declarations and directives related to static typing and explicit
   object lifetime management.  (Simple bindings to libgccjit are
   exposed; libgccjit.so is an optional run-time dependency.)  This
   yields libzisp.so and the zisp executable, which are like liblua
   and the lua executable.  You *can* use just this if you need a
   minimal Zisp interpreter with a barebones stdlib; OS package
   repositories could deploy these in a "zisp-core" package.

2. Richer standard library routines are written in Zisp, but the
   sources are meant to stay in the source code repo; wait for it.

3. An advanced compiler, which actually understands the constructs
   mentioned in point 1, is written in Zisp.  The compiler infers
   static types where possible, and applies strategies to decrease GC
   pressure, such as escape analysis, even if compiled code offers no
   helpful declarations at all.  But with full static typing and
   manual memory management, Zisp can practically be used as if it's
   yet another low-level language front-end for GCC; it's up to the
   programmer how much effort they want to put into improving the
   performance of their code.  The compiler implementation may use
   parts of the richer standard library mentioned above, which is not
   yet compiled, mind you.

4. The interpreter runs the compiler to compile the compiler; this
   yields libzispcomp.so which Zisp can load dynamically so when
   deploying Zisp you don't need to compile the compiler on every
   end-user machine.  (Zisp can load any .so dynamically really.)
   Standard library routines written in Zisp are imported directly
   from within the source code repo at this point, and are merely
   interpreted, since the compiler itself isn't ready yet.
   (Actually, you could run the compiler with the interpreter to
   compile the stdlib first, then use the compiled stdlib while
   compiling the compiler.  But this would probably be slower.)

5. The richer standard library routines are finally compiled, giving
   us libzisputil.so, which contains goodies that interpreted Zisp
   code can also load and use, so Zisp scripts aren't limited to the
   barebones stdlib anymore.
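
The build order implied by the steps above can be sketched as a small dependency graph in Python.  The node names for the source trees (`stdlib-sources`, `compiler-sources`) are labels invented for this sketch; the artifact names come from the list.

```python
from graphlib import TopologicalSorter

# artifact -> things that must exist before it can be produced
deps = {
    "zisp-core": set(),                      # step 1: libzisp.so + zisp
    "stdlib-sources": set(),                 # step 2: rich stdlib, as source
    "compiler-sources": {"stdlib-sources"},  # step 3: may use that stdlib
    "libzispcomp.so": {                      # step 4: interpreter runs compiler
        "zisp-core", "compiler-sources", "stdlib-sources",
    },
    "libzisputil.so": {                      # step 5: compiled stdlib
        "libzispcomp.so", "stdlib-sources",
    },
}

order = list(TopologicalSorter(deps).static_order())
print(order)
```

Any valid order puts `zisp-core` before `libzispcomp.so`, and `libzispcomp.so` before `libzisputil.so`.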

In OS package repositories, you'd have zisp-core which only contains
libzisp.so and the zisp executable, and then you'd have the standard
zisp package which also pulls in libzispcomp and libzisputil as two
additional packages.

Actually, libzispcomp itself would probably depend on libzisputil
anyway, but if you're an absolute nerd you *could* manually install
only zisp-core and libzisputil, giving you an interpreter and rich
standard library, without a compiler.  This would allow you to omit
libgccjit as well, which could be useful if you want to use the Zisp
interpreter for simple scripts on some minimal systems.
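
A script that wants to run on both full and minimal installs could probe for the compiler at start-up and fall back to plain interpretation.  Here is a hedged Python sketch of that check; `pick_backend` is a made-up name, and the library names are taken from the plan above, so they don't exist on any real system yet.

```python
import ctypes.util

def pick_backend(find_library=ctypes.util.find_library):
    """Choose an execution strategy based on which libraries are installed.

    `find_library` is injectable so the logic can be exercised without
    the (not yet existing) Zisp libraries being present.
    """
    has_compiler = find_library("zispcomp") is not None
    has_gccjit = find_library("gccjit") is not None
    if has_compiler and has_gccjit:
        return "compile"
    return "interpret"
```

On a zisp-core-only system both lookups come back empty and the script simply stays interpreted.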

## Closing up

Funny, I had totally forgotten about this note:

- [Using libgccjit?](250920-libgccjit.html)

Yes, I will most definitely be using libgccjit.  If Zisp is to be a
true [full-stack language](260102-full-stack.html) then it must be
able to produce code rivaling C in efficiency, and that requires
either GCC or LLVM.

Some of the other considerations in the above linked note, like the
"ZispScript" idea, are obsolete.  Unless I've totally goofed up and
planned some illogical nonsense above, I'll be going with what I've
written here, not in the previous note.