Diffstat (limited to 'notes/260522-interpreter.md')
-rw-r--r-- notes/260522-interpreter.md | 261
1 files changed, 261 insertions, 0 deletions
diff --git a/notes/260522-interpreter.md b/notes/260522-interpreter.md
new file mode 100644
index 0000000..cf87180
--- /dev/null
+++ b/notes/260522-interpreter.md
@@ -0,0 +1,261 @@
+# The interpreter and the compiler
+
+_2026 May_
+
+Last December, I wrote the following in the context of how one might
+[bootstrap](250329-boot.html) Zisp even if it uses a self-hosting
+compiler:
+
+* There will be a Zisp interpreter written in Zig, which is fairly
+ simple and naive in its implementation and, for example, ignores
+ static type declarations. It should support the full Zisp language
+ including hygienic macros, but be as easy as possible to maintain.
+
+* The Zisp compiler will be written in Zisp. The interpreter can run
+ the compiler (since it can run any Zisp program) and will be used to
+ compile the compiler.
+
+After some pondering on a variety of topics, I've decided to stick
+with this, just with one significant added insight:
+
+The interpreter will not be some bootstrapping hack that gets put in
+the dustbin until someone needs to bootstrap from scratch again.
+Rather, the interpreter will be a first-class citizen of the Zisp
+implementation.
+
+This is because a simple interpreter without any compilation overhead
+is useful for an entire class of applications: small to medium-sized
+scripts that you simply plop into `~/bin` with a shebang line at the
+top, or other similarly small programs that are distributed as
+monolithic source files, or at most a small collection of files.
+
+The interpreter may be slow, but these are the kinds of programs one
+might otherwise write in GNU Bash or the like, which is also quite
+slow and doesn't even have proper data structures, so it becomes a
+terrible choice very quickly. The next consideration after Bash
+would typically be a language like Python. Even the CPython
+interpreter might beat the naive Zisp interpreter, because it at
+least uses bytecode and has had a ton of engineering poured into it,
+but this shouldn't really matter, since the kind of tiny application
+we're talking about typically wouldn't involve heavy computation.
+
+(Besides, a Zisp script could choose to compile parts of itself; more
+on this later.)
+
+Build scripts are another example. One of the first ideas I had when
+pondering Zisp's design is how [compilation](250210-compile.html)
+should automatically evaluate the top-level of a program, simply
+because this feels most natural to me. Furthermore, I've pondered
+how it should be possible to [serialize](250210-serialize.html)
+everything in the language, so compiling a program would be a matter
+of calling something like `(write main)` after the main function is
+defined. Both of these fit naturally with the idea that a build
+script for a Zisp program would essentially just be a Zisp script
+which imports all the files in the codebase, compiles everything, and
+writes out the result. Such a build script would be interpreted, with
+the compiler being a shared library it loads.
+
+The compiler itself, as well as the rest of the standard library,
+would typically still be shipped in compiled form, though it's
+conceivable that there might be benefits to having stdlib sources
+available; the compiler may be able to do better whole-program
+analysis, achieving better results than what you might get from LTO.
+
+## The programmer is in control of compilation
+
+Shipping an interpreter with the compiler as a library, so that the
+interpreted source itself can ask for things to be compiled on the
+fly, enables some novel strategies in development and deployment.
+
+### Manual JIT
+
+First, imagine you started developing a program as a fairly small
+script but at some point begin to realize that it does, after all,
+involve some heavy computations that could benefit from improved
+performance.
+
+Maybe it takes 10-20 minutes to run, with the majority of that time
+spent in one or two functions sifting through massive amounts of data
+and doing some heavy computation in tight loops. Well, your
+interpreter includes a compiler, so why not simply call the compiler
+on those functions right after defining them?
+
+Note that we're not talking about compiling *files* but simply some
+functions that are sitting in memory as AST and would otherwise be
+interpreted naively and slowly.
+
+It's said that compiled native code can be anywhere from 5 to 20
+times faster than a naive AST interpreter, so if nearly all of those
+20 minutes are spent in the hot functions, the run time could drop to
+somewhere between 1 and 4 minutes. A little extra computation is
+added up-front to compile a function or two, then they run blazing fast.
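+
+As a rough sanity check on numbers like these, here is the usual
+Amdahl's law estimate. The symbols are just shorthand introduced
+here, and the value of $p$ is an assumption for illustration only:
+$T$ is the original run time, $p$ the fraction of it spent in the
+functions that get compiled, and $s$ the speedup of those functions.
+
+$$
+T' = T \left( (1 - p) + \frac{p}{s} \right)
+$$
+
+With $T = 20$ minutes, $p = 0.9$ and $s = 10$, this gives
+
+$$
+T' = 20 \left( 0.1 + \frac{0.9}{10} \right) = 20 \times 0.19 = 3.8
+$$
+
+minutes. The best case, $p = 1$, is simply $T / s$: 4 minutes for
+$s = 5$ and 1 minute for $s = 20$. The one-time cost of running the
+compiler is not included in this estimate.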
+
+### Native targeting, and user data/code specialization
+
+The fact that you have a compiler in your runtime, and that it has a
+well-designed, easy-to-use API, opens the door to a somewhat unusual
+software deployment strategy:
+
+Despite the fact that your application is rather sophisticated and
+needs to run at peak performance, you distribute it as source code,
+with a "boot" process that compiles all the sources every time it
+starts up on the end user's machine. (Well, the compilation result
+could be cached in files on disk too, but that's a detail.)
+
+This has two advantages. First, the code is always compiled for the
+exact CPU it runs on, not just the common baseline of an ISA family.
+This can improve performance a little, sometimes.
+
+Secondly, and more interestingly, data *and even code* read from a
+configuration file can be compiled straight into the native code
+that's being generated.
+
+If you know Nginx's configuration format, you may know that it has
+some limitations that appear a bit strange, typically because the
+directives need to be "compiled" into something efficient if they
+declare some logic that has to be executed on every single request.
+Since Nginx doesn't want to implement a sophisticated compiled DSL
+the way Varnish does, it ends up being somewhat limited. Varnish makes
+that jump and implements a whole DSL (VCL) for per-request decisions,
+which is transpiled to C, compiled into a dynamic lib and loaded.
+
+Imagine Nginx was written in Zisp, and distributed in source format.
+You could have arbitrary code in your configuration, for per-request
+decisions, which would be compiled into native code and potentially
+inlined straight into Nginx's request handler. Imagine Varnish was
+written in Zisp. It wouldn't need to invent a whole new language!
+
+(I just realized Varnish has been renamed to Vinyl Cache, but I
+suspect most people still know it as Varnish, like me just now.)
+
+Just as an aside, I think this "compile at startup and cache it"
+strategy is used by Elixir. Or maybe I just got that impression
+because I've installed Pleroma (an Elixir application) from Git.
+Either way, I doubt my idea is entirely new; this is definitely a
+strategy that can already be used by any application written in a
+language with a compiler built into the runtime, like many Lisp or
+Scheme implementations.
+
+## Why not automatic JIT?
+
+Although a more "proper" JIT has some advantages, like being able to
+specialize on arbitrary run-time data (not just config files or other
+such "boot-time" data), it typically produces significantly worse
+code than a "full AOT compiler in a JIT-shaped trench coat" because
+the AOT compiler simply spends a *lot* more time on analysis upfront.
+Don't quote me on this, but it appears to be the current consensus.
+
+Traditional JIT, as opposed to what LLVM and GCC offer (i.e., AOT in a
+JIT-shaped trench coat), needs to be low latency, since it's done on
+the fly, transparently, and concurrently. Imagine your browser ran
+GCC or LLVM for every JS file it received. That would be ridiculous.
+Note that JS is special in that it's basically the only programming
+language where arbitrary new code is loaded *all the time* during the
+normal course of operations. Other languages just don't need this.
+It's just JS where high upfront latency is unacceptable.
+
+Why do Java, Lua (via LuaJIT), and a bunch of other high-level
+languages use JIT? Partly, it may be cultural: native AOT compilation
+feels yucky, invoking associations such as long compile times
+multiplied by the number of target architectures, needing to ship
+binary blobs, and the primitive C ABI. Java can have its own rich
+ABI, and languages like Lua don't have an ABI at all because
+everything is source code. If programmers can simply ship source
+files, or at worst cross-platform bytecode like for the JVM, and then
+the JIT magically makes things faster, there's less headache I guess.
+(There is AOT for Java, but it's a niche.)
+
+Another reason, probably, is that many high-level languages are very
+dynamic and lack a serious static type system that would be needed to
+generate peak performance AOT compiled code.
+
+Zisp is all about breaking norms, and giving the programmer maximum
+freedom. The interpreter might one day incorporate some lightweight
+JIT, but my aim is to ensure that a Zisp programmer always has the
+ability to generate peak-performance native compiled binaries, through
+a combination of features such as an optional but serious static type
+system, the ability to take complete control over memory management
+rather than relying on GC, and integration with a high-end AOT native
+compiler like GCC.
+
+Tall claims, I know. Stop looking at me like that. Yes I know, all I
+have so far is a fucking s-expression parser, a NaN packing strategy
+for dynamic typing, and dreams. But if I keep dreaming and planning,
+I'm sure the implementation will spontaneously pop into existence any
+day now.
+
+## Summary of planned implementation architecture
+
+Just to recap, here's the plan so far:
+
+1. A code base in a low-level language (probably Zig, but I'm not
+ married to it) implements the Zisp core, meaning interpreter, basic
+ data types, and a slim standard library. Comparable to R7RS-small in
+ complexity, give or take. The interpreter accepts but ignores
+ advanced code constructs intended to help the compiler, such as
+ declarations and directives related to static typing and explicit
+ object lifetime management. (Simple bindings to libgccjit are
+ exposed; libgccjit.so is an optional run-time dependency.) This
+ yields libzisp.so and the zisp executable, which are like liblua
+ and the lua executable. You *can* use just this if you need a
+ minimal Zisp interpreter with a barebones stdlib; OS package
+ repositories could deploy these in a "zisp-core" package.
+
+2. Richer standard library routines are written in Zisp, but the
+ sources are meant to stay in the source code repo; wait for it.
+
+3. An advanced compiler, which actually understands the constructs
+ mentioned in point 1, is written in Zisp. The compiler infers
+ static types where possible, and applies strategies to decrease GC
+ pressure, such as escape analysis, even if the code being compiled
+ offers no helpful declarations at all. But with full static typing
+ and manual memory management, Zisp can practically be used as if
+ it's yet another low-level language front-end for GCC; it's up to
+ the programmer how much effort they want to put into improving the
+ performance of their code. The compiler implementation may use
+ parts of the richer standard library mentioned in point 2, which is
+ not yet compiled at this stage, mind you.
+
+4. The interpreter runs the compiler to compile the compiler; this
+ yields libzispcomp.so, which Zisp can load dynamically, so when
+ deploying Zisp you don't need to compile the compiler on every
+ end-user machine. (Zisp can load any .so dynamically really.)
+ Standard library routines written in Zisp are imported directly
+ from within the source code repo at this point, and are merely
+ interpreted, since the compiler itself isn't ready yet.
+ (Actually, you could run the compiler with the interpreter to
+ compile the stdlib first, then use the compiled stdlib while
+ compiling the compiler. But this would probably be slower.)
+
+5. The richer standard library routines are finally compiled, giving
+ us libzisputil.so, which contains goodies that interpreted Zisp
+ code can also load and use, so Zisp scripts aren't limited to the
+ barebones stdlib anymore.
+
+In OS package repositories, you'd have zisp-core which only contains
+libzisp.so and the zisp executable, and then you'd have the standard
+zisp package which also pulls in libzispcomp and libzisputil as two
+additional packages.
+
+Actually, libzispcomp itself would probably depend on libzisputil
+anyway, but if you're an absolute nerd you *could* manually install
+only zisp-core and libzisputil, giving you an interpreter and rich
+standard library, without a compiler. This would allow you to omit
+libgccjit as well, which could be useful if you want to use the Zisp
+interpreter for simple scripts on some minimal systems.
+
+## Closing up
+
+Funny, I had totally forgotten about this note:
+
+- [Using libgccjit?](250920-libgccjit.html)
+
+Yes, I will most definitely be using libgccjit. If Zisp is to be a
+true [full-stack language](260102-full-stack.html) then it must be
+able to produce code rivaling C in efficiency, and that requires
+either GCC or LLVM.
+
+Some of the other considerations in the above linked note, like the
+"ZispScript" idea, are obsolete. Unless I've totally goofed up and
+planned some illogical nonsense above, I'll be going with what I've
+written here, not in the previous note.