# A full-stack programming language

_2026 January_

As I become more and more ambitious with my dreams about what I want
Zisp to become, it becomes less and less likely that I'll ever
actually create and finish the language.  But I just can't help it.

The notion of a "full-stack programming language" was first widely
introduced, from what I know, by the Red language project.  To quote
their website:

> Red’s ambitious goal is to build the world’s first full-stack
> language, a language you can use from system programming tasks, up
> to high-level scripting through DSL. You've probably heard of the
> term "Full-Stack Developer". But what is a full-stack Language,
> exactly?
>
> Other languages talk about having "one tool to rule them all". Red
> has that mindset too, pushed to the limit - it's a single executable
> that takes in your source files on any platform, and produces a
> packaged binary for any platform, from any other. The tool doesn’t
> depend on anything besides what came with your OS...shipping as a
> single executable that about a megabyte. [sic]
>
> But that technical feat alone isn't enough to define Red's notion of
> a "Full-Stack Language". It's about the ability to bend and redefine
> the system to meet any need, while still working with literate code,
> and getting top-flight performance.  So what's being put in your
> hands is more like a "language construction set" than simply "a
> language". Whether you’re writing a device driver, a platform-native
> GUI application, or a shared library... Red lets you use a common
> syntax to code at the right level of abstraction for the task.

Source: [Red: About](https://www.red-lang.org/p/about.html)

This is exactly what I dream of, although keeping the entire thing to
about a megabyte in size seems unrealistic.  (I'm thinking of using
libgccjit for native compilation, which already has an installed size
of tens of megabytes, though statically linking a reduced portion may
bring it down a bunch.)

In fact, I may have independently come up with the idea of a language
implementation being more of a "programming language toolbox" than a
single language.
Or maybe I subconsciously stole the idea from Red because I read about
their "language construction set" idea years ago; who really knows.
In any case, it's a very exciting idea.  Such a toolbox would allow
you to create code-bases using specialized dialects of the language,
while still using the base machinery provided by the language.  This
stands in contrast to a language being "opinionated" in how you should
write your code and having its own mind on how to actually do things
at run-time, which tends to be the case with higher level languages.

It would mean that the language is not for the faint of heart.  Though
I want it to be possible to write high-level Zisp code without having
to think of any of the more complex mechanisms, offering an experience
comparable to writing Python or JavaScript -- or rather, Scheme -- the
real power of the language would only show itself to those who are
familiar with low-level concepts and typical implementation machinery
of languages.  A seasoned developer could take control of the behavior
of the garbage collector to fine-tune it for the allocation patterns
of their application, or even write modules that entirely avoid GC,
opting for less automatic memory management strategies such as pools
or totally manual alloc/free.  They could have Zisp produce highly
optimized machine code (while remaining cross-platform) by providing
all the static type declarations necessary to eliminate run-time
overhead, and use a low-level record type system that allows defining
the exact in-memory layout of data records so as to make the best use
of CPU cache lines and whatnot.  A Zisp code-base could be as simple
as a beginner-level Python code-base, or as complex as an advanced C
code-base.
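
The kind of explicit layout control I have in mind is roughly what C
structs give you: the programmer, not the runtime, decides where each
field lives.  Here's a toy illustration using Python's `ctypes` (just
for concreteness; nothing here is Zisp-specific, and the exact sizes
assume a typical platform with 4-byte float alignment):

```python
import ctypes

class Header(ctypes.Structure):
    # Field order and C types pin down the exact in-memory layout,
    # including alignment padding.
    _fields_ = [("x", ctypes.c_float),    # offset 0
                ("y", ctypes.c_float),    # offset 4
                ("tag", ctypes.c_uint8)]  # offset 8, then padding

assert Header.y.offset == 4
assert Header.tag.offset == 8
assert ctypes.sizeof(Header) == 12  # 4 + 4 + 1, padded to 4-byte alignment
```

A low-level record system in Zisp would expose this level of control
(offsets, padding, alignment) directly in the language, instead of
hiding the layout behind a uniform boxed representation.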

In some sense, Emacs has a similar philosophy, but applied to making a
text editor.  At its core, it's a lisp machine, containing a bunch of
primitives that are conducive to creating a text editor; and then the
text editor is implemented on top of that and has a flexibility you
won't find in many other editors.  Zisp would be something akin to a
set of tools useful for creating a programming language, and then kind
of a "default" language constructed with those tools.  It's like the
language is telling you: "You can either give me a very simple high
level description of what you want to happen, and I'll make it happen
somehow; or you can tell me every little detail of how you want me to
do it."

Red is described as a homoiconic language, like Lisps.  I'm not sure
why they didn't just go with some kind of s-expression syntax.  They
don't use a syntax that's any more familiar to the average programmer;
it seems like they have their own unique thing.  If I went through the
documentation of Red, I'm sure I would find many other reasons why I
want to create my own thing instead of simply joining their project.
Could be a great source of inspiration though; I'll have to take a
closer look one day.

To finish off this note, I want to provide an example of how a little
Zisp snippet could look when written in the high-level "don't care"
style, and then transformed into a lower-level style to take control
of more details of its run-time behavior.

This is an imaginary script that walks through a directory and creates
HTML files from Markdown files.

Note that this is essentially pseudo-code.  The Zisp parser that
exists as of the time I'm writing this should be able to parse the
snippets, but everything beyond that is fantasy and not necessarily
representative of what actual Zisp code will eventually read like.

```scheme
(import (zisp base))            ;import Zisp base language & stdlib

(link (zisp regex))             ;dynamically link regex library
(link (de tkammer markdown))    ;dynamically link a Markdown library

(define (md2html from to)
  (print "Converting {from} to {to} ...")
  (with ((in (file.reader from))
         (out (file.writer to)))
    ;; Get title from first line of Markdown
    (define title (regex.replace "^# " "" (in.first-line)))
    (define head-template (file.read "head.html"))
    (define head (head-template.replace "__TITLE__" title))
    (out.write head)
    (out.write "<body>")
    (out.write (format "<h1>{title}</h1>"))
    (markdown.stream in out)
    (out.write "</body>")))

(define (main)
  (md2html "index.md" "index.html")
  (loop (mdfile (glob "notes/*.md"))
    (define name (mdfile.basename))
    (md2html mdfile (format "notes/{name}.html"))))
```

A simple script like this may not realistically require optimization,
but we will do it for the sake of the example.  Let's identify sources
of overhead and unnecessary duplicate operations:

1. The regex library is dynamically linked, so the regex `"^# "` will
   need to be re-compiled on every loop.

2. The file `head.html` is read on every iteration.

3. Strings like `title` and `head` are allocated at every iteration
   and would eventually need to be collected by the GC.

You may think there's a lot more than that, such as a bunch of type
checks at run-time, since there are no explicit type declarations.
However, given the static type information available for the standard
library, and some basic type inference, this program should already be
mostly free of type checking overhead.  For example, `file.reader` is
known to return a reader, so `in.first-line` is known to return a
string, and so on.  (In fact, assuming that the dynamically linked
libraries also provide sufficient type information at compile time,
this program could be entirely statically type-checked, if I'm not
mistaken.)

Some of the optimizations we will perform are not tied to any Zisp
specialty; any sane language would, for example, allow you to read
`head.html` once and save the string for re-use.  Nevertheless, the
transformation of the program will provide some examples of what I
want Zisp to be capable of.

Here's the transformed code:

```scheme
(import (zisp base))            ;import Zisp base language & stdlib
(import (zisp regex))           ;import regex library (compile-time)

(link (de tkammer markdown))    ;dynamically link a Markdown library

(define title-regex (regex.compile "^# "))
(define head-template (file.read "head.html"))

(define title-limit 80)

(define title:buffer)
(define head:buffer)

(define (md2html from:string to:string)
  (print "Converting {from} to {to} ...")
  (with ((in (file.reader from))
         (out (file.writer to)))
    ;; Get title from first line of Markdown
    (title.reset)
    (unless (title.read-line in)
      ;; read-line returns false if the buffer fills up before the
      ;; line ends, so skip the rest of the first line
      (ignore (in.first-line)))
    (title-regex.replace "" title)
    (head.reset)
    (head.read-string head-template)
    (head.replace "__TITLE__" title)
    (out.write head)
    (out.write "<body>")
    (out.write (format "<h1>{title}</h1>"))
    (markdown.stream in out)
    (out.write "</body>")))

(define (main)
  (set! title (buffer.make title-limit))
  (set! head (buffer.make (+ head-template.length title-limit)))
  (md2html "index.md" "index.html")
  (loop (mdfile (glob "notes/*.md"))
    (define name (mdfile.basename))
    ;; format still allocates but I'm too lazy to change that too
    (md2html mdfile (format "notes/{name}.html"))))
```

To be honest, there's nothing too crazy going on here.  The strategy
of using statically sized buffers that are allocated once globally
could be implemented even in standard Scheme, using the bytevectors
library.  Indeed, Scheme is already quite good at enabling you to
write fairly efficient code, as far as dynamically typed, garbage
collected languages are concerned.
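
To make that concrete, here's the same "allocate once, reuse forever"
pattern sketched in Python rather than Scheme (all names are mine,
chosen to mirror the Zisp snippet; and Python being Python, the
`sub()` call still allocates an intermediate object, so this
illustrates the pattern rather than true zero allocation):

```python
import re

TITLE_LIMIT = 80                      # static size, as in the Zisp version
title_buf = bytearray(TITLE_LIMIT)    # allocated once, up front
title_re = re.compile(rb"^# ")        # compiled once, like title-regex

def read_title(first_line: bytes) -> memoryview:
    """Strip a leading '# ' and copy the result into the shared buffer.

    Returns a zero-copy view of the filled prefix of title_buf; the
    buffer itself is never reallocated across calls.
    """
    stripped = title_re.sub(b"", first_line)[:TITLE_LIMIT]
    n = len(stripped)
    title_buf[:n] = stripped          # write in place
    return memoryview(title_buf)[:n]
```

In Scheme the equivalent would use a preallocated bytevector and
`bytevector-copy!`; the point is the same either way.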

Nevertheless, we see for instance that the globals `title` and `head`
are defined with static type information, ensuring that no run-time
checks need to be added to their usage later.

(The addition of type information to the arguments of `md2html` is
rather stylistic, since they could be inferred via bidirectional type
inference anyway.)

One surprising difference from Scheme would be that the values of the
top level definitions such as `title-regex` would actually be created
at compile time, and serialized into the resulting binary.  See the
note ["Compilation is execution"](250210-compile.html) on that.

As I'm writing this, I notice that escape analysis is crucial for the
automatic elimination of some heap allocations.  In this case, that
would apply to variables like `mdfile` and `name`, for example.

All in all, I guess I've ended up demonstrating nothing more here than
the possibility of adding some static types to a program written in a
Scheme-like language to eliminate some dynamic type checks it would
otherwise need to have.  Big deal.  I really hoped that this little
snippet would be conducive to demonstrating a bunch of other Zisp
features enabling "low-level programming" (imaginary Zisp features,
that is) but I guess I should have come up with a better example.

In fairness to me, it's easy to underestimate just how efficient a
Scheme-like language (or indeed a Scheme implementation) could already
be, if the compiler implemented all kinds of optimizations like type
inference and escape analysis across function boundaries, and if the
programmer wrote allocation-avoiding code.

It would be interesting to rewrite the above snippet in a style that
uses memory pools instead of avoiding dynamic allocation.  That could
lift the static limits while still eliminating GC use.  This shall be
left as a later exercise for myself.
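
For now, here's just a rough sketch of the pool idea itself, again in
Python for concreteness (a free-list of reusable fixed-size buffers;
the class and its counters are made up for illustration, not a real
API):

```python
class BufferPool:
    """A tiny free-list pool of fixed-size bytearrays.

    acquire() hands out a recycled buffer when one is available,
    allocating a fresh one only when the pool is empty; release()
    returns a buffer for reuse instead of leaving it to the GC.
    """

    def __init__(self, bufsize: int):
        self.bufsize = bufsize
        self._free: list = []
        self.allocations = 0          # counter, for demonstration only

    def acquire(self) -> bytearray:
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return bytearray(self.bufsize)

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)

pool = BufferPool(80)
for _ in range(1000):
    buf = pool.acquire()
    # ... fill and use buf here ...
    pool.release(buf)
assert pool.allocations == 1  # one allocation serves all iterations
```

A pooled `md2html` would acquire its title and head buffers per call
and release them on the way out, trading the static size limits for a
little bookkeeping.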