notes/260811-fastcons4.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81

# Cons cell optimization AGAIN

_2026 June_

Last episode: [Cons cell optimization again](260810-fastcons3.html)

I am completely straying from my original idea of: "Just make a naive,
simple interpreter, with bindings to libgccjit, and anyone who needs
performance can just ask for specific functions in their code (or all
of them) to be compiled to native code."

(See: [The interpreter and the compiler](260522-interpreter.html))

To be clear, I won't ditch that overall design idea.  However, as I
started asking myself how to at least make a *reasonably* efficient
interpreter through the use of some smart tricks, I fell into a big
rabbit hole, and found myself enthralled by the idea of an advanced
direct AST interpreter that tries to approach the performance of a
byte-code VM while sticking to a tree representation of code.

One of the many causes of inefficiency in AST walking is the need to
chase around pointers and, if using traditional s-expressions with
cons cells, the terrible data density.  (Citation needed?  I think
this is widely accepted.)

A cons cell is 50% wasted space, using half of its memory to point to
the next cell.  This is mostly alleviated by the array-segmented list
idea, but they still need a nil terminator for proper lists.

That nil terminator is a whopping 8 bytes.  If I also used 16-byte
alignment, I'd be wasting an average of 12 bytes per list!  That's
what, almost 1/5 of a cache line?  It's 18.75% to be exact.  Even
using 8-byte alignment, wasting only 8 bytes per list, it's 1/8 or
12.5% of a cache line.

Another way to look at it is total memory overhead.  To calculate it,
you'd need to know the exact average length of lists in code.  Let's
say it's 4 or something.  (That's probably quite generous?)  At that
length, every 1 out of 5 elements is just the nil terminator; a data
overhead of 20%!

Reducing the memory footprint of code by ~10-20% or so is a highly
attractive prospect.  And all we need to do is ditch improper lists.

The new strategy is this:

* A 3-bit tag on a list pointer indicates the length; either 1 to 7,
  or if the tag value is 0, greater than 7.

* For lists longer than 7 elements, with the tag value 0, the array
  continues until a sentinel bit pattern is reached.

A list is always an array; there are no linked segment chains.  There
is no zero length list, since tag 0 means greater than 7, and other
values mean that exact count of elements.

The vast majority of functions have only a handful of arguments; we
support up to 6 arguments without needing to rely on the special list
array format that is sentinel-terminated.

This means the data density is 100% for function call expressions up
to 6 arguments (7 elements with the function itself).  Perfect!

## But... improper lists :(

I honestly like improper lists.  I think it's so elegant how Scheme
uses them to denote rest arguments.  I even had this fancy idea of
using them for function calls where you pass on your own rest arg.

(See: [Stop the "cons" madness!](250210-cons.html))

But sometimes you have to make sacrifices!  Besides, I think the only
reason I like them is familiarity.  They are kinda wonky if you think
about it.  Common Lisp and Emacs Lisp don't use improper lists in any
part of their actual syntax, if I remember correctly.  They use some
special identifier to mark rest args, and I could do a similar thing.

Damn it: I spent so much time refining the documentation of the Zisp
parser, and now I'll have to make massive revisions again.  Serves me
well for obsessing on one component without yet having a clear bigger
picture in mind.