Add a note.

author: Taylan Kammer <taylan.kammer@gmail.com> 2026-06-10 10:08:10 +0200
committer: Taylan Kammer <taylan.kammer@gmail.com> 2026-06-10 10:08:10 +0200
commit: 320c5aafff0499512475686feb374bdfde2a4123 (patch)
tree: 8b4d9ccdb78e106eac6479f7cbea4ca610fbcee7
parent: e4514285f2148d1fed1a0871494a2fda23a3e503 (diff)
2 files changed, 177 insertions, 0 deletions
diff --git a/notes/260610-fastcons3.md b/notes/260610-fastcons3.md
new file mode 100644
index 0000000..e1f7cfa
--- /dev/null
+++ b/notes/260610-fastcons3.md
@@ -0,0 +1,176 @@
+# Cons cell optimization again
+
+_2026 June_
+
+Last episode: [Cons cell optimization!](260608-fastcons2.html)
+
+Given the sub-optimal situation with these kinds of optimizations for
+manual uses of `cons` in program code, perhaps it's best to limit the
+scope of this optimization:
+
+* Regular `cons` creates normal pairs; `car` and `cdr` only work on
+  normal pairs.
+
+* Programmers can explicitly use the array-backed list data type if
+  they want to; it's simply another data structure available in the
+  standard library, alongside things like dynamic arrays that grow
+  upon saturation and move their entire contents.
+
+  Note: Those dynamic regrowing arrays are what are typically called
+  an ArrayList data structure, like in Zig or Java.  In this article
+  I'm using terms like array-backed list or array-list for my jumbo
+  cons cell concept instead.  I believe some call this SegmentedList.
+
+* We break from Lisp tradition and *don't* return regular pairs from
+  s-expression parsing; we return array-backed lists.  This is a sane
+  choice because:
+
+  1. A parser knows the length of any list it's allocating; why use a
+     pair-based linked list structure?
+
+  2. One of the primary users of the parser is the interpreter; it's
+     greatly beneficial for its performance to use contiguous arrays
+     for code forms to evaluate.
+
+  It just means that when you parse some data, you need to make sure
+  to use the array-list API rather than the cons cell API to traverse
+  the data you've parsed.  I think that's fine.
+
+Given that these array-lists don't pretend to be cons cells anymore,
+the whole "sentinel bit pattern at the start" thing isn't necessary
+anymore; programmers using this data type simply need to understand
+that it's a different data structure and they can't simply acquire a
+pointer to the "rest" of the list from any arbitrary point.
+
+We don't need a `cdr` equivalent for this data type at all; we offer
+other functions to traverse it, segment by segment.  You can't get a
+pointer to the middle of a segment; you can only get a pointer to the
+head of the next segment.
+
+So, the 3-bit or 4-bit tag on the pointer gets a slightly different
+meaning: It's simply the length of the current segment.  Well, it's
+actually the same exact meaning as before, and technically you could
+still construct a pointer to the middle of a segment with the length
+tag indicating the remaining length of that segment, but it's simply
+not going to be supported; the GC could collect the segment, leaving
+you with a pointer to invalid memory.
+
+Also, this means that any length other than the maximum implies that
+the list ends there; you'd never put a pointer to another segment to
+the end of an unsaturated segment, because why would you.
+
+Let's look at a few examples of such array-lists in memory.  We'll
+assume a 3-bit tag for simplicity; you can imagine what a 4-bit tag
+would do.
+
+Single-element proper list:
+
+    tag = 0
+
+    0       1       2       3       4       5       6       7
+
+    foo     nil
+
+(Earlier I wrote that the tag is "simply the length of the current
+segment" which was a lie: The length is tag + 2 and the last slot in
+the segment is nil or a pointer to another segment or the end of an
+improper list.)
+
+An improper list with 3 elements and non-nil tail:
+
+    tag = 2
+
+    0       1       2       3       4       5       6       7
+
+    foo     bar     baz     bat
+
+A proper list filing up a cache line perfectly:
+
+    tag = 6
+
+    0       1       2       3       4       5       6       7
+
+    foo     bar     baz     bat     qux     fux     tux     nil
+
+Tag value 7 means the list *might* be longer; this is a bit awkward
+because the tail sits on the next cache line:
+
+    tag = 7
+
+    0       1       2       3       4       5       6       7       8
+
+    a       b       c       d       e       f       g       h     ptr
+
+However, if the list actually is longer, the elements could follow
+immediately after that pointer, because hopefully the whole structure
+was allocated in one contiguous array.  These continuation pointers
+are, in a sense, an artifact resulting from the fact that we can't
+otherwise represent longer lists, since the length bits are scarce.
+
+This is fine, especially if the length bits are extended to 4, since
+the point is using this for the kinds of short lists that are very
+common in Lisp/Scheme code.  How often does source code contain an
+s-expression longer than 15 elements?  (That's tag value 14, with
+trailing nil to end it as a proper list; 16 slots total filling two
+cache lines perfectly.)
+
+## Why improper lists at all?
+
+I've asked myself whether it would make sense to ditch improper lists
+and make the terminating nil implicit for non-maximum length segments
+but I don't like that idea.
+
+Improper lists kinda have their place in Lisp/Scheme code, such as to
+represent rest arguments.  And besides, the semantics get quite wonky
+if the terminating nil is implicit on non-max length; you still need
+it on max length if the list actually does end after that, and then
+what do you do if it's not nil and also not a segment pointer?  Now
+you have improper lists again, but they can only have a non-tail
+element count equal to a multiple of 8 or 16, which is weird.
+
+Plus, there may actually be valid use-cases where you create segments
+of non-max length, and then put a pointer to another segment at the
+end anyway.  The parser has no need for this, but someone using this
+segment-array-linked-list structure for some special purpose could
+find a use for this, perhaps.
+
+## Closing up
+
+I think this will be a very fine optimization with respect to the
+operation of the interpreter.  I've lots of ideas in mind on how to
+make it almost like a bytecode VM despite taking tree data.  As an
+example, after the first time a list is evaluated as a function call,
+verifying that it's a proper list with an identifier referring to a
+function in the first position, it can be modified as follows:
+
+The head of the list is replaced with a special NaN-packed value of
+type "call instruction" that encodes a pointer to a function as a
+payload directly in the NaN.  This instruction implies that the list
+has been verified to be proper, so the length tag on the pointer that
+led us to this list can be relied on directly, not requiring another
+verification that the call expression (list) is well-formed (proper).
+
+Bytecode is inherently linear, whereas a tree structure inherently
+requires some pointer jumping.  Consider the expression:
+
+    (+ x (* y (/ z t)))
+
+As bytecode, this would be turned into instructions that first call
+the division operator, put the result in a register or the stack,
+continue with the multiplication, and so on.
+
+As a tree, even if you did some pre-verification and turned symbols
+into direct call instructions embedding a pointer and so on, you're
+still dependent on recursion to evaluate the innermost form first,
+store the result, walk back up a level, and so on.
+
+Other than by a complete transformation of the code, extracting inner
+expressions and putting their results into local variables, you can't
+make this as nice to execute as bytecode.  I wonder how much overhead
+that really causes though, if the interpreter is otherwise optimized
+well and rewrites s-expressions in-place to save future work, and the
+various array-lists the code is made up of all fit neatly into L1.
+
+You may be wondering why I don't just go the bytecode VM route, but
+I've already wandered off from the topic far enough, so maybe I'll
+write a separate note about that.
diff --git a/notes/index.md b/notes/index.md
index a4f753e..e331101 100644
--- a/notes/index.md
+++ b/notes/index.md
@@ -31,3 +31,4 @@
 * [Interpreter](260522-interpreter.html)
 * [Static analysis to optimize GC](260524-gc-opt.html)
 * [Cons cell optimization!](260608-fastcons2.html)
+* [Cons cell optimization again](260610-fastcons3.html)
author	Taylan Kammer <taylan.kammer@gmail.com>	2026-06-10 10:08:10 +0200
committer	Taylan Kammer <taylan.kammer@gmail.com>	2026-06-10 10:08:10 +0200
commit	320c5aafff0499512475686feb374bdfde2a4123 (patch)
tree	8b4d9ccdb78e106eac6479f7cbea4ca610fbcee7
parent	e4514285f2148d1fed1a0871494a2fda23a3e503 (diff)