diff options
| author | Taylan Kammer <taylan.kammer@gmail.com> | 2026-06-10 10:08:10 +0200 |
|---|---|---|
| committer | Taylan Kammer <taylan.kammer@gmail.com> | 2026-06-10 10:08:10 +0200 |
| commit | 320c5aafff0499512475686feb374bdfde2a4123 (patch) | |
| tree | 8b4d9ccdb78e106eac6479f7cbea4ca610fbcee7 | |
| parent | e4514285f2148d1fed1a0871494a2fda23a3e503 (diff) | |
Add a note.
| -rw-r--r-- | notes/260610-fastcons3.md | 176 | ||||
| -rw-r--r-- | notes/index.md | 1 |
2 files changed, 177 insertions, 0 deletions
diff --git a/notes/260610-fastcons3.md b/notes/260610-fastcons3.md new file mode 100644 index 0000000..e1f7cfa --- /dev/null +++ b/notes/260610-fastcons3.md @@ -0,0 +1,176 @@ +# Cons cell optimization again + +_2026 June_ + +Last episode: [Cons cell optimization!](260608-fastcons2.html) + +Given the sub-optimal situation with these kinds of optimizations for +manual uses of `cons` in program code, perhaps it's best to limit the +scope of this optimization: + +* Regular `cons` creates normal pairs; `car` and `cdr` only work on + normal pairs. + +* Programmers can explicitly use the array-backed list data type if + they want to; it's simply another data structure available in the + standard library, alongside things like dynamic arrays that grow + upon saturation and move their entire contents. + + Note: Those dynamic regrowing arrays are what are typically called + an ArrayList data structure, like in Zig or Java. In this article + I'm using terms like array-backed list or array-list for my jumbo + cons cell concept instead. I believe some call this SegmentedList. + +* We break from Lisp tradition and *don't* return regular pairs from + s-expression parsing; we return array-backed lists. This is a sane + choice because: + + 1. A parser knows the length of any list it's allocating; why use a + pair-based linked list structure? + + 2. One of the primary users of the parser is the interpreter; it's + greatly beneficial for its performance to use contiguous arrays + for code forms to evaluate. + + It just means that when you parse some data, you need to make sure + to use the array-list API rather than the cons cell API to traverse + the data you've parsed. I think that's fine. + +Given that these array-lists don't pretend to be cons cells anymore, +the whole "sentinel bit pattern at the start" thing isn't necessary +anymore; programmers using this data type simply need to understand +that it's a different data structure and they can't simply acquire a +pointer to the "rest" of the list from any arbitrary point. + +We don't need a `cdr` equivalent for this data type at all; we offer +other functions to traverse it, segment by segment. You can't get a +pointer to the middle of a segment; you can only get a pointer to the +head of the next segment. + +So, the 3-bit or 4-bit tag on the pointer gets a slightly different +meaning: It's simply the length of the current segment. Well, it's +actually the same exact meaning as before, and technically you could +still construct a pointer to the middle of a segment with the length +tag indicating the remaining length of that segment, but it's simply +not going to be supported; the GC could collect the segment, leaving +you with a pointer to invalid memory. + +Also, this means that any length other than the maximum implies that +the list ends there; you'd never put a pointer to another segment to +the end of an unsaturated segment, because why would you. + +Let's look at a few examples of such array-lists in memory. We'll +assume a 3-bit tag for simplicity; you can imagine what a 4-bit tag +would do. + +Single-element proper list: + + tag = 0 + + 0 1 2 3 4 5 6 7 + + foo nil + +(Earlier I wrote that the tag is "simply the length of the current +segment" which was a lie: The length is tag + 2 and the last slot in +the segment is nil or a pointer to another segment or the end of an +improper list.) + +An improper list with 3 elements and non-nil tail: + + tag = 2 + + 0 1 2 3 4 5 6 7 + + foo bar baz bat + +A proper list filing up a cache line perfectly: + + tag = 6 + + 0 1 2 3 4 5 6 7 + + foo bar baz bat qux fux tux nil + +Tag value 7 means the list *might* be longer; this is a bit awkward +because the tail sits on the next cache line: + + tag = 7 + + 0 1 2 3 4 5 6 7 8 + + a b c d e f g h ptr + +However, if the list actually is longer, the elements could follow +immediately after that pointer, because hopefully the whole structure +was allocated in one contiguous array. These continuation pointers +are, in a sense, an artifact resulting from the fact that we can't +otherwise represent longer lists, since the length bits are scarce. + +This is fine, especially if the length bits are extended to 4, since +the point is using this for the kinds of short lists that are very +common in Lisp/Scheme code. How often does source code contain an +s-expression longer than 15 elements? (That's tag value 14, with +trailing nil to end it as a proper list; 16 slots total filling two +cache lines perfectly.) + +## Why improper lists at all? + +I've asked myself whether it would make sense to ditch improper lists +and make the terminating nil implicit for non-maximum length segments +but I don't like that idea. + +Improper lists kinda have their place in Lisp/Scheme code, such as to +represent rest arguments. And besides, the semantics get quite wonky +if the terminating nil is implicit on non-max length; you still need +it on max length if the list actually does end after that, and then +what do you do if it's not nil and also not a segment pointer? Now +you have improper lists again, but they can only have a non-tail +element count equal to a multiple of 8 or 16, which is weird. + +Plus, there may actually be valid use-cases where you create segments +of non-max length, and then put a pointer to another segment at the +end anyway. The parser has no need for this, but someone using this +segment-array-linked-list structure for some special purpose could +find a use for this, perhaps. + +## Closing up + +I think this will be a very fine optimization with respect to the +operation of the interpreter. I've lots of ideas in mind on how to +make it almost like a bytecode VM despite taking tree data. As an +example, after the first time a list is evaluated as a function call, +verifying that it's a proper list with an identifier referring to a +function in the first position, it can be modified as follows: + +The head of the list is replaced with a special NaN-packed value of +type "call instruction" that encodes a pointer to a function as a +payload directly in the NaN. This instruction implies that the list +has been verified to be proper, so the length tag on the pointer that +led us to this list can be relied on directly, not requiring another +verification that the call expression (list) is well-formed (proper). + +Bytecode is inherently linear, whereas a tree structure inherently +requires some pointer jumping. Consider the expression: + + (+ x (* y (/ z t))) + +As bytecode, this would be turned into instructions that first call +the division operator, put the result in a register or the stack, +continue with the multiplication, and so on. + +As a tree, even if you did some pre-verification and turned symbols +into direct call instructions embedding a pointer and so on, you're +still dependent on recursion to evaluate the innermost form first, +store the result, walk back up a level, and so on. + +Other than by a complete transformation of the code, extracting inner +expressions and putting their results into local variables, you can't +make this as nice to execute as bytecode. I wonder how much overhead +that really causes though, if the interpreter is otherwise optimized +well and rewrites s-expressions in-place to save future work, and the +various array-lists the code is made up of all fit neatly into L1. + +You may be wondering why I don't just go the bytecode VM route, but +I've already wandered off from the topic far enough, so maybe I'll +write a separate note about that. diff --git a/notes/index.md b/notes/index.md index a4f753e..e331101 100644 --- a/notes/index.md +++ b/notes/index.md @@ -31,3 +31,4 @@ * [Interpreter](260522-interpreter.html) * [Static analysis to optimize GC](260524-gc-opt.html) * [Cons cell optimization!](260608-fastcons2.html) +* [Cons cell optimization again](260610-fastcons3.html) |
