# Cons cell optimization again _2026 June_ Last episode: [Cons cell optimization!](260608-fastcons2.html) Given the sub-optimal situation with these kinds of optimizations for manual uses of `cons` in program code, perhaps it's best to limit the scope of this optimization: * Regular `cons` creates normal pairs; `car` and `cdr` only work on normal pairs. * Programmers can explicitly use the array-backed list data type if they want to; it's simply another data structure available in the standard library, alongside things like dynamic arrays that grow upon saturation and move their entire contents. Note: Those dynamic regrowing arrays are what are typically called an ArrayList data structure, like in Zig or Java. In this article I'm using terms like array-backed list or array-list for my jumbo cons cell concept instead. I believe some call this SegmentedList. * We break from Lisp tradition and *don't* return regular pairs from s-expression parsing; we return array-backed lists. This is a sane choice because: 1. A parser knows the length of any list it's allocating; why use a pair-based linked list structure? 2. One of the primary users of the parser is the interpreter; it's greatly beneficial for its performance to use contiguous arrays for code forms to evaluate. It just means that when you parse some data, you need to make sure to use the array-list API rather than the cons cell API to traverse the data you've parsed. I think that's fine. Given that these array-lists don't pretend to be cons cells anymore, the whole "sentinel bit pattern at the start" thing isn't necessary anymore; programmers using this data type simply need to understand that it's a different data structure and they can't simply acquire a pointer to the "rest" of the list from any arbitrary point. We don't need a `cdr` equivalent for this data type at all; we offer other functions to traverse it, segment by segment. You can't get a pointer to the middle of a segment; you can only get a pointer to the head of the next segment. So, the 3-bit or 4-bit tag on the pointer gets a slightly different meaning: It's simply the length of the current segment. Well, it's actually the same exact meaning as before, and technically you could still construct a pointer to the middle of a segment with the length tag indicating the remaining length of that segment, but it's simply not going to be supported; the GC could collect the segment, leaving you with a pointer to invalid memory. Also, this means that any length other than the maximum implies that the list ends there; you'd never put a pointer to another segment to the end of an unsaturated segment, because why would you. Let's look at a few examples of such array-lists in memory. We'll assume a 3-bit tag for simplicity; you can imagine what a 4-bit tag would do. Single-element proper list: tag = 0 0 1 2 3 4 5 6 7 foo nil (Earlier I wrote that the tag is "simply the length of the current segment" which was a lie: The length is tag + 2 and the last slot in the segment is nil or a pointer to another segment or the end of an improper list.) An improper list with 3 elements and non-nil tail: tag = 2 0 1 2 3 4 5 6 7 foo bar baz bat A proper list filing up a cache line perfectly: tag = 6 0 1 2 3 4 5 6 7 foo bar baz bat qux fux tux nil Tag value 7 means the list *might* be longer; this is a bit awkward because the tail sits on the next cache line: tag = 7 0 1 2 3 4 5 6 7 8 a b c d e f g h ptr However, if the list actually is longer, the elements could follow immediately after that pointer, because hopefully the whole structure was allocated in one contiguous array. These continuation pointers are, in a sense, an artifact resulting from the fact that we can't otherwise represent longer lists, since the length bits are scarce. This is fine, especially if the length bits are extended to 4, since the point is using this for the kinds of short lists that are very common in Lisp/Scheme code. How often does source code contain an s-expression longer than 15 elements? (That's tag value 14, with trailing nil to end it as a proper list; 16 slots total filling two cache lines perfectly.) ## Why improper lists at all? I've asked myself whether it would make sense to ditch improper lists and make the terminating nil implicit for non-maximum length segments but I don't like that idea. Improper lists kinda have their place in Lisp/Scheme code, such as to represent rest arguments. And besides, the semantics get quite wonky if the terminating nil is implicit on non-max length; you still need it on max length if the list actually does end after that, and then what do you do if it's not nil and also not a segment pointer? Now you have improper lists again, but they can only have a non-tail element count equal to a multiple of 8 or 16, which is weird. Plus, there may actually be valid use-cases where you create segments of non-max length, and then put a pointer to another segment at the end anyway. The parser has no need for this, but someone using this segment-array-linked-list structure for some special purpose could find a use for this, perhaps. ## Closing up I think this will be a very fine optimization with respect to the operation of the interpreter. I've lots of ideas in mind on how to make it almost like a bytecode VM despite taking tree data. As an example, after the first time a list is evaluated as a function call, verifying that it's a proper list with an identifier referring to a function in the first position, it can be modified as follows: The head of the list is replaced with a special NaN-packed value of type "call instruction" that encodes a pointer to a function as a payload directly in the NaN. This instruction implies that the list has been verified to be proper, so the length tag on the pointer that led us to this list can be relied on directly, not requiring another verification that the call expression (list) is well-formed (proper). Bytecode is inherently linear, whereas a tree structure inherently requires some pointer jumping. Consider the expression: (+ x (* y (/ z t))) As bytecode, this would be turned into instructions that first call the division operator, put the result in a register or the stack, continue with the multiplication, and so on. As a tree, even if you did some pre-verification and turned symbols into direct call instructions embedding a pointer and so on, you're still dependent on recursion to evaluate the innermost form first, store the result, walk back up a level, and so on. Other than by a complete transformation of the code, extracting inner expressions and putting their results into local variables, you can't make this as nice to execute as bytecode. I wonder how much overhead that really causes though, if the interpreter is otherwise optimized well and rewrites s-expressions in-place to save future work, and the various array-lists the code is made up of all fit neatly into L1. You may be wondering why I don't just go the bytecode VM route, but I've already wandered off from the topic far enough, so maybe I'll write a separate note about that.