Diffstat (limited to 'notes/strings.md')
| -rw-r--r-- | notes/strings.md | 57 |
1 files changed, 57 insertions, 0 deletions
diff --git a/notes/strings.md b/notes/strings.md
new file mode 100644
index 0000000..6f01944
--- /dev/null
+++ b/notes/strings.md
@@ -0,0 +1,57 @@
+# Symbols and strings, revisited
+
+My [original plan](symbols.html) was to make strings and symbols one
+and the same.  Then I realized this introduced ambiguity between bare
+strings meant as identifiers, and quoted strings representing a string
+literal in code.
+
+After a bunch of back-and-forth, I came up with the idea of the Zisp
+[decoder](reader.html) with which I'm very happy overall, but I still
+decided to ditch the idea of using an intermediate representation for
+quoted string literals like `(#STRING . "foo")` after all.
+
+The idea was that the reader would have a data mode and a code mode
+and that quoted strings would become `(#STRING . "foo")` or such in
+code mode, but not in data mode.  This way, reading a configuration
+file (in data mode) that uses quoted strings would not end up giving
+you this wonky thing with `#STRING`.
+
+It was an exciting idea at first, but eventually I realized that the
+above was the *only* substantial reason to have separate modes for
+reading s-expressions.  It also annoyed me a bit that every single
+quoted string in code would be wrapped in a cons cell...
+
+So, ultimately I've decided to simply make quoted strings a proper
+sub-type of strings.  (Or make symbols a sub-type of strings; which
+ever way you want to look at it.)
+
+Also, my [NaN-packing strategy](nan.html) has so much extra room that
+I've decided to put up-to-6-byte strings into NaNs as an optimization
+hack, and this applies to both quoted and bare strings.
+
+So we have two different string types, and two different in-memory
+representations for each.  Let's summarize and give them names:
+
+* sstr: Short string (symbol, up to 6 bytes)
+
+* qstr: Quoted short string (non-symbol, up to 6 bytes)
+
+* istr: Interned string (symbol, greater than 6 bytes)
+
+* ustr: Uninterned string (non-symbol, greater than 6 bytes)
+
+Don't get hung up on the short four-letter names; they aren't fully
+descriptive.  The "qstr" isn't the only one representing a quoted
+string literal; a "ustr" may also represent one.
+
+Here's how the parser uses these types:
+
+* Encountered an unquoted string of up to 6 bytes?  Make a sstr.
+
+* Encountered a quoted string of up to 6 bytes?  Make a qstr.
+
+* Unquoted string of more than 6 bytes?  Intern it to make an istr.
+
+* Quoted string of more than 6 bytes?  Uninterned string.
+
+*** WIP ***
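
The four parser rules at the end of the note amount to a two-way decision: quoted or not, and at most 6 bytes or more. A minimal sketch of that decision in Python follows. It is not the Zisp implementation: the names `StrKind`, `classify`, and `SHORT_MAX_BYTES` are hypothetical, and measuring length in UTF-8 bytes is an assumption, since the note only says "bytes". Interning and the actual NaN packing are left out because the note does not describe them.

```python
from enum import Enum

# From the note: strings of up to 6 bytes are packed into a NaN.
SHORT_MAX_BYTES = 6


class StrKind(Enum):
    """The four names the note gives to the string representations."""
    SSTR = "sstr"  # short string: symbol, up to 6 bytes
    QSTR = "qstr"  # quoted short string: non-symbol, up to 6 bytes
    ISTR = "istr"  # interned string: symbol, more than 6 bytes
    USTR = "ustr"  # uninterned string: non-symbol, more than 6 bytes


def classify(text: str, quoted: bool) -> StrKind:
    """Pick the representation the parser rules would choose.

    Assumption: length is counted in UTF-8 encoded bytes.
    """
    size = len(text.encode("utf-8"))
    if size <= SHORT_MAX_BYTES:
        return StrKind.QSTR if quoted else StrKind.SSTR
    return StrKind.USTR if quoted else StrKind.ISTR


if __name__ == "__main__":
    for text, quoted in [("define", False), ("foo", True),
                         ("call-with-values", False), ("hello world", True)]:
        print(repr(text), "quoted" if quoted else "bare",
              "->", classify(text, quoted).value)
```

Because the limit is in bytes rather than characters, a bare string of four two-byte characters would already fall on the interned side under this sketch's UTF-8 assumption.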
