# Symbols and strings, revisited

My [original plan](symbols.html) was to make strings and symbols one
and the same. Then I realized this introduced ambiguity between bare
strings meant as identifiers, and quoted strings representing a string
literal in code.

After a bunch of back-and-forth, I came up with the idea of the Zisp
[decoder](reader.html) with which I'm very happy overall, but I still
decided to ditch the idea of using an intermediate representation for
quoted string literals like `(#STRING . "foo")` after all.

The idea was that the reader would have a data mode and a code mode
and that quoted strings would become `(#STRING . "foo")` or such in
code mode, but not in data mode. This way, reading a configuration
file (in data mode) that uses quoted strings would not end up giving
you this wonky thing with `#STRING`.

It was an exciting idea at first, but eventually I realized that the
above was the *only* substantial reason to have separate modes for
reading s-expressions. It also annoyed me a bit that every single
quoted string in code would be wrapped in a cons cell...

So, ultimately I've decided to simply make quoted strings a proper
sub-type of strings. (Or make symbols a sub-type of strings;
whichever way you want to look at it.)

Also, my [NaN-packing strategy](nan.html) has so much extra room that
I've decided to put up-to-6-byte strings into NaNs as an optimization
hack, and this applies to both quoted and bare strings.
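
To make the short-string trick concrete, here is a minimal Python sketch. The bit layout is an assumption made up for illustration and is not the layout described in the NaN-packing note: the sign bit serves as a hypothetical "quoted" flag, bits 48..50 hold the length plus one, and the low 48 bits hold the bytes. The names `pack_short`, `unpack_short` and `as_float` are likewise hypothetical.

```python
import struct

QNAN = 0x7FF8_0000_0000_0000    # exponent all ones plus the quiet bit
QUOTED = 0x8000_0000_0000_0000  # sign bit, used here as the "quoted" flag (assumed)
MAX_SHORT = 6                   # 6 bytes = 48 bits of payload


def pack_short(data: bytes, quoted: bool) -> int:
    """Pack up to 6 bytes into the payload of a quiet NaN (assumed layout)."""
    if len(data) > MAX_SHORT:
        raise ValueError("too long for a short string")
    # Store len + 1 in bits 48..50, so the tag is never zero and the
    # word cannot collide with an ordinary NaN produced by arithmetic.
    word = QNAN | ((len(data) + 1) << 48)
    word |= int.from_bytes(data, "little")
    if quoted:
        word |= QUOTED
    return word


def unpack_short(word: int) -> tuple:
    """Return (bytes, quoted) for a word produced by pack_short."""
    length = ((word >> 48) & 0b111) - 1
    if (word & QNAN) != QNAN or length < 0:
        raise ValueError("not a packed short string")
    data = (word & 0xFFFF_FFFF_FFFF).to_bytes(6, "little")[:length]
    return data, bool(word & QUOTED)


def as_float(word: int) -> float:
    """Reinterpret the 64-bit word as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", word))[0]
```

Under this assumed layout, `hex(pack_short(b"foo", False))` gives `'0x7ffc0000006f6f66'`, and the same bytes with `quoted=True` differ only in the sign bit, so bare and quoted short strings stay distinguishable without any allocation.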
So we have two different string types, and two different in-memory
representations for each. Let's summarize and give them names:

* sstr: Short string (symbol, up to 6 bytes)
* qstr: Quoted short string (non-symbol, up to 6 bytes)
* istr: Interned string (symbol, greater than 6 bytes)
* ustr: Uninterned string (non-symbol, greater than 6 bytes)

Don't get hung up on the short four-letter names; they aren't fully
descriptive. The "qstr" isn't the only one representing a quoted
string literal; a "ustr" may also represent one.

Here's how the parser uses these types:

* Encountered an unquoted string of up to 6 bytes? Make an sstr.
* Encountered a quoted string of up to 6 bytes? Make a qstr.
* Unquoted string of more than 6 bytes? Intern it to make an istr.
* Quoted string of more than 6 bytes? Make a ustr.
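
The four rules above can be sketched as a single function. This is a rough Python illustration, not the actual parser: the `Str` class, the `make_string` name, the dict used as an intern table, and the choice of UTF-8 for counting bytes are all assumptions.

```python
from dataclasses import dataclass

MAX_SHORT = 6          # the limit is in bytes, not characters
_intern_table = {}     # bytes -> Str, holds every istr seen so far


@dataclass(frozen=True)
class Str:
    kind: str    # "sstr", "qstr", "istr" or "ustr"
    data: bytes


def make_string(text: str, quoted: bool) -> Str:
    """Pick one of the four string kinds for a token read by the parser."""
    data = text.encode("utf-8")  # assumed encoding
    if len(data) <= MAX_SHORT:
        # Short strings: only the quoted flag tells them apart.
        return Str("qstr" if quoted else "sstr", data)
    if quoted:
        # Long quoted string: a fresh, uninterned object every time.
        return Str("ustr", data)
    # Long bare string: intern it, so equal symbols are the same object.
    interned = _intern_table.get(data)
    if interned is None:
        interned = _intern_table[data] = Str("istr", data)
    return interned
```

With this sketch, `make_string("lambda-list", False) is make_string("lambda-list", False)` holds because both calls return the one interned object, while two calls with `quoted=True` return equal but distinct objects.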
*** WIP ***