author Taylan Kammer <taylan.kammer@gmail.com> 2025-03-29 11:10:24 +0100
committer Taylan Kammer <taylan.kammer@gmail.com> 2025-03-29 11:10:24 +0100
commit 451aa92846b5fd5c8a0739336de3aa26d741d750 (patch)
tree 21e51213bf1d39c2a8677060c51d83a656873786 /notes/symbols.md
parent 5025f9acf31cd880bbff62ff47ed03b69a0025ee (diff)
Relocate MD sources for HTML notes.
Diffstat (limited to 'notes/symbols.md')
-rw-r--r-- notes/symbols.md 112
1 files changed, 112 insertions, 0 deletions
diff --git a/notes/symbols.md b/notes/symbols.md
new file mode 100644
index 0000000..280fd9f
--- /dev/null
+++ b/notes/symbols.md
@@ -0,0 +1,112 @@
+# Symbols are strings are symbols
+
+In Scheme, symbols are literally just interned and immutable strings.
+They can contain any character a string can, and are constructed either
+via `string->symbol` or the modern `|foo bar baz|` syntax for quoted
+symbols. Why not just embrace the fact that they are strings?
+
+Scheme strings are mutable, but they are a terrible choice for text
+manipulation, because they are constant-length. They are literally
+just vectors of characters. If you wanted a vector of characters,
+well, use a vector of characters!
+
+Zisp won't differentiate between symbols and strings. All strings
+will be immutable, string constants will be automatically interned,
+and bare symbols will just be reader syntax for a string constant.
+
+Instead of `string->symbol` we will have `string-intern` which
+basically does the same thing. Dynamically generated strings that
+aren't passed to this function will be uninterned.
+
+
+## But but but
+
+(Late addition because I didn't even notice this problem at first.
+How embarrassing!)
+
+But if symbols and strings are the same thing at the reader level,
+then how on earth would you have a string literal in source code,
+without it being evaluated as a variable?
+
+    (display "bar")  ;should this look up the variable 'bar'?!
+    (display bar)    ;should this display the string 'bar'?!
+
+There's actually a simple solution.
+
+The syntax `"string"`, with double-quotes and nothing else, becomes
+reader syntax akin to the apostrophe, and expands to:
+
+    (quote #"string")
+
+And `#"string"` is the real syntax for string literals, which are
+always treated as identifiers by the evaluator.
+
+Bare identifiers like `foo` instead directly become `#"foo"`, without
+the wrapping `(quote ...)`, and are thus evaluated.
+
+This also means that manually writing `#"string"` in your source code
+allows that to be used as an identifier regardless of whether it has
+illegal characters in it, essentially doing what `|string|` does in
+R7RS-small.
+
+Let's sum it up; here are the reader transformations:
+
+    foo         -> #"foo"
+    "foo"       -> (quote #"foo")
+    "foo bar"   -> (quote #"foo bar")
+    #"foo bar"  -> #"foo bar"
+
+Some pseudo-code based on Scheme:
+
+    (let ((#"all your" "base ")
+          (#"are belong" "to us"))
+      (display
+       (string-append #"all your" #"are belong")))
+
+That prints: "base to us"
+
+I'm not married to the syntax `#"string"` and may end up using the
+simpler `|foo|` in the end. It doesn't really matter.
+
+
+## More problems
+
+Dangit, couldn't have been so easy, could it?
+
+What if you have a configuration file with these contents:
+
+    ((job-name "Clear /tmp directory")
+     (interval reboot)
+     (command "find /tmp -mindepth 1 -delete"))
+
+Now all those "string constants" will turn into lists with the string
+"quote" at their head. Terrible. One could write it with the explicit
+string literal syntax `#"foo"` for strings, but that's also terrible.
+
+
+## Salvageable
+
+I'm not yet done with this idea. What if strings simply have a flag
+that says whether they are intended as a symbol or not?
+
+While reading, it would be set automatically. Instead of
+`string-intern`, one would call a function like `symbol`, which would
+return a string with the flag set, after interning it if necessary; it
+would simply return the original string if it already had the flag set.
+
+Another way to look at this is that strings and symbols are sort of
+"polymorphic" and can be used interchangeably. I don't want to get
+into JavaScript style automatic type conversions (yuck) but this may
+simply be a flag that's set on a string, which makes it a subtype of
+regular strings.
+
+Yes yes, I think that's good. I even still have enough space left in
+the NaN-packing strategy to put a tag on "short strings" which are our
+6-byte immediate strings.
+
+
+## Reconsidering AGAIN
+
+*This got too long and off-topic so it continues here:*
+
+[Reader? Decoder? I barely know 'er!](reader.html)
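
The reader transformations summed up in the note can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not Zisp's actual reader: the names `Str`, `tokenize`, `read_atom`, and `read` are hypothetical, and strings are assumed to contain no escape sequences. `Str` stands in for the `#"..."` datum, so a bare identifier reads as a `Str`, while a double-quoted string reads as a list headed by `Str("quote")`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Str:
    """Hypothetical stand-in for the note's #"..." string literal datum."""
    text: str


# Tokens: parentheses, #"..." literals, "..." strings, bare identifiers.
# Assumption: no escape sequences inside strings.
TOKEN = re.compile(r'\s*(\(|\)|#"[^"]*"|"[^"]*"|[^\s()"]+)')


def tokenize(src):
    """Split source text into a flat list of token strings."""
    src = src.rstrip()
    pos, tokens = 0, []
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def read_atom(tok):
    """Apply the reader transformations listed in the note to one atom."""
    if tok.startswith('#"'):
        return Str(tok[2:-1])                   # #"foo bar" -> #"foo bar"
    if tok.startswith('"'):
        return [Str("quote"), Str(tok[1:-1])]   # "foo" -> (quote #"foo")
    return Str(tok)                             # foo -> #"foo"


def read(src):
    """Read every top-level datum in src; lists become Python lists."""
    stack = [[]]
    for tok in tokenize(src):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise SyntaxError("unexpected )")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(read_atom(tok))
    if len(stack) != 1:
        raise SyntaxError("missing )")
    return stack[0]


if __name__ == "__main__":
    for datum in read('foo "foo" #"foo bar" (command "find /tmp -delete")'):
        print(datum)
```

Feeding this model a configuration entry such as `(command "find /tmp -mindepth 1 -delete")` gives a list whose second element is itself a list headed by `Str("quote")`, which is the problem raised under "More problems": plain data files pick up quote wrappers around every double-quoted string.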