Diffstat (limited to 'notes')
-rw-r--r-- notes/boot.md | 17
-rw-r--r-- notes/numbers.md | 45
-rw-r--r-- notes/strings.md | 57
-rw-r--r-- notes/unread.md | 36
4 files changed, 155 insertions, 0 deletions
diff --git a/notes/boot.md b/notes/boot.md
new file mode 100644
index 0000000..758d264
--- /dev/null
+++ b/notes/boot.md
@@ -0,0 +1,17 @@
+# Bootstrapping Zisp
+
+In my opinion, any serious programming language must have a serious
+bootstrapping strategy that addresses the "Trusting Trust" issue aka
+the Thompson Hack. The easiest way to do that is making sure that
+your language can be bootstrapped from an existing language, which
+itself has some solution to the problem.
+
+Currently, I'm thinking of implementing Zisp in Zig. (That's not the
+entire reason Zisp is called Zisp, and I might choose a different
+language eventually, and/or rename Zisp, but anyway.)
+
+It will *hopefully* become possible to bootstrap Zig, in turn, from C
+or from some language implemented in C. C itself can be bootstrapped
+from scratch in several ways.
+
+*** WIP ***
diff --git a/notes/numbers.md b/notes/numbers.md
new file mode 100644
index 0000000..6507a67
--- /dev/null
+++ b/notes/numbers.md
@@ -0,0 +1,45 @@
+
+exacts:
+
+ uint : 0...n
+
+ sint : -n...-1 | uint
+
+ ratn : ( p: sint, q: sint )
+
+ comp : ( r: ratn, i: ratn )
+
+
+inexacts:
+
+ double : ieee754 double with +inf, -inf, +nan, -nan
+
+ cmp128 : ( r: double , i: double )
+
+
+exact operations:
+
+ uint + uint = uint
+
+ sint + uint = sint
+
+ ratn + uint = ratn [ ratn + ( p = uint , q = 1 ) ]
+
+ ratn + sint = ratn [ ratn + ( p = sint , q = 1 ) ]
+
+ ratn + ratn = ratn
+
+ comp + uint = comp [ comp + ( r = ( p = uint , q = 1 ) , i = 0 ) ]
+
+ comp + sint = comp [ comp + ( r = ( p = sint , q = 1 ) , i = 0 ) ]
+
+ comp + ratn = comp [ comp + ( r = ratn , i = 0 ) ]
+
+ comp + comp = comp
+
+
+inexact operations:
+
+ double + double = double
+
+ cmp128 + double = cmp128 [ cmp128 + ( r = double , i = 0 ) ]
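
A minimal sketch of the exact promotion rules listed in notes/numbers.md, written in Python rather than the eventual implementation language. It assumes that an integer n is embedded as the rational n/1 and that a `comp` is a pair `(r, i)` of rationals; the names `to_ratn`, `to_comp`, and `add_exact` are hypothetical and not part of Zisp.

```python
from fractions import Fraction


def to_ratn(x):
    """Embed a uint/sint as a ratn with denominator 1 (assumed embedding)."""
    if isinstance(x, Fraction):
        return x
    if isinstance(x, int) and not isinstance(x, bool):
        return Fraction(x, 1)
    raise TypeError(f"not an exact real: {x!r}")


def to_comp(x):
    """Embed an exact real as a comp (r, i) with i = 0."""
    if isinstance(x, tuple):
        return x
    return (to_ratn(x), Fraction(0))


def add_exact(a, b):
    """Add two exact numbers, promoting to the wider of the two types."""
    if isinstance(a, tuple) or isinstance(b, tuple):
        (ar, ai), (br, bi) = to_comp(a), to_comp(b)
        return (ar + br, ai + bi)          # comp + anything = comp
    if isinstance(a, Fraction) or isinstance(b, Fraction):
        return to_ratn(a) + to_ratn(b)     # ratn + uint/sint/ratn = ratn
    return a + b                           # uint/sint + uint/sint
```

In this sketch the result type is always the wider operand type, so `ratn + ratn` stays a `ratn` even when the sum happens to be integral.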
diff --git a/notes/strings.md b/notes/strings.md
new file mode 100644
index 0000000..6f01944
--- /dev/null
+++ b/notes/strings.md
@@ -0,0 +1,57 @@
+# Symbols and strings, revisited
+
+My [original plan](symbols.html) was to make strings and symbols one
+and the same. Then I realized this introduced ambiguity between bare
+strings meant as identifiers, and quoted strings representing a string
+literal in code.
+
+After a bunch of back-and-forth, I came up with the idea of the Zisp
+[decoder](reader.html) with which I'm very happy overall, but I still
+decided to ditch the idea of using an intermediate representation for
+quoted string literals like `(#STRING . "foo")` after all.
+
+The idea was that the reader would have a data mode and a code mode
+and that quoted strings would become `(#STRING . "foo")` or such in
+code mode, but not in data mode. This way, reading a configuration
+file (in data mode) that uses quoted strings would not end up giving
+you this wonky thing with `#STRING`.
+
+It was an exciting idea at first, but eventually I realized that the
+above was the *only* substantial reason to have separate modes for
+reading s-expressions. It also annoyed me a bit that every single
+quoted string in code would be wrapped in a cons cell...
+
+So, ultimately I've decided to simply make quoted strings a proper
+sub-type of strings. (Or make symbols a sub-type of strings;
+whichever way you want to look at it.)
+
+Also, my [NaN-packing strategy](nan.html) has so much extra room that
+I've decided to put up-to-6-byte strings into NaNs as an optimization
+hack, and this applies to both quoted and bare strings.
+
+So we have two different string types, and two different in-memory
+representations for each. Let's summarize and give them names:
+
+* sstr: Short string (symbol, up to 6 bytes)
+
+* qstr: Quoted short string (non-symbol, up to 6 bytes)
+
+* istr: Interned string (symbol, greater than 6 bytes)
+
+* ustr: Uninterned string (non-symbol, greater than 6 bytes)
+
+Don't get hung up on the short four-letter names; they aren't fully
+descriptive. The "qstr" isn't the only one representing a quoted
+string literal; a "ustr" may also represent one.
+
+Here's how the parser uses these types:
+
+* Encountered an unquoted string of up to 6 bytes? Make an sstr.
+
+* Encountered a quoted string of up to 6 bytes? Make a qstr.
+
+* Unquoted string of more than 6 bytes? Intern it to make an istr.
+
+* Quoted string of more than 6 bytes? Make an uninterned ustr.
+
+*** WIP ***
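
The four-way choice described in notes/strings.md can be sketched roughly as follows. This is an illustration in Python, not the planned implementation: the `Str` class, the `make_string` function, and the dictionary used as an intern table are all hypothetical, and the NaN-packed representation of short strings is not modelled. Lengths are measured in bytes of the UTF-8 encoding, which is an assumption about what "6 bytes" refers to.

```python
from dataclasses import dataclass

SHORT_MAX = 6  # maximum byte length of a short string, per the note

_interned = {}  # hypothetical intern table: bytes -> Str


@dataclass(frozen=True)
class Str:
    kind: str    # one of "sstr", "qstr", "istr", "ustr"
    data: bytes


def make_string(text, quoted):
    """Pick the string type for a token, given whether it was quoted."""
    data = text.encode("utf-8")
    if len(data) <= SHORT_MAX:
        return Str("qstr" if quoted else "sstr", data)
    if quoted:
        return Str("ustr", data)  # fresh, uninterned object every time
    # Long bare strings are interned: equal contents share one object.
    return _interned.setdefault(data, Str("istr", data))
```

For example, `make_string("lambda-list", False)` returns the same object on every call, while two calls to `make_string("hello world", True)` return equal but distinct objects.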
diff --git a/notes/unread.md b/notes/unread.md
new file mode 100644
index 0000000..31b2f91
--- /dev/null
+++ b/notes/unread.md
@@ -0,0 +1,36 @@
+# Must ports support seeking?
+
+With traditional s-expressions, it's not always possible to stop
+reading bytes as soon as the end of the current datum is reached,
+because some data don't have a terminating character. Consider a
+sequence of s-expressions such as:
+
+ foo(bar)
+
+After reading the second 'o', the parser has no way of knowing that
+the symbol has ended. It must read another byte.
+
+If the underlying input stream doesn't support "unreading" or seeking
+back, this is troublesome: The opening parenthesis is consumed by the
+first call to the parser, and then discarded, since it's not part of
+the symbol it's reading. The second call to the parser cannot know
+that the "read head" is already within a list.
+
+I assume that traditional lisps work around this issue by requiring
+all streams (ports) to have seeking or unreading functionality, which
+isn't too bad. Assuming you only need to look ahead by one character,
+any port without this feature can be wrapped in a port that adds it
+via a simple one-character buffer. If more than one character of
+look-ahead is needed, a small circular buffer could be used.
+
+Thankfully, Zisp s-expressions are all self-terminating. This is
+because a datum followed immediately by another datum, without any
+blanks in between, is a "joined datum" expression. Any number of
+additional data can be joined like this, yielding a more and more
+deeply nested compound datum. Only a blank or EOF can end this,
+meaning that disjoint data within a stream are necessarily delimited
+by blanks.
+
+
+
+*** WIP ***
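
The one-character buffer mentioned in notes/unread.md for traditional s-expressions can be sketched like this. It is a Python illustration under stated assumptions: `PeekablePort` and `read_symbol` are hypothetical names, the delimiter set is reduced to parentheses, space, and newline, and the wrapped stream only needs a `read(1)` method.

```python
import io


class PeekablePort:
    """Wrap a byte stream and add a single byte of 'unread' capability."""

    def __init__(self, stream):
        self._stream = stream
        self._held = None  # the one byte that was pushed back, if any

    def read_byte(self):
        if self._held is not None:
            b, self._held = self._held, None
            return b
        return self._stream.read(1)  # b"" signals end of input

    def unread_byte(self, b):
        if self._held is not None:
            raise RuntimeError("only one byte of look-ahead is supported")
        self._held = b


def read_symbol(port):
    """Read a bare symbol, pushing back the delimiter that ends it."""
    out = bytearray()
    while True:
        b = port.read_byte()
        if b == b"":
            break
        if b in (b"(", b")", b" ", b"\n"):
            port.unread_byte(b)  # keep the delimiter for the next reader
            break
        out += b
    return bytes(out)
```

With `port = PeekablePort(io.BytesIO(b"foo(bar)"))`, the call `read_symbol(port)` yields `b"foo"` and the next `port.read_byte()` still returns the opening parenthesis, so a later parser call can see that a list begins there.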