init

author: Taylan Kammer <taylan.kammer@gmail.com> 2025-02-10 09:16:36 +0100
committer: Taylan Kammer <taylan.kammer@gmail.com> 2025-02-10 09:16:36 +0100
commit: 831dc694c404826e9a1bf07788e10b9ac3d9cb2d (patch)
tree: 1154eafac62c89d65069c34e344d5f9c4263889a /README.md
1 files changed, 576 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..7a2146c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,576 @@
+# Zisp: 21st-century Scheme-inspired language
+
+Zisp is my experimental toy language inspired by Scheme.  The idea is
+that it's a modern "re-imagining" of what Scheme may have been had it
+been invented today, and had it been designed with pragmatic use as a
+primary concern in its design.
+
+This language doesn't actually exist yet.  You are merely reading the
+ramblings of a madman.
+
+
+## Compilation is execution
+
+Any Scheme implementation with support for procedural macros allows
+arbitrary code execution at compile-time.  However, this is slightly
+awkward:
+
+```scheme
+
+  (define-syntax comptime
+    (lambda (stx)
+      (syntax-case stx ()
+        ((_)
+         (begin
+           (display "foo\n")))))
+
+  (comptime)
+
+```
+
+Compiling this with, for example, `guild compile foo.scm` will lead to
+the line "foo" being printed.  (The actual program is a no-op and will
+do nothing when run after compiled.)
+
+One can of course implement a macro such as `eval-when-compile` to
+make this slightly less awkward.  Using R6RS:
+
+```scheme
+
+  (import (rnrs eval))
+
+  (define-syntax eval-when-compile
+    (lambda (stx)
+      (syntax-case stx ()
+        ((_ imports body ...)
+         (eval
+           (syntax->datum #'(begin body ...))
+           (apply environment (syntax->datum #'imports)))))))
+
+  (eval-when-compile
+   ((rnrs))
+   (display "foo\n"))
+
+```
+
+An implementation may of course contain such a macro in its standard
+library, but it's unclear why the language should put such a hurdle in
+our way.  There are problems beyond this little hurdle as well.
+
+Top-level forms in Scheme are semantically executed at run-time, not
+compile-time.
+
+(Actually, the Scheme standards don't explicitly define a run-time or
+compile-time stage, but it's arguably implicit in the fact that macros
+are *not* first-class, and are defined by the language in such a way
+that they can be executed entirely at compile-time if ahead-of-time
+compilation is supported by an implementation.)
+
+Consider the case where the programmer wants to perform a relatively
+costly calculation at compile-time and store the result as part of the
+compiled program.  Say, a lookup-table.  Naively, we may attempt the
+following:
+
+```scheme
+
+  ;; The fictional file `lookup-table.dat` would be in the source code
+  ;; repository, and the fictional procedure `process-data` would read
+  ;; it and return a data structure.
+  (define lookup-table (process-data "lookup-table.dat"))
+
+```
+
+This will not work.  Compiling a Scheme file containing such a form
+will produce a program that calls `process-data` at run-time and not
+at compile-time as intended.
+
+One can of course resolve this with an explicit use of a procedural
+macro.  In fact, all one needs to do is redefine `process-data` as a
+macro, but I find this to be an unnecessary complication.
+
+Further, any sufficiently intelligent implementation *will* actually
+execute such top-level definitions at compile-time, given that they
+only make use of compile-time constants and pure functions.
+
+```scheme
+
+  ;; Guile will compute the value at compile-time.
+  (define seconds-per-day (number->string (* 24 60 60)))
+
+```
+
+This is easily observed by running `guild compile test.scm -o test.go`
+and then `strings test.go` which will contain the string 86400 in its
+output.  (If we didn't use `number->string`, it would be harder to
+locate the number 86400 in the output, since it would be binary.)
+
+This works because Guile implements the "partial evaluation" strategy
+for program optimization.  This requires the optimizer to know which
+procedures are pure.  A limitation in the implementation may lead to
+some such opportunities to be missed, and for the compiled program to
+execute unnecessary code at run-time.
+
+To recap: The top-level of a Scheme file is conceptually executed at
+run-time.  But an optimizer may execute some of it at compile-time
+anyway.  However, we're at the mercy of the implementation quality for
+this to happen consistently.  We can use procedural macros to force
+some execution to definitely happen at compile-time.  What a mess!
+
+It would be so much simpler if compiling a program meant, at the
+language semantics level, that the top-level is executed.
+
+Any run-time initialization of a program or module should be explicit,
+such as by putting it into a `main` function, or having the module
+export an initialization function.  (The language may, if I feel like
+it, allow for declaring a module initializer function, which would be
+invoked automatically when a module is loaded.)
+
+
+## Everything can be serialized
+
+If you're familiar with Guile --and I suspect most implementations of
+Scheme have a similar limitation-- then you may have noticed an issue
+in the above example where the programmer intends to compute a lookup
+table at compile-time.
+
+Not all Scheme objects can be serialized.  This not only applies to
+the `write` procedure, but also the compiler's ability to put objects
+into the binary representation of a compiled program.  (For example,
+the .data section of an ELF file in case of Guile.)
+
+This can be demonstrated as follows:
+
+```scheme
+
+  (define-syntax process-data
+    (lambda (stx)
+      (syntax-case stx ()
+        ((_ file)
+         ;; Ignore `file`; this is just an example!
+         (let ((ht (make-eqv-hashtable)))
+           (hashtable-set! ht 1 2)
+           ht)))))
+
+  (define lookup-table (process-data "lookup-table.dat"))
+
+```
+
+Compiling this with `guild` will yield an error, complaining about an
+"unhandled constant" represented as #<r6rs:hashtable ...> in the error
+message.  What it's actually trying to say is that hash tables aren't
+constants, and the compiler doesn't know how to put them into the ELF
+file it's writing.
+
+(At least, this is the case as of February 2025, using Guile 3.0.10;
+who knows what the future will provide!)
+
+In Zisp, I want absolutely everything to be possible to serialize, and
+the compiler should simply be using this capability of the language to
+write out compiled binaries.
+
+For example, given that any Zisp program has to declare a `main` entry
+point function, all the compiler would do is execute the file and then
+call `(write main)`.  How elegant!
+
+This serialization of a function would, of course, involve traversing
+all references therein, and including them in the output somehow.  The
+same will apply to writing out any data structure.  This means that
+serializing a module is *not* a matter of invoking `write` on each of
+its exported definitions.  This would lead to lots of duplicate data
+between the outputs, and `eq?` relations would be lost after reading
+them back in.  We probably want a first-class `module` type that can
+be serialized as one object.
+
+(If we want symmetry between `read` and `write`, then `read` needs to
+detect that it's looking at a compiled binary.  Not entirely sure yet
+if I really want this, but I think I do.)
+
+
+## Symbols are strings are symbols
+
+In Scheme, symbols are literally just interned and immutable strings.
+They can contain any character a string can, constructed either via
+`string->symbol` or the modern `|foo bar baz|' syntax for quoted
+symbols.  Why not just embrace the fact that they are strings?
+
+Scheme strings are mutable, but they are a terrible choice for text
+manipulation, because they are constant-length.  They are literally
+just vectors of characters.  If you wanted a vector of characters,
+well, use a vector of characters!
+
+Zisp won't differentiate between symbols and strings.  All strings
+will be immutable, string constants will be automatically interned,
+and bare symbols will just be reader syntax for a string constant.
+
+Instead of `string->symbol` we will have `string-intern` which
+basically does the same thing.  Dynamically generated strings that
+aren't passed to this function will be uninterned.
+
+
+## Stop the "cons" madness!
+
+Lists are neat, but they aren't the best representation for sequences
+of fixed length.  An array/vector is a better choice for this.
+
+R6RS already uses vectors in some places where a traditional lisper
+may have expected to see lists.  Namely, in the procedural layer of
+record types, where the fields of a record type are represented by
+vectors.  There may be more places in Scheme where this makes sense.
+
+
+## Better handling of rest args
+
+Initially, I was thinking of using either immutable vectors, or
+immutable lists transparently backed by arrays, to represent rest
+arguments.  But the following paper offers a very compelling
+alternative:
+
+  https://legacy.cs.indiana.edu/~dyb/pubs/LaSC-3-3-pp229-244.pdf
+
+Long story short, rest argumenst are received through a parameter list
+such as `(arg1 ... argn & rest)` and the identifier `rest` is special
+in that it can only be passed on to another procedure using the same
+syntax.  For example, to explicitly put the rest args into a list and
+map over them:
+
+```scheme
+
+  (define (map* proc & args)
+    (map proc (list & args)))
+
+  (map* square 1 2 3) ;=> (1 4 9)
+
+```
+
+Recursive functions that directly consume an arbitrary number of args,
+without needing to allocate any data structure, can be implemented by
+combining this feature with what is today known as `case-lambda`:
+
+```scheme
+
+  (define combine*
+    (case-lambda
+      ((x) x)
+      ((x y & rest) (combine* (combine x y) & rest))
+
+```
+
+Though the paper proposes the use of `&` so as to differentiate it
+from the regular rest-argument mechanism of Scheme, I intend to make
+Zisp use only this mechanism, so we can use the dot notation for it.
+Rewriting the above examples in this style gives us:
+
+```scheme
+
+  (define (map* proc . args)
+    (map proc (list . args)))
+
+  (define combine*
+    (case-lambda
+      ((x) x)
+      ((x y . rest) (combine* (combine x y) . rest))))
+
+```
+
+I find this very pleasing on the eyes, and a very elegant way to use
+improper lists in evaluation context, which isn't allowed in Scheme.
+
+
+## More ergonomic multiple-values
+
+The paper linked above proposes to reuse the rest args syntax for an
+elegant solution to consuming multiple values:
+
+```scheme
+
+  (proc x y & <expr>)
+
+```
+
+In the above, `<expr>` may evaluate to multiple values, and the values
+will be passed to `proc` as additional arguments.
+
+Essentially, this means that the special rest-arg identifier is itself
+a representation of multiple values, or in other words, evaluating it
+results in multiple values even though it's just an identifier!
+
+Demonstration, using Zisp notation for multiple-value rest args:
+
+```scheme
+
+  (define (foobar . rest)
+    (let-values (((x y z) rest))
+      ;; This is a meaningless example for demonstration, since we
+      ;; could have just made the function accept three parameters.
+      ;; The `let-values` here is the one from SRFI 11.
+      ))
+
+```
+
+The paper also proposes terminating lambda bodies with `& <expr>` to
+return multiple values, but this is obsolete as of R5RS, which allows
+the final expression in a body to evaluate to multiple values anyway.
+
+However, the use of & to pass multiple values as arguments is very
+convenient and way clearer than R5RS's clumsy `call-with-values`:
+
+```scheme
+
+  ;; Returns {bool, obj} where bool indicates success/failure and obj
+  ;; is meaningless if bool is false; allows differentiating between
+  ;; the case where #f is found as the value vs. nothing being found.
+  (define (lookup alist key)
+    (if (null? alist)
+        (values #f #f)
+        (if (eqv? key (caar alist))
+            (values #t (cdar alist))
+            (lookup (cdr alist) key))))
+
+  (define (display-if-found found? obj)
+    (when found? (display obj)))
+
+  ;; Incredibly ugly `call-with-values`:
+  (call-with-values
+    (lambda () (lookup '((x . y)) 'x))
+    display-if-found)
+
+  ;; (Up until here is valid R5RS code, by the way.)
+
+  ;; So much cleaner:
+  (display-if-found & (lookup '((x . y)) 'x))  ;; displays x
+
+```
+
+Unfortunately, we can't reuse the improper list syntax in the last
+example, since the following s-expressions are equivalent:
+
+```scheme
+
+  (foo . (bar baz))
+  (foo bar baz)
+
+```
+
+In Zisp, this will be solved by making `apply` a special-form where
+the last operand is expected to evaluate to multiple values rather
+than a list:
+
+```scheme
+
+  ;; (apply <proc-expr> <argn-expr> ... <restargs-expr>)
+
+  (apply display-if-found (lookup '((x . y)) 'x))
+
+```
+
+Note that this means the forms `(apply foo rest)` and `(foo . rest)`
+are equivalent if `rest` is an identifier and not a pair/list, while
+`(apply foo (x ...))` is of course NOT equivalent to `(foo x ...)`.
+
+
+## A little bit of syntax sugar never hurt anyone
+
+Why not make `foo.bar` reader syntax for `(record-ref foo bar)` where
+`record-ref` is a macro that interprets `bar` as a constant?
+
+Admittedly, this loses us some elegance compared to the widely used
+SRFI 9 (standardized in R7RS-small) where field getters only exist as
+procedures, so making a field "private" is simply a matter of leaving
+the getter out of your module exports.
+
+A general `record-ref` that accepts field names would need to be aware
+of the concept of private fields and block access to them except if
+invoked in a context where the private fields are accessible.
+
+I think the nice syntax is worth the added complexity.  It doesn't
+seem terribly difficult to expand the concept of "lexical scope" to
+include field accessibility information.
+
+Alternatively, we could just not have private fields, like in Python.
+Fields that are meant to be private could be named in a special way by
+convention, such as `%foo` or whatever.  I have to check whether real
+encapsulation would provide us with any substantial benefits, such as
+in optimization.
+
+Furthermore, `foo[bar]` could be sugar for `(vector-ref foo bar)` and
+`foo{bar}` could be sugar for `(map-ref foo bar)` or the like.  (Not
+sure yet whether to use the word "map" for a data type, due to the
+overlap with the `map` function.)
+
+The functionality of SRFI 17 should be a core aspect of the language,
+so the following all work:
+
+```scheme
+
+  ;; Record
+  (define rec (make-foo))
+  (set! rec.field value)
+
+  ;; Vector
+  (define vec (vector 1 2 3))
+  (set! vec[n] value)
+
+  ;; Some kind of map
+  (define ht (hash-table))
+  (set! ht{key} value)
+
+```
+
+
+## More immutability
+
+I see no reason to have mutable variables in the language.
+
+Usually, code is analyzed to distinguish between mutable and immutable
+variables, because this aids in optimization.  This means you end up
+with two types of variables, and whether a variable is of one or the
+other type is determined solely from how it's used.  Ugly!
+
+An explicit box data structure can trivially replicate the features of
+a mutable variable, so let's just use that instead.
+
+Our `set!` can assume a box when a plain identifier is used.  But to
+get the value, we call `get`.
+
+```scheme
+
+  (let ((x (box 0)))
+    (while foo
+      (set! x (+ x 1)))
+    (get x))
+
+```
+
+I've not yet made up my mind on whether pairs should be immutable by
+default, but they probably should.
+
+Strings, as already mentioned before, will be immutable, since string
+constants will be the same thing as a symbol.
+
+
+## No shadowing (shock!) and a reduced set of `let` forms
+
+This may be shocking for Schemers but I believe shadowing variables is
+evil.  I've already written a bug once that would have been prevented
+had it not been possible to shadow variables, and I don't even write
+that much Scheme code.
+
+And let's face it: The presence of four different forms by the name of
+`let`, `let*`, `letrec`, and `letrec*` is daunting when you're coming
+from another language.  Lack of shadowing allows us to reduce this
+without losing out much functionality.
+
+In the absence of shadowing, `let` becomes nearly useless because you
+can always use `letrec` to fulfill the same need; it's strictly more
+powerful.  (The only thing `let` can do that `letrec` can't, is to
+refer to the previous binding of a variable before shadowing it.)
+
+Further, the value of vanilla `letrec` is dubious when `letrec*` is
+strictly more powerful.  So, in Zisp, `let` is the ultimate binding
+form that does what `letrec*` does in Scheme.
+
+Except it does more!  We haven't looked at the whole `let-values`
+family yet.  In Zisp, these are also merged into the ultimate `let`,
+using the SRFI 71 syntax:
+
+```scheme
+
+  (let ((a (one-value))
+        (b c (two-values))
+        ((values d e . rest) (arbitrary-values)))
+    (do-things))
+
+```
+
+You may be wondering whether it also supports the "let loop" syntax of
+vanilla Scheme `let` and the answer is no, because I hate that syntax.
+It has too high a risk of leading to absolute spaghetti code with no
+clear indication as to how the loop variables are being updated.
+
+If you want to loop, use a dedicated looping syntax!  Even `do` is
+better than "let loop" shenanigans if you ask me.
+
+
+## Return zero values when there's nothing to return
+
+This is only a minor point: It's a long-running pet peeve of mine that
+Scheme specifies "an unspecified value" to be returned when there's
+nothing meaningful to return.  It's a remnant from before we had the
+ability to return multiple values, and should be eliminated.
+
+Any operation that has nothing meaningful to return, will return zero
+values in Zisp.
+
+
+## Strict mode to disallow ignoring returned values
+
+This ties in to the last point.  In Scheme, a non-tail expression in a
+body can return an arbitrary number of values, which will be silently
+ignored.
+
+This can lead to bugs, where a procedure actually returns some kind of
+success or failure indicator (instead of raising an error) and the
+programmer forgets to handle it.
+
+Though it may be too inefficient to enable globally, there should at
+least be a mode of compilation that emits code which checks at every
+single function return whether there are any values that are being
+ignored, and raises an error if so.
+
+There would of course be a form to explicitly ignore values.
+
+
+## Only Booleans have truthiness
+
+Like in Java, there should be no implicit conversion of values to a
+Boolean.  This leads to sloppy code and subtle bugs.
+
+I believe the code base of Guix contained an example of this at some
+point: Build phases ending in a call to `system*` would return 0 or 1
+which would pass as "true" regardless.
+
+In most cases, you should know the actual type of the value you're
+receiving, and do an appropriate check, be it `zero?`, `null?`, or
+some other check.  If you truly want to check if something is "any
+value other than false" then you can always do:
+
+```scheme
+
+  (if (not (eq? #f value))
+      (do something))
+
+```
+
+No, you cannot use `(not (not x))` because `not` obviously expects a
+Boolean argument!  Duh.
+
+I'm actually serious about this.  Scheme went all the way to make null
+separate from false, but then refused to go all the way and decided to
+allow non-Boolean values to function as Booleans anyway.
+
+Of course, performing a type-check on every single conditional may
+incur a serious performance penalty.  If so, then the same flag that
+determines whether returned values can be ignored may also determine
+whether non-Booleans can be coerced into Booleans.
+
+
+## Subtyping of record types
+
+It's a serious limitation of SRFI 9 that it doesn't allow creating
+subtypes with additional fields.  This is an invaluable strategy for
+representing a hierarchy of types, which are ubiquitious in real life
+and thus in programming.
+
+Sadly, this brings with it some significant complications if records
+are to be initialized with default values to ensure invariants.  The
+R6RS solves this with an incredibly sophisticated system, which we
+might need to adopt.  (Search for "protocol" and "record-constructor
+descriptor" in the R6RS.)
+
+However, we may be able to get away with a simpler approach...
+
+
+## OOP is not that bad
+
+The immense complexity of orchestrating the correct initialization of
+a record type can be overcome by
author	Taylan Kammer <taylan.kammer@gmail.com>	2025-02-10 09:16:36 +0100
committer	Taylan Kammer <taylan.kammer@gmail.com>	2025-02-10 09:16:36 +0100
commit	831dc694c404826e9a1bf07788e10b9ac3d9cb2d (patch)
tree	1154eafac62c89d65069c34e344d5f9c4263889a /README.md