# Zisp: 21st-century Scheme-inspired language

Zisp is my experimental toy language inspired by Scheme.  The idea is
that it's a modern "re-imagining" of what Scheme may have been had it
been invented today, and had it been designed with pragmatic use as a
primary concern in its design.

This language doesn't actually exist yet.  You are merely reading the
ramblings of a madman.


## Compilation is execution

Any Scheme implementation with support for procedural macros allows
arbitrary code execution at compile-time.  However, this is slightly
awkward:

```scheme

  (define-syntax comptime
    (lambda (stx)
      (syntax-case stx ()
        ((_)
         (begin
           (display "foo\n")))))

  (comptime)

```

Compiling this with, for example, `guild compile foo.scm` will lead to
the line "foo" being printed.  (The actual program is a no-op and will
do nothing when run after compiled.)

One can of course implement a macro such as `eval-when-compile` to
make this slightly less awkward.  Using R6RS:

```scheme

  (import (rnrs eval))

  (define-syntax eval-when-compile
    (lambda (stx)
      (syntax-case stx ()
        ((_ imports body ...)
         (eval
           (syntax->datum #'(begin body ...))
           (apply environment (syntax->datum #'imports)))))))

  (eval-when-compile
   ((rnrs))
   (display "foo\n"))

```

An implementation may of course contain such a macro in its standard
library, but it's unclear why the language should put such a hurdle in
our way.  There are problems beyond this little hurdle as well.

Top-level forms in Scheme are semantically executed at run-time, not
compile-time.

(Actually, the Scheme standards don't explicitly define a run-time or
compile-time stage, but it's arguably implicit in the fact that macros
are *not* first-class, and are defined by the language in such a way
that they can be executed entirely at compile-time if ahead-of-time
compilation is supported by an implementation.)

Consider the case where the programmer wants to perform a relatively
costly calculation at compile-time and store the result as part of the
compiled program.  Say, a lookup-table.  Naively, we may attempt the
following:

```scheme

  ;; The fictional file `lookup-table.dat` would be in the source code
  ;; repository, and the fictional procedure `process-data` would read
  ;; it and return a data structure.
  (define lookup-table (process-data "lookup-table.dat"))

```

This will not work.  Compiling a Scheme file containing such a form
will produce a program that calls `process-data` at run-time and not
at compile-time as intended.

One can of course resolve this with an explicit use of a procedural
macro.  In fact, all one needs to do is redefine `process-data` as a
macro, but I find this to be an unnecessary complication.

Further, any sufficiently intelligent implementation *will* actually
execute such top-level definitions at compile-time, given that they
only make use of compile-time constants and pure functions.

```scheme

  ;; Guile will compute the value at compile-time.
  (define seconds-per-day (number->string (* 24 60 60)))

```

This is easily observed by running `guild compile test.scm -o test.go`
and then `strings test.go` which will contain the string 86400 in its
output.  (If we didn't use `number->string`, it would be harder to
locate the number 86400 in the output, since it would be binary.)

This works because Guile implements the "partial evaluation" strategy
for program optimization.  This requires the optimizer to know which
procedures are pure.  A limitation in the implementation may lead to
some such opportunities to be missed, and for the compiled program to
execute unnecessary code at run-time.

To recap: The top-level of a Scheme file is conceptually executed at
run-time.  But an optimizer may execute some of it at compile-time
anyway.  However, we're at the mercy of the implementation quality for
this to happen consistently.  We can use procedural macros to force
some execution to definitely happen at compile-time.  What a mess!

It would be so much simpler if compiling a program meant, at the
language semantics level, that the top-level is executed.

Any run-time initialization of a program or module should be explicit,
such as by putting it into a `main` function, or having the module
export an initialization function.  (The language may, if I feel like
it, allow for declaring a module initializer function, which would be
invoked automatically when a module is loaded.)


## Everything can be serialized

If you're familiar with Guile --and I suspect most implementations of
Scheme have a similar limitation-- then you may have noticed an issue
in the above example where the programmer intends to compute a lookup
table at compile-time.

Not all Scheme objects can be serialized.  This not only applies to
the `write` procedure, but also the compiler's ability to put objects
into the binary representation of a compiled program.  (For example,
the .data section of an ELF file in case of Guile.)

This can be demonstrated as follows:

```scheme

  (define-syntax process-data
    (lambda (stx)
      (syntax-case stx ()
        ((_ file)
         ;; Ignore `file`; this is just an example!
         (let ((ht (make-eqv-hashtable)))
           (hashtable-set! ht 1 2)
           ht)))))

  (define lookup-table (process-data "lookup-table.dat"))

```

Compiling this with `guild` will yield an error, complaining about an
"unhandled constant" represented as #<r6rs:hashtable ...> in the error
message.  What it's actually trying to say is that hash tables aren't
constants, and the compiler doesn't know how to put them into the ELF
file it's writing.

(At least, this is the case as of February 2025, using Guile 3.0.10;
who knows what the future will provide!)

In Zisp, I want absolutely everything to be possible to serialize, and
the compiler should simply be using this capability of the language to
write out compiled binaries.

For example, given that any Zisp program has to declare a `main` entry
point function, all the compiler would do is execute the file and then
call `(write main)`.  How elegant!

This serialization of a function would, of course, involve traversing
all references therein, and including them in the output somehow.  The
same will apply to writing out any data structure.  This means that
serializing a module is *not* a matter of invoking `write` on each of
its exported definitions.  This would lead to lots of duplicate data
between the outputs, and `eq?` relations would be lost after reading
them back in.  We probably want a first-class `module` type that can
be serialized as one object.

(If we want symmetry between `read` and `write`, then `read` needs to
detect that it's looking at a compiled binary.  Not entirely sure yet
if I really want this, but I think I do.)


## Symbols are strings are symbols

In Scheme, symbols are literally just interned and immutable strings.
They can contain any character a string can, constructed either via
`string->symbol` or the modern `|foo bar baz|' syntax for quoted
symbols.  Why not just embrace the fact that they are strings?

Scheme strings are mutable, but they are a terrible choice for text
manipulation, because they are constant-length.  They are literally
just vectors of characters.  If you wanted a vector of characters,
well, use a vector of characters!

Zisp won't differentiate between symbols and strings.  All strings
will be immutable, string constants will be automatically interned,
and bare symbols will just be reader syntax for a string constant.

Instead of `string->symbol` we will have `string-intern` which
basically does the same thing.  Dynamically generated strings that
aren't passed to this function will be uninterned.


## Stop the "cons" madness!

Lists are neat, but they aren't the best representation for sequences
of fixed length.  An array/vector is a better choice for this.

R6RS already uses vectors in some places where a traditional lisper
may have expected to see lists.  Namely, in the procedural layer of
record types, where the fields of a record type are represented by
vectors.  There may be more places in Scheme where this makes sense.


## Better handling of rest args

Initially, I was thinking of using either immutable vectors, or
immutable lists transparently backed by arrays, to represent rest
arguments.  But the following paper offers a very compelling
alternative:

  https://legacy.cs.indiana.edu/~dyb/pubs/LaSC-3-3-pp229-244.pdf

Long story short, rest argumenst are received through a parameter list
such as `(arg1 ... argn & rest)` and the identifier `rest` is special
in that it can only be passed on to another procedure using the same
syntax.  For example, to explicitly put the rest args into a list and
map over them:

```scheme

  (define (map* proc & args)
    (map proc (list & args)))

  (map* square 1 2 3) ;=> (1 4 9)

```

Recursive functions that directly consume an arbitrary number of args,
without needing to allocate any data structure, can be implemented by
combining this feature with what is today known as `case-lambda`:

```scheme

  (define combine*
    (case-lambda
      ((x) x)
      ((x y & rest) (combine* (combine x y) & rest))

```

Though the paper proposes the use of `&` so as to differentiate it
from the regular rest-argument mechanism of Scheme, I intend to make
Zisp use only this mechanism, so we can use the dot notation for it.
Rewriting the above examples in this style gives us:

```scheme

  (define (map* proc . args)
    (map proc (list . args)))

  (define combine*
    (case-lambda
      ((x) x)
      ((x y . rest) (combine* (combine x y) . rest))))

```

I find this very pleasing on the eyes, and a very elegant way to use
improper lists in evaluation context, which isn't allowed in Scheme.


## More ergonomic multiple-values

The paper linked above proposes to reuse the rest args syntax for an
elegant solution to consuming multiple values:

```scheme

  (proc x y & <expr>)

```

In the above, `<expr>` may evaluate to multiple values, and the values
will be passed to `proc` as additional arguments.

Essentially, this means that the special rest-arg identifier is itself
a representation of multiple values, or in other words, evaluating it
results in multiple values even though it's just an identifier!

Demonstration, using Zisp notation for multiple-value rest args:

```scheme

  (define (foobar . rest)
    (let-values (((x y z) rest))
      ;; This is a meaningless example for demonstration, since we
      ;; could have just made the function accept three parameters.
      ;; The `let-values` here is the one from SRFI 11.
      ))

```

The paper also proposes terminating lambda bodies with `& <expr>` to
return multiple values, but this is obsolete as of R5RS, which allows
the final expression in a body to evaluate to multiple values anyway.

However, the use of & to pass multiple values as arguments is very
convenient and way clearer than R5RS's clumsy `call-with-values`:

```scheme

  ;; Returns {bool, obj} where bool indicates success/failure and obj
  ;; is meaningless if bool is false; allows differentiating between
  ;; the case where #f is found as the value vs. nothing being found.
  (define (lookup alist key)
    (if (null? alist)
        (values #f #f)
        (if (eqv? key (caar alist))
            (values #t (cdar alist))
            (lookup (cdr alist) key))))

  (define (display-if-found found? obj)
    (when found? (display obj)))

  ;; Incredibly ugly `call-with-values`:
  (call-with-values
    (lambda () (lookup '((x . y)) 'x))
    display-if-found)

  ;; (Up until here is valid R5RS code, by the way.)

  ;; So much cleaner:
  (display-if-found & (lookup '((x . y)) 'x))  ;; displays x

```

Unfortunately, we can't reuse the improper list syntax in the last
example, since the following s-expressions are equivalent:

```scheme

  (foo . (bar baz))
  (foo bar baz)

```

In Zisp, this will be solved by making `apply` a special-form where
the last operand is expected to evaluate to multiple values rather
than a list:

```scheme

  ;; (apply <proc-expr> <argn-expr> ... <restargs-expr>)

  (apply display-if-found (lookup '((x . y)) 'x))

```

Note that this means the forms `(apply foo rest)` and `(foo . rest)`
are equivalent if `rest` is an identifier and not a pair/list, while
`(apply foo (x ...))` is of course NOT equivalent to `(foo x ...)`.


## A little bit of syntax sugar never hurt anyone

Why not make `foo.bar` reader syntax for `(record-ref foo bar)` where
`record-ref` is a macro that interprets `bar` as a constant?

Admittedly, this loses us some elegance compared to the widely used
SRFI 9 (standardized in R7RS-small) where field getters only exist as
procedures, so making a field "private" is simply a matter of leaving
the getter out of your module exports.

A general `record-ref` that accepts field names would need to be aware
of the concept of private fields and block access to them except if
invoked in a context where the private fields are accessible.

I think the nice syntax is worth the added complexity.  It doesn't
seem terribly difficult to expand the concept of "lexical scope" to
include field accessibility information.

Alternatively, we could just not have private fields, like in Python.
Fields that are meant to be private could be named in a special way by
convention, such as `%foo` or whatever.  I have to check whether real
encapsulation would provide us with any substantial benefits, such as
in optimization.

Furthermore, `foo[bar]` could be sugar for `(vector-ref foo bar)` and
`foo{bar}` could be sugar for `(map-ref foo bar)` or the like.  (Not
sure yet whether to use the word "map" for a data type, due to the
overlap with the `map` function.)

The functionality of SRFI 17 should be a core aspect of the language,
so the following all work:

```scheme

  ;; Record
  (define rec (make-foo))
  (set! rec.field value)

  ;; Vector
  (define vec (vector 1 2 3))
  (set! vec[n] value)

  ;; Some kind of map
  (define ht (hash-table))
  (set! ht{key} value)

```


## More immutability

I see no reason to have mutable variables in the language.

Usually, code is analyzed to distinguish between mutable and immutable
variables, because this aids in optimization.  This means you end up
with two types of variables, and whether a variable is of one or the
other type is determined solely from how it's used.  Ugly!

An explicit box data structure can trivially replicate the features of
a mutable variable, so let's just use that instead.

Our `set!` can assume a box when a plain identifier is used.  But to
get the value, we call `get`.

```scheme

  (let ((x (box 0)))
    (while foo
      (set! x (+ x 1)))
    (get x))

```

I've not yet made up my mind on whether pairs should be immutable by
default, but they probably should.

Strings, as already mentioned before, will be immutable, since string
constants will be the same thing as a symbol.


## No shadowing (shock!) and a reduced set of `let` forms

This may be shocking for Schemers but I believe shadowing variables is
evil.  I've already written a bug once that would have been prevented
had it not been possible to shadow variables, and I don't even write
that much Scheme code.

And let's face it: The presence of four different forms by the name of
`let`, `let*`, `letrec`, and `letrec*` is daunting when you're coming
from another language.  Lack of shadowing allows us to reduce this
without losing out much functionality.

In the absence of shadowing, `let` becomes nearly useless because you
can always use `letrec` to fulfill the same need; it's strictly more
powerful.  (The only thing `let` can do that `letrec` can't, is to
refer to the previous binding of a variable before shadowing it.)

Further, the value of vanilla `letrec` is dubious when `letrec*` is
strictly more powerful.  So, in Zisp, `let` is the ultimate binding
form that does what `letrec*` does in Scheme.

Except it does more!  We haven't looked at the whole `let-values`
family yet.  In Zisp, these are also merged into the ultimate `let`,
using the SRFI 71 syntax:

```scheme

  (let ((a (one-value))
        (b c (two-values))
        ((values d e . rest) (arbitrary-values)))
    (do-things))

```

You may be wondering whether it also supports the "let loop" syntax of
vanilla Scheme `let` and the answer is no, because I hate that syntax.
It has too high a risk of leading to absolute spaghetti code with no
clear indication as to how the loop variables are being updated.

If you want to loop, use a dedicated looping syntax!  Even `do` is
better than "let loop" shenanigans if you ask me.


## Return zero values when there's nothing to return

This is only a minor point: It's a long-running pet peeve of mine that
Scheme specifies "an unspecified value" to be returned when there's
nothing meaningful to return.  It's a remnant from before we had the
ability to return multiple values, and should be eliminated.

Any operation that has nothing meaningful to return, will return zero
values in Zisp.


## Strict mode to disallow ignoring returned values

This ties in to the last point.  In Scheme, a non-tail expression in a
body can return an arbitrary number of values, which will be silently
ignored.

This can lead to bugs, where a procedure actually returns some kind of
success or failure indicator (instead of raising an error) and the
programmer forgets to handle it.

Though it may be too inefficient to enable globally, there should at
least be a mode of compilation that emits code which checks at every
single function return whether there are any values that are being
ignored, and raises an error if so.

There would of course be a form to explicitly ignore values.


## Only Booleans have truthiness

Like in Java, there should be no implicit conversion of values to a
Boolean.  This leads to sloppy code and subtle bugs.

I believe the code base of Guix contained an example of this at some
point: Build phases ending in a call to `system*` would return 0 or 1
which would pass as "true" regardless.

In most cases, you should know the actual type of the value you're
receiving, and do an appropriate check, be it `zero?`, `null?`, or
some other check.  If you truly want to check if something is "any
value other than false" then you can always do:

```scheme

  (if (not (eq? #f value))
      (do something))

```

No, you cannot use `(not (not x))` because `not` obviously expects a
Boolean argument!  Duh.

I'm actually serious about this.  Scheme went all the way to make null
separate from false, but then refused to go all the way and decided to
allow non-Boolean values to function as Booleans anyway.

Of course, performing a type-check on every single conditional may
incur a serious performance penalty.  If so, then the same flag that
determines whether returned values can be ignored may also determine
whether non-Booleans can be coerced into Booleans.


## Subtyping of record types

It's a serious limitation of SRFI 9 that it doesn't allow creating
subtypes with additional fields.  This is an invaluable strategy for
representing a hierarchy of types, which are ubiquitious in real life
and thus in programming.

Sadly, this brings with it some significant complications if records
are to be initialized with default values to ensure invariants.  The
R6RS solves this with an incredibly sophisticated system, which we
might need to adopt.  (Search for "protocol" and "record-constructor
descriptor" in the R6RS.)

However, we may be able to get away with a simpler approach...


## OOP is not that bad

The immense complexity of orchestrating the correct initialization of
a record type can be overcome by