-rw-r--r-- | .gitignore | 5
-rw-r--r-- | README.md | 576
-rw-r--r-- | build.zig | 116
-rw-r--r-- | build.zig.zon | 73
-rwxr-xr-x | html/gen.sh | 25
-rw-r--r-- | html/index.md | 24
-rw-r--r-- | html/notes/booleans.md | 32
-rw-r--r-- | html/notes/compilation.md | 115
-rw-r--r-- | html/notes/cons.md | 178
-rw-r--r-- | html/notes/equal.md | 255
-rw-r--r-- | html/notes/immutable.md | 56
-rw-r--r-- | html/notes/let.md | 41
-rw-r--r-- | html/notes/nan.md | 384
-rw-r--r-- | html/notes/oop.md | 3
-rw-r--r-- | html/notes/records.md | 54
-rw-r--r-- | html/notes/serialize.md | 67
-rw-r--r-- | html/notes/strict-mode.md | 16
-rw-r--r-- | html/notes/sugar.md | 85
-rw-r--r-- | html/notes/symbols.md | 19
-rw-r--r-- | html/notes/zero-values.md | 11
-rw-r--r-- | html/prelude.html | 13
-rw-r--r-- | html/style.css | 23
-rw-r--r-- | src/main.zig | 73
-rw-r--r-- | src/root.zig | 191
-rw-r--r-- | src/test.c | 119
-rw-r--r-- | test.scm | 89
26 files changed, 2067 insertions, 576 deletions
diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..fd81fa2 --- /dev/null +++ b/.gitignore @@ -0,0 +1,5 @@ +/html/highlightjs +/html/index.html +/html/notes/*.html +.zig-cache +zig-out
\ No newline at end of file diff --git a/README.md b/README.md deleted file mode 100644 index 7a2146c..0000000 --- a/README.md +++ /dev/null @@ -1,576 +0,0 @@ -# Zisp: 21st-century Scheme-inspired language - -Zisp is my experimental toy language inspired by Scheme. The idea is -that it's a modern "re-imagining" of what Scheme may have been had it -been invented today, and had it been designed with pragmatic use as a -primary concern in its design. - -This language doesn't actually exist yet. You are merely reading the -ramblings of a madman. - - -## Compilation is execution - -Any Scheme implementation with support for procedural macros allows -arbitrary code execution at compile-time. However, this is slightly -awkward: - -```scheme - - (define-syntax comptime - (lambda (stx) - (syntax-case stx () - ((_) - (begin - (display "foo\n"))))) - - (comptime) - -``` - -Compiling this with, for example, `guild compile foo.scm` will lead to -the line "foo" being printed. (The actual program is a no-op and will -do nothing when run after compiled.) - -One can of course implement a macro such as `eval-when-compile` to -make this slightly less awkward. Using R6RS: - -```scheme - - (import (rnrs eval)) - - (define-syntax eval-when-compile - (lambda (stx) - (syntax-case stx () - ((_ imports body ...) - (eval - (syntax->datum #'(begin body ...)) - (apply environment (syntax->datum #'imports))))))) - - (eval-when-compile - ((rnrs)) - (display "foo\n")) - -``` - -An implementation may of course contain such a macro in its standard -library, but it's unclear why the language should put such a hurdle in -our way. There are problems beyond this little hurdle as well. - -Top-level forms in Scheme are semantically executed at run-time, not -compile-time. - -(Actually, the Scheme standards don't explicitly define a run-time or -compile-time stage, but it's arguably implicit in the fact that macros -are *not* first-class, and are defined by the language in such a way -that they can be executed entirely at compile-time if ahead-of-time -compilation is supported by an implementation.) - -Consider the case where the programmer wants to perform a relatively -costly calculation at compile-time and store the result as part of the -compiled program. Say, a lookup-table. Naively, we may attempt the -following: - -```scheme - - ;; The fictional file `lookup-table.dat` would be in the source code - ;; repository, and the fictional procedure `process-data` would read - ;; it and return a data structure. - (define lookup-table (process-data "lookup-table.dat")) - -``` - -This will not work. Compiling a Scheme file containing such a form -will produce a program that calls `process-data` at run-time and not -at compile-time as intended. - -One can of course resolve this with an explicit use of a procedural -macro. In fact, all one needs to do is redefine `process-data` as a -macro, but I find this to be an unnecessary complication. - -Further, any sufficiently intelligent implementation *will* actually -execute such top-level definitions at compile-time, given that they -only make use of compile-time constants and pure functions. - -```scheme - - ;; Guile will compute the value at compile-time. - (define seconds-per-day (number->string (* 24 60 60))) - -``` - -This is easily observed by running `guild compile test.scm -o test.go` -and then `strings test.go` which will contain the string 86400 in its -output. 
(If we didn't use `number->string`, it would be harder to -locate the number 86400 in the output, since it would be binary.) - -This works because Guile implements the "partial evaluation" strategy -for program optimization. This requires the optimizer to know which -procedures are pure. A limitation in the implementation may lead to -some such opportunities to be missed, and for the compiled program to -execute unnecessary code at run-time. - -To recap: The top-level of a Scheme file is conceptually executed at -run-time. But an optimizer may execute some of it at compile-time -anyway. However, we're at the mercy of the implementation quality for -this to happen consistently. We can use procedural macros to force -some execution to definitely happen at compile-time. What a mess! - -It would be so much simpler if compiling a program meant, at the -language semantics level, that the top-level is executed. - -Any run-time initialization of a program or module should be explicit, -such as by putting it into a `main` function, or having the module -export an initialization function. (The language may, if I feel like -it, allow for declaring a module initializer function, which would be -invoked automatically when a module is loaded.) - - -## Everything can be serialized - -If you're familiar with Guile --and I suspect most implementations of -Scheme have a similar limitation-- then you may have noticed an issue -in the above example where the programmer intends to compute a lookup -table at compile-time. - -Not all Scheme objects can be serialized. This not only applies to -the `write` procedure, but also the compiler's ability to put objects -into the binary representation of a compiled program. (For example, -the .data section of an ELF file in case of Guile.) - -This can be demonstrated as follows: - -```scheme - - (define-syntax process-data - (lambda (stx) - (syntax-case stx () - ((_ file) - ;; Ignore `file`; this is just an example! - (let ((ht (make-eqv-hashtable))) - (hashtable-set! ht 1 2) - ht))))) - - (define lookup-table (process-data "lookup-table.dat")) - -``` - -Compiling this with `guild` will yield an error, complaining about an -"unhandled constant" represented as #<r6rs:hashtable ...> in the error -message. What it's actually trying to say is that hash tables aren't -constants, and the compiler doesn't know how to put them into the ELF -file it's writing. - -(At least, this is the case as of February 2025, using Guile 3.0.10; -who knows what the future will provide!) - -In Zisp, I want absolutely everything to be possible to serialize, and -the compiler should simply be using this capability of the language to -write out compiled binaries. - -For example, given that any Zisp program has to declare a `main` entry -point function, all the compiler would do is execute the file and then -call `(write main)`. How elegant! - -This serialization of a function would, of course, involve traversing -all references therein, and including them in the output somehow. The -same will apply to writing out any data structure. This means that -serializing a module is *not* a matter of invoking `write` on each of -its exported definitions. This would lead to lots of duplicate data -between the outputs, and `eq?` relations would be lost after reading -them back in. We probably want a first-class `module` type that can -be serialized as one object. - -(If we want symmetry between `read` and `write`, then `read` needs to -detect that it's looking at a compiled binary. 
Not entirely sure yet -if I really want this, but I think I do.) - - -## Symbols are strings are symbols - -In Scheme, symbols are literally just interned and immutable strings. -They can contain any character a string can, constructed either via -`string->symbol` or the modern `|foo bar baz|' syntax for quoted -symbols. Why not just embrace the fact that they are strings? - -Scheme strings are mutable, but they are a terrible choice for text -manipulation, because they are constant-length. They are literally -just vectors of characters. If you wanted a vector of characters, -well, use a vector of characters! - -Zisp won't differentiate between symbols and strings. All strings -will be immutable, string constants will be automatically interned, -and bare symbols will just be reader syntax for a string constant. - -Instead of `string->symbol` we will have `string-intern` which -basically does the same thing. Dynamically generated strings that -aren't passed to this function will be uninterned. - - -## Stop the "cons" madness! - -Lists are neat, but they aren't the best representation for sequences -of fixed length. An array/vector is a better choice for this. - -R6RS already uses vectors in some places where a traditional lisper -may have expected to see lists. Namely, in the procedural layer of -record types, where the fields of a record type are represented by -vectors. There may be more places in Scheme where this makes sense. - - -## Better handling of rest args - -Initially, I was thinking of using either immutable vectors, or -immutable lists transparently backed by arrays, to represent rest -arguments. But the following paper offers a very compelling -alternative: - - https://legacy.cs.indiana.edu/~dyb/pubs/LaSC-3-3-pp229-244.pdf - -Long story short, rest argumenst are received through a parameter list -such as `(arg1 ... argn & rest)` and the identifier `rest` is special -in that it can only be passed on to another procedure using the same -syntax. For example, to explicitly put the rest args into a list and -map over them: - -```scheme - - (define (map* proc & args) - (map proc (list & args))) - - (map* square 1 2 3) ;=> (1 4 9) - -``` - -Recursive functions that directly consume an arbitrary number of args, -without needing to allocate any data structure, can be implemented by -combining this feature with what is today known as `case-lambda`: - -```scheme - - (define combine* - (case-lambda - ((x) x) - ((x y & rest) (combine* (combine x y) & rest)) - -``` - -Though the paper proposes the use of `&` so as to differentiate it -from the regular rest-argument mechanism of Scheme, I intend to make -Zisp use only this mechanism, so we can use the dot notation for it. -Rewriting the above examples in this style gives us: - -```scheme - - (define (map* proc . args) - (map proc (list . args))) - - (define combine* - (case-lambda - ((x) x) - ((x y . rest) (combine* (combine x y) . rest)))) - -``` - -I find this very pleasing on the eyes, and a very elegant way to use -improper lists in evaluation context, which isn't allowed in Scheme. - - -## More ergonomic multiple-values - -The paper linked above proposes to reuse the rest args syntax for an -elegant solution to consuming multiple values: - -```scheme - - (proc x y & <expr>) - -``` - -In the above, `<expr>` may evaluate to multiple values, and the values -will be passed to `proc` as additional arguments. 
- -Essentially, this means that the special rest-arg identifier is itself -a representation of multiple values, or in other words, evaluating it -results in multiple values even though it's just an identifier! - -Demonstration, using Zisp notation for multiple-value rest args: - -```scheme - - (define (foobar . rest) - (let-values (((x y z) rest)) - ;; This is a meaningless example for demonstration, since we - ;; could have just made the function accept three parameters. - ;; The `let-values` here is the one from SRFI 11. - )) - -``` - -The paper also proposes terminating lambda bodies with `& <expr>` to -return multiple values, but this is obsolete as of R5RS, which allows -the final expression in a body to evaluate to multiple values anyway. - -However, the use of & to pass multiple values as arguments is very -convenient and way clearer than R5RS's clumsy `call-with-values`: - -```scheme - - ;; Returns {bool, obj} where bool indicates success/failure and obj - ;; is meaningless if bool is false; allows differentiating between - ;; the case where #f is found as the value vs. nothing being found. - (define (lookup alist key) - (if (null? alist) - (values #f #f) - (if (eqv? key (caar alist)) - (values #t (cdar alist)) - (lookup (cdr alist) key)))) - - (define (display-if-found found? obj) - (when found? (display obj))) - - ;; Incredibly ugly `call-with-values`: - (call-with-values - (lambda () (lookup '((x . y)) 'x)) - display-if-found) - - ;; (Up until here is valid R5RS code, by the way.) - - ;; So much cleaner: - (display-if-found & (lookup '((x . y)) 'x)) ;; displays x - -``` - -Unfortunately, we can't reuse the improper list syntax in the last -example, since the following s-expressions are equivalent: - -```scheme - - (foo . (bar baz)) - (foo bar baz) - -``` - -In Zisp, this will be solved by making `apply` a special-form where -the last operand is expected to evaluate to multiple values rather -than a list: - -```scheme - - ;; (apply <proc-expr> <argn-expr> ... <restargs-expr>) - - (apply display-if-found (lookup '((x . y)) 'x)) - -``` - -Note that this means the forms `(apply foo rest)` and `(foo . rest)` -are equivalent if `rest` is an identifier and not a pair/list, while -`(apply foo (x ...))` is of course NOT equivalent to `(foo x ...)`. - - -## A little bit of syntax sugar never hurt anyone - -Why not make `foo.bar` reader syntax for `(record-ref foo bar)` where -`record-ref` is a macro that interprets `bar` as a constant? - -Admittedly, this loses us some elegance compared to the widely used -SRFI 9 (standardized in R7RS-small) where field getters only exist as -procedures, so making a field "private" is simply a matter of leaving -the getter out of your module exports. - -A general `record-ref` that accepts field names would need to be aware -of the concept of private fields and block access to them except if -invoked in a context where the private fields are accessible. - -I think the nice syntax is worth the added complexity. It doesn't -seem terribly difficult to expand the concept of "lexical scope" to -include field accessibility information. - -Alternatively, we could just not have private fields, like in Python. -Fields that are meant to be private could be named in a special way by -convention, such as `%foo` or whatever. I have to check whether real -encapsulation would provide us with any substantial benefits, such as -in optimization. 
- -Furthermore, `foo[bar]` could be sugar for `(vector-ref foo bar)` and -`foo{bar}` could be sugar for `(map-ref foo bar)` or the like. (Not -sure yet whether to use the word "map" for a data type, due to the -overlap with the `map` function.) - -The functionality of SRFI 17 should be a core aspect of the language, -so the following all work: - -```scheme - - ;; Record - (define rec (make-foo)) - (set! rec.field value) - - ;; Vector - (define vec (vector 1 2 3)) - (set! vec[n] value) - - ;; Some kind of map - (define ht (hash-table)) - (set! ht{key} value) - -``` - - -## More immutability - -I see no reason to have mutable variables in the language. - -Usually, code is analyzed to distinguish between mutable and immutable -variables, because this aids in optimization. This means you end up -with two types of variables, and whether a variable is of one or the -other type is determined solely from how it's used. Ugly! - -An explicit box data structure can trivially replicate the features of -a mutable variable, so let's just use that instead. - -Our `set!` can assume a box when a plain identifier is used. But to -get the value, we call `get`. - -```scheme - - (let ((x (box 0))) - (while foo - (set! x (+ x 1))) - (get x)) - -``` - -I've not yet made up my mind on whether pairs should be immutable by -default, but they probably should. - -Strings, as already mentioned before, will be immutable, since string -constants will be the same thing as a symbol. - - -## No shadowing (shock!) and a reduced set of `let` forms - -This may be shocking for Schemers but I believe shadowing variables is -evil. I've already written a bug once that would have been prevented -had it not been possible to shadow variables, and I don't even write -that much Scheme code. - -And let's face it: The presence of four different forms by the name of -`let`, `let*`, `letrec`, and `letrec*` is daunting when you're coming -from another language. Lack of shadowing allows us to reduce this -without losing out much functionality. - -In the absence of shadowing, `let` becomes nearly useless because you -can always use `letrec` to fulfill the same need; it's strictly more -powerful. (The only thing `let` can do that `letrec` can't, is to -refer to the previous binding of a variable before shadowing it.) - -Further, the value of vanilla `letrec` is dubious when `letrec*` is -strictly more powerful. So, in Zisp, `let` is the ultimate binding -form that does what `letrec*` does in Scheme. - -Except it does more! We haven't looked at the whole `let-values` -family yet. In Zisp, these are also merged into the ultimate `let`, -using the SRFI 71 syntax: - -```scheme - - (let ((a (one-value)) - (b c (two-values)) - ((values d e . rest) (arbitrary-values))) - (do-things)) - -``` - -You may be wondering whether it also supports the "let loop" syntax of -vanilla Scheme `let` and the answer is no, because I hate that syntax. -It has too high a risk of leading to absolute spaghetti code with no -clear indication as to how the loop variables are being updated. - -If you want to loop, use a dedicated looping syntax! Even `do` is -better than "let loop" shenanigans if you ask me. - - -## Return zero values when there's nothing to return - -This is only a minor point: It's a long-running pet peeve of mine that -Scheme specifies "an unspecified value" to be returned when there's -nothing meaningful to return. It's a remnant from before we had the -ability to return multiple values, and should be eliminated. 
- -Any operation that has nothing meaningful to return, will return zero -values in Zisp. - - -## Strict mode to disallow ignoring returned values - -This ties in to the last point. In Scheme, a non-tail expression in a -body can return an arbitrary number of values, which will be silently -ignored. - -This can lead to bugs, where a procedure actually returns some kind of -success or failure indicator (instead of raising an error) and the -programmer forgets to handle it. - -Though it may be too inefficient to enable globally, there should at -least be a mode of compilation that emits code which checks at every -single function return whether there are any values that are being -ignored, and raises an error if so. - -There would of course be a form to explicitly ignore values. - - -## Only Booleans have truthiness - -Like in Java, there should be no implicit conversion of values to a -Boolean. This leads to sloppy code and subtle bugs. - -I believe the code base of Guix contained an example of this at some -point: Build phases ending in a call to `system*` would return 0 or 1 -which would pass as "true" regardless. - -In most cases, you should know the actual type of the value you're -receiving, and do an appropriate check, be it `zero?`, `null?`, or -some other check. If you truly want to check if something is "any -value other than false" then you can always do: - -```scheme - - (if (not (eq? #f value)) - (do something)) - -``` - -No, you cannot use `(not (not x))` because `not` obviously expects a -Boolean argument! Duh. - -I'm actually serious about this. Scheme went all the way to make null -separate from false, but then refused to go all the way and decided to -allow non-Boolean values to function as Booleans anyway. - -Of course, performing a type-check on every single conditional may -incur a serious performance penalty. If so, then the same flag that -determines whether returned values can be ignored may also determine -whether non-Booleans can be coerced into Booleans. - - -## Subtyping of record types - -It's a serious limitation of SRFI 9 that it doesn't allow creating -subtypes with additional fields. This is an invaluable strategy for -representing a hierarchy of types, which are ubiquitious in real life -and thus in programming. - -Sadly, this brings with it some significant complications if records -are to be initialized with default values to ensure invariants. The -R6RS solves this with an incredibly sophisticated system, which we -might need to adopt. (Search for "protocol" and "record-constructor -descriptor" in the R6RS.) - -However, we may be able to get away with a simpler approach... - - -## OOP is not that bad - -The immense complexity of orchestrating the correct initialization of -a record type can be overcome by diff --git a/build.zig b/build.zig new file mode 100644 index 0000000..870a6ad --- /dev/null +++ b/build.zig @@ -0,0 +1,116 @@ +const std = @import("std"); + +// Although this function looks imperative, note that its job is to +// declaratively construct a build graph that will be executed by an external +// runner. +pub fn build(b: *std.Build) void { + // Standard target options allows the person running `zig build` to choose + // what target to build for. Here we do not override the defaults, which + // means any target is allowed, and the default is native. Other options + // for restricting supported target set are available. 
+ const target = b.standardTargetOptions(.{}); + + // Standard optimization options allow the person running `zig build` to select + // between Debug, ReleaseSafe, ReleaseFast, and ReleaseSmall. Here we do not + // set a preferred release mode, allowing the user to decide how to optimize. + const optimize = b.standardOptimizeOption(.{}); + + // This creates a "module", which represents a collection of source files alongside + // some compilation options, such as optimization mode and linked system libraries. + // Every executable or library we compile will be based on one or more modules. + const lib_mod = b.createModule(.{ + // `root_source_file` is the Zig "entry point" of the module. If a module + // only contains e.g. external object files, you can make this `null`. + // In this case the main source file is merely a path, however, in more + // complicated build scripts, this could be a generated file. + .root_source_file = b.path("src/root.zig"), + .target = target, + .optimize = optimize, + }); + + // We will also create a module for our other entry point, 'main.zig'. + const exe_mod = b.createModule(.{ + // `root_source_file` is the Zig "entry point" of the module. If a module + // only contains e.g. external object files, you can make this `null`. + // In this case the main source file is merely a path, however, in more + // complicated build scripts, this could be a generated file. + .root_source_file = b.path("src/main.zig"), + .target = target, + .optimize = optimize, + }); + + // Modules can depend on one another using the `std.Build.Module.addImport` function. + // This is what allows Zig source code to use `@import("foo")` where 'foo' is not a + // file path. In this case, we set up `exe_mod` to import `lib_mod`. + exe_mod.addImport("zisp_lib", lib_mod); + + // Now, we will create a static library based on the module we created above. + // This creates a `std.Build.Step.Compile`, which is the build step responsible + // for actually invoking the compiler. + const lib = b.addLibrary(.{ + .linkage = .static, + .name = "zisp", + .root_module = lib_mod, + }); + + // This declares intent for the library to be installed into the standard + // location when the user invokes the "install" step (the default step when + // running `zig build`). + b.installArtifact(lib); + + // This creates another `std.Build.Step.Compile`, but this one builds an executable + // rather than a static library. + const exe = b.addExecutable(.{ + .name = "zisp", + .root_module = exe_mod, + }); + + // This declares intent for the executable to be installed into the + // standard location when the user invokes the "install" step (the default + // step when running `zig build`). + b.installArtifact(exe); + + // This *creates* a Run step in the build graph, to be executed when another + // step is evaluated that depends on it. The next line below will establish + // such a dependency. + const run_cmd = b.addRunArtifact(exe); + + // By making the run step depend on the install step, it will be run from the + // installation directory rather than directly from within the cache directory. + // This is not necessary, however, if the application depends on other installed + // files, this ensures they will be present and in the expected location. + run_cmd.step.dependOn(b.getInstallStep()); + + // This allows the user to pass arguments to the application in the build + // command itself, like this: `zig build run -- arg1 arg2 etc` + if (b.args) |args| { + run_cmd.addArgs(args); + } + + // This creates a build step. 
It will be visible in the `zig build --help` menu, + // and can be selected like this: `zig build run` + // This will evaluate the `run` step rather than the default, which is "install". + const run_step = b.step("run", "Run the app"); + run_step.dependOn(&run_cmd.step); + + // Creates a step for unit testing. This only builds the test executable + // but does not run it. + const lib_unit_tests = b.addTest(.{ + .root_module = lib_mod, + }); + + const run_lib_unit_tests = b.addRunArtifact(lib_unit_tests); + + const exe_unit_tests = b.addTest(.{ + .root_module = exe_mod, + }); + + const run_exe_unit_tests = b.addRunArtifact(exe_unit_tests); + + // Similar to creating the run step earlier, this exposes a `test` step to + // the `zig build --help` menu, providing a way for the user to request + // running the unit tests. + const test_step = b.step("test", "Run unit tests"); + test_step.dependOn(&run_lib_unit_tests.step); + test_step.dependOn(&run_exe_unit_tests.step); +} diff --git a/build.zig.zon b/build.zig.zon new file mode 100644 index 0000000..8fcdf25 --- /dev/null +++ b/build.zig.zon @@ -0,0 +1,73 @@ +.{ + // This is the default name used by packages depending on this one. For + // example, when a user runs `zig fetch --save <url>`, this field is used + // as the key in the `dependencies` table. Although the user can choose a + // different name, most users will stick with this provided value. + // + // It is redundant to include "zig" in this name because it is already + // within the Zig package namespace. + .name = "zisp", + + // This is a [Semantic Version](https://semver.org/). + // In a future version of Zig it will be used for package deduplication. + .version = "0.0.0", + + // This field is optional. + // This is currently advisory only; Zig does not yet do anything + // with this value. + //.minimum_zig_version = "0.11.0", + + // This field is optional. + // Each dependency must either provide a `url` and `hash`, or a `path`. + // `zig build --fetch` can be used to fetch all dependencies of a package, recursively. + // Once all dependencies are fetched, `zig build` no longer requires + // internet connectivity. + .dependencies = .{ + // See `zig fetch --save <url>` for a command-line interface for adding dependencies. + //.example = .{ + // // When updating this field to a new URL, be sure to delete the corresponding + // // `hash`, otherwise you are communicating that you expect to find the old hash at + // // the new URL. If the contents of a URL change this will result in a hash mismatch + // // which will prevent zig from using it. + // .url = "https://example.com/foo.tar.gz", + // + // // This is computed from the file contents of the directory of files that is + // // obtained after fetching `url` and applying the inclusion rules given by + // // `paths`. + // // + // // This field is the source of truth; packages do not come from a `url`; they + // // come from a `hash`. `url` is just one of many possible mirrors for how to + // // obtain a package matching this `hash`. + // // + // // Uses the [multihash](https://multiformats.io/multihash/) format. + // .hash = "...", + // + // // When this is provided, the package is found in a directory relative to the + // // build root. In this case the package's hash is irrelevant and therefore not + // // computed. This field and `url` are mutually exclusive. + // .path = "foo", + // + // // When this is set to `true`, a package is declared to be lazily + // // fetched. 
This makes the dependency only get fetched if it is + // // actually used. + // .lazy = false, + //}, + }, + + // Specifies the set of files and directories that are included in this package. + // Only files and directories listed here are included in the `hash` that + // is computed for this package. Only files listed here will remain on disk + // when using the zig package manager. As a rule of thumb, one should list + // files required for compilation plus any license(s). + // Paths are relative to the build root. Use the empty string (`""`) to refer to + // the build root itself. + // A directory listed here means that all files within, recursively, are included. + .paths = .{ + "build.zig", + "build.zig.zon", + "src", + // For example... + //"LICENSE", + //"README.md", + }, +} diff --git a/html/gen.sh b/html/gen.sh new file mode 100755 index 0000000..d14286e --- /dev/null +++ b/html/gen.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +set -euo pipefail + +if [ $# -eq 0 ] +then + exec find . -name \*.md -exec "$0" {} + +fi + +for file +do + if ! [ -f "$file" ] + then + echo >&2 "File not found: $file" + continue + fi + echo "$file" + { + title=$(sed '/^# / { s/# //; q }' "$file") + sed "s/__TITLE__/$title/" prelude.html + echo "<body>" + markdown2 "$file" -x fenced-code-blocks,highlightjs-lang + echo "</body>" + } > "${file%.md}".html +done diff --git a/html/index.md b/html/index.md new file mode 100644 index 0000000..e369e56 --- /dev/null +++ b/html/index.md @@ -0,0 +1,24 @@ +# Zisp: 21st-century Scheme-inspired language + +Zisp is my experimental toy language inspired by Scheme. The idea is +that it's a modern "re-imagining" of what Scheme may have been had it +been invented today, and had it been designed with pragmatic use as a +primary concern in its design. + +This language doesn't actually exist yet. You are merely reading the +ramblings of a madman. + +* [Compilation is execution](notes/compilation.html) +* [Everything can be serialized](notes/serialize.html) +* [Symbols are strings](notes/symbols.html) +* [Stop the "cons" madness!](notes/cons.html) +* [A little bit of syntax sugar never hurt anyone](notes/sugar.html) +* [More immutability](notes/immutable.html) +* [No shadowing, and fewer `let` forms](notes/let.html) +* [Return zero values](notes/zero-values.html) +* [Strict mode: Can't ignore returned values](notes/strict-mode.html) +* [Only Booleans have truthiness](notes/booleans.html) +* [Record types](notes/records.html) +* [Object-oriented programming](notes/oop.html) +* [Equality and equivalence semantics](notes/equal.html) +* [NaN-packing](notes/nan.html) diff --git a/html/notes/booleans.md b/html/notes/booleans.md new file mode 100644 index 0000000..61b9d7e --- /dev/null +++ b/html/notes/booleans.md @@ -0,0 +1,32 @@ +# Only Booleans have truthiness + +Like in Java, there should be no implicit conversion of values to a +Boolean. This leads to sloppy code and subtle bugs. + +I believe the code base of Guix contained an example of this at some +point: Build phases ending in a call to `system*` would return 0 or 1 +which would pass as "true" regardless. + +In most cases, you should know the actual type of the value you're +receiving, and do an appropriate check, be it `zero?`, `null?`, or +some other check. If you truly want to check if something is "any +value other than false" then you can always do: + +```scheme + +(if (not (eq? #f value)) + (do something)) + +``` + +No, you cannot use `(not (not x))` because `not` obviously expects a +Boolean argument! Duh. 
+ +I'm actually serious about this. Scheme went all the way to make null +separate from false, but then refused to go all the way and decided to +allow non-Boolean values to function as Booleans anyway. + +Of course, performing a type-check on every single conditional may +incur a serious performance penalty. If so, then the same flag that +determines [whether returned values can be ignored](strict-mode.html) +may also determine whether non-Booleans can be coerced into Booleans. diff --git a/html/notes/compilation.md b/html/notes/compilation.md new file mode 100644 index 0000000..4d5fc6d --- /dev/null +++ b/html/notes/compilation.md @@ -0,0 +1,115 @@ +# Compilation is execution + +Any Scheme implementation with support for procedural macros allows +arbitrary code execution at compile-time. However, this is slightly +awkward: + +```scheme + +(define-syntax comptime + (lambda (stx) + (syntax-case stx () + ((_) + (begin + (display "foo\n"))))) + +(comptime) + +``` + +Compiling this with, for example, `guild compile foo.scm` will lead to +the line "foo" being printed. (The actual program is a no-op and will +do nothing when run after compiled.) + +One can of course implement a macro such as `eval-when-compile` to +make this slightly less awkward. Using R6RS: + +```scheme + +(import (rnrs eval)) + +(define-syntax eval-when-compile + (lambda (stx) + (syntax-case stx () + ((_ imports body ...) + (eval + (syntax->datum #'(begin body ...)) + (apply environment (syntax->datum #'imports))))))) + +(eval-when-compile + ((rnrs)) + (display "foo\n")) + +``` + +An implementation may of course contain such a macro in its standard +library, but it's unclear why the language should put such a hurdle in +our way. There are problems beyond this little hurdle as well. + +Top-level forms in Scheme are semantically executed at run-time, not +compile-time. + +(Actually, the Scheme standards don't explicitly define a run-time or +compile-time stage, but it's arguably implicit in the fact that macros +are *not* first-class, and are defined by the language in such a way +that they can be executed entirely at compile-time if ahead-of-time +compilation is supported by an implementation.) + +Consider the case where the programmer wants to perform a relatively +costly calculation at compile-time and store the result as part of the +compiled program. Say, a lookup-table. Naively, we may attempt the +following: + +```scheme + +;; The fictional file `lookup-table.dat` would be in the source code +;; repository, and the fictional procedure `process-data` would read +;; it and return a data structure. +(define lookup-table (process-data "lookup-table.dat")) + +``` + +This will not work. Compiling a Scheme file containing such a form +will produce a program that calls `process-data` at run-time and not +at compile-time as intended. + +One can of course resolve this with an explicit use of a procedural +macro. In fact, all one needs to do is redefine `process-data` as a +macro, but I find this to be an unnecessary complication. + +Further, any sufficiently intelligent implementation *will* actually +execute such top-level definitions at compile-time, given that they +only make use of compile-time constants and pure functions. + +```scheme + +;; Guile will compute the value at compile-time. +(define seconds-per-day (number->string (* 24 60 60))) + +``` + +This is easily observed by running `guild compile test.scm -o test.go` +and then `strings test.go` which will contain the string 86400 in its +output. 
(If we didn't use `number->string`, it would be harder to +locate the number 86400 in the output, since it would be binary.) + +This works because Guile implements the "partial evaluation" strategy +for program optimization. This requires the optimizer to know which +procedures are pure. A limitation in the implementation may lead to +some such opportunities to be missed, and for the compiled program to +execute unnecessary code at run-time. + +To recap: The top-level of a Scheme file is conceptually executed at +run-time. But an optimizer may execute some of it at compile-time +anyway. However, we're at the mercy of the implementation quality for +this to happen consistently. We can use procedural macros to force +some execution to definitely happen at compile-time. What a mess! + +It would be so much simpler if compiling a program meant, at the +language semantics level, that the top-level is executed. + +Any run-time initialization of a program or module should be explicit, +such as by putting it into a `main` function, or having the module +export an initialization function. (The language may, if I feel like +it, allow for declaring a module initializer function, which would be +invoked automatically when a module is loaded.) diff --git a/html/notes/cons.md b/html/notes/cons.md new file mode 100644 index 0000000..3f38519 --- /dev/null +++ b/html/notes/cons.md @@ -0,0 +1,178 @@ +# Stop the "cons" madness! + +Lists are neat, but they aren't the best representation for sequences +of fixed length. An array/vector is a better choice for this. + +R6RS already uses vectors in some places where a traditional lisper +may have expected to see lists. Namely, in the procedural layer of +record types, where the fields of a record type are represented by +vectors. Another example is `hashtable-keys` and `hashtable-entries` +which both return vectors. There may be more places in Scheme where +this makes sense. + +In the following, we discuss a better handling of rest-argument lists +and a change to `apply`. In short, rest arguments are not actually +lists anymore, but rather a special kind of identifier that refers to +multiple values. And `apply`, which becomes a special form, expects +its last argument not to be a list but rather an expression that may +evaluate to multiple values. + + +## Better handling of rest args + +Initially, I was thinking of using either immutable vectors, or +immutable lists transparently backed by arrays, to represent rest +arguments. But the following paper offers a very compelling +alternative: + +https://legacy.cs.indiana.edu/~dyb/pubs/LaSC-3-3-pp229-244.pdf + +Let's first summarize the paper, and then see how we can adapt its +ideas for Zisp. + +Long story short, rest argumenst are received through a parameter list +such as `(arg1 ... argn & rest)` and the identifier `rest` is special +in that it can only be passed on to another procedure using the same +syntax. 
For example, to explicitly put the rest args into a list and +map over them: + +```scheme + +(define (map* proc & args) + (map proc (list & args))) + +(map* square 1 2 3) ;=> (1 4 9) + +``` + +Recursive functions that directly consume an arbitrary number of args, +without needing to allocate any data structure, can be implemented by +combining this feature with what is today known as `case-lambda`: + +```scheme + +(define combine* + (case-lambda + ((x) x) + ((x y & rest) (combine* (combine x y) & rest)) + +``` + +Though the paper proposes the use of `&` so as to differentiate it +from the regular rest-argument mechanism of Scheme, I intend to make +Zisp use only this mechanism, so we can use the dot notation for it. +Rewriting the above examples in this style gives us: + +```scheme + +(define (map* proc . args) + (map proc (list . args))) + +(define combine* + (case-lambda + ((x) x) + ((x y . rest) (combine* (combine x y) . rest)))) + +``` + +I find this very pleasing on the eyes, and a very elegant way to use +improper lists in evaluation context, which isn't allowed in Scheme. + + +## More ergonomic multiple-values + +The paper linked above proposes to reuse the rest args syntax for an +elegant solution to consuming multiple values: + +```scheme + +(proc x y & <expr>) + +``` + +In the above, `<expr>` may evaluate to multiple values, and the values +will be passed to `proc` as additional arguments. + +Essentially, this means that the special rest-arg identifier is itself +a representation of multiple values, or in other words, evaluating it +results in multiple values even though it's just an identifier! + +Demonstration, using Zisp notation for multiple-value rest args: + +```scheme + +(define (foobar . rest) + (let-values (((x y z) rest)) + ;; This is a meaningless example for demonstration, since we + ;; could have just made the function accept three parameters. + ;; The `let-values` here is the one from SRFI 11. + )) + +``` + +The paper also proposes terminating lambda bodies with `& <expr>` to +return multiple values, but this is obsolete as of R5RS, which allows +the final expression in a body to evaluate to multiple values anyway. + +However, the use of & to pass multiple values as arguments is very +convenient and much cleaner than R5RS's clumsy `call-with-values`: + +```scheme + +;; Returns {bool, obj} where bool indicates success/failure and obj +;; is meaningless if bool is false; allows differentiating between +;; the case where #f is found as the value vs. nothing being found. +(define (lookup alist key) + (if (null? alist) + (values #f #f) + (if (eqv? key (caar alist)) + (values #t (cdar alist)) + (lookup (cdr alist) key)))) + +(define (display-if-found found? obj) + (when found? (display obj))) + +;; Incredibly ugly `call-with-values`: +(call-with-values + (lambda () (lookup '((x . y)) 'x)) + display-if-found) + +;; (Up until here is valid R5RS code, by the way.) + +;; So much cleaner: +(display-if-found & (lookup '((x . y)) 'x)) ;; displays x + +``` + +Unfortunately, we can't reuse the improper list syntax in the last +example, since the following s-expressions are equivalent: + +```scheme + +(foo . (bar baz)) +(foo bar baz) + +``` + +In Zisp, this will be solved by making `apply` a special-form where +the last operand is expected to evaluate to multiple values rather +than a list: + +```scheme + +;; (apply <proc-expr> <argn-expr> ... <restargs-expr>) + +(apply display-if-found (lookup '((x . y)) 'x)) + +``` + +Note that this means the forms `(apply foo rest)` and `(foo . 
rest)` +are equivalent if `rest` is an identifier and not a pair/list, while +`(apply foo (x ...))` is of course different from `(foo x ...)`. + +I find this all incredibly pleasing. Lists never had any business in +representing arguments in the first place; it should always have been +multiple values! + +(The phrase "argument list" is probably going to stick around forever +though, even if it's technically wrong in Zisp.) diff --git a/html/notes/equal.md b/html/notes/equal.md new file mode 100644 index 0000000..8c55faa --- /dev/null +++ b/html/notes/equal.md @@ -0,0 +1,255 @@ +# A novel approach to object equivalence + +## Story time + +In my past 5 years of developing a warehouse management application, +I've frequently found myself implementing equality the following way. + +(Code massively reduced to get to the point. Implementations of the +`hashCode` method, which must be compatible with `equals`, have been +left out for brevity.) + +```java + +class Warehouse { + String code; + String name; + // ... more fields, many mutable + + public boolean equals(Object obj) { + if (obj == this) { + return true; + } + if (obj instanceof Warehouse w) { + return code.equals(w.code); + } + return false; + } +} + +class Product { + int id; + LocalizedString name; + LocalizedString description; + // ... more fields, many mutable + + public boolean equals(Object obj) { + if (obj == this) { + return true; + } + if (obj instanceof Product p) { + return id == p.id; + } + return false; + } +} + +``` + +And so on. A type may have a code that is a String, an id that is an +int, or some other field that uniquely identifies members of it within +our application domain. + +I'm speaking of types and their members, not classes and instances, +because I mean the application domain. The user of our program has a +number of warehouses and a number of products, in reality, which we +represent via these classes, but there may be multiple instances of a +class representing the same real entity. + +Oftentimes, there will be a database that contains unique entries for +these. For example, an SQL table for products where the id is the +primary key, and an SQL table for warehouses where the code is the +primary key. + +This way of implementing `equals()` and `hashCode()` may seem wrong to +some developers. Since the classes have mutable fields, we may end up +with two `Product` instances which are not equal in their state, yet +`equals()` returns true for them since they represent the same real +life entity, just with different state in our program. Perhaps one +has just been fetched from the database, whereas the other has been +modified to represent changes that are yet to be committed. + +I've never found this strategy to lead to any problems. Never have I +had a need to know whether two `Product` instances have the same state +right now. What I care about is which product is being represented. +This is useful in various ways. + +### Map keys, sets, and so on + +First of all, it allows us to create maps where the keys are instances +of Product or Warehouse. For example, the application may want to +create a map of which products each warehouse contains: + +```java + +Map<Warehouse, List<Product>> productsByWarehouse; + +``` + +Given the `equals()` implementation of the Warehouse class, this map +can now be given any instance of Warehouse as a key, and we need not +ensure that we have one canonical instance used as the key. + +There are of course many alternatives to this. 
One may not want this +map to keep Warehouse instances alive, in which case it would be more +sensible to use the codes (Strings) for the keys. One could also add +a field to Warehouse such as `List<Product> products;`. However, in +some circumstances, a map such as the above may be most natural. + +Other data structures and operations depend on equality checks as +well. For example: finding duplicates in a collection, checking +whether a collection already contains a given item, managing a set +data type, and so on. The way we implement equality is also useful +for such purposes. + +### Elegance and encapsulation + +Secondly, we may have subsystems in the application that communicate +by passing Product or Warehouse instances to each other. For example, +the user may be looking at a user interface displaying the products in +a warehouse with their current quantities. The user may then click on +an "update quantities" button which opens a sub-interface where it's +possible to add an arbitrary number of entries (products) with the +desired quantity change for each. This sub-interface may then return +a list of Product instances with the associated quantity change. The +code of the warehouse overview interface can now use `equals()` to +determine which instance of Product it received corresponds to which +Product instance it's already holding on to. + +Again, there are of course many alternative strategies which don't +require such a use of `equals()`. One may explicitly compare the id +fields, or the sub-interface may return ids associated with quantity +changes. However, the `equals()` based implementation may offer the +cleanest possible code: + +```java + +public void receiveQuantityChanges(Map<Product, Integer> changes) { + for (Row<Product> row : this.productRows) { + Integer change = changes.get(row.product); + row.quantity += change; + } +} + +``` + +As far as readability and elegance is concerned, this can hardly be +improved on, as far as one is familiar with Java. Although using a +`Map<Integer, Integer>` would hardly make the code any more verbose, +it immediately causes it to lose out on self-documentation: + +```java + +// Quantity changes of what? Should we rename this to the awfully +// long name receiveProductQuantityChanges to make it clear? +public void receiveQuantityChanges(Map<Integer, Integer> changes) { + for (Row<Product> row : this.productRows) { + Integer change = changes.get(row.product.id); + row.quantity += change; + } +} + +``` + +This is a minor change as far as readability is concerned, but it also +decreases encapsulation. The code needs to be aware that products are +represented uniquely by an id that is an integer, and changing this +implementation detail of the class may have far-reaching consequences +throughout the code base. + +The self-documenting property may not apply to a Scheme-based language +without static types, but it makes the encapsulation aspect all the +more significant, since we won't have a compiler or static analyzer +which immediately tells us about a change in the type of `id`. We +must hope that the field `id` has been removed and replaced with a +different one, not reusing the `id` identifier. (For example, this +would mean a procedure like `product-id` stops existing, so a static +analyzer or compiler can immediately warn us about its absence. Had +we reused the field but changed its type, we would have a very hard +time finding all the places in the code we now need to fix.) + +## Get to the point! 
+ +I've explained all this to arrive at one simple conclusion: + +Perhaps it would be a good idea for a language to have a mechanism in +which compound types like records or classes can simply declare that +one of the defined fields is the "unique identifier" of the type. + +Imitating SRFI 9: + +```scheme + +(define-record-type <product> + (make-product id name description ...) + product? + (identity: id) ; <- see here + (id product-id) + (name product-name) + (description product-description) + ...) + +(define-record-type <warehouse> + (make-warehouse code name ...) + warehouse? + (identity: code) ; <- see here + (code warehouse-code) + (name warehouse-name) + ...) + +``` + +Now the million dollar question is whether it should be `equal?` that +makes use of this information, or `eqv?`, or both. + +Although `eqv?` is intended to approximate operational equivalence, +and should therefore not consider separate mutable objects to be the +same, it's not clear to me how useful this is on user-defined types. + +The rationale for defining `eqv?` in terms of operational equivalence +is that this allows implementing memoization. If a memoized function +depends on one of the mutable fields of a record, yet `eqv?` returns +true on separate instances with different state, then the function +will return an incorrect result. But I'm skeptical as to whether a +function like that would see much practical use. It seems to me like +a memoized function that is used to compute some property of an entity +in our application domain should probably depend only on immutable +properties of that object. + +Another way to look at `eqv?` is that it implements "mostly constant +time" equality. (Strictly speaking, it's a linear time operation on +numbers, assuming the implementation offers arbitrary-size numbers. +This rarely matters, as few programs make use of numbers larger than +what fits in a few dozen bytes.) Conversely, `equal?` is responsible +for deep equality testing, which is inherently linear time, though +record types were overlooked in its definition in R7RS-small. + +## "Why not neither?" + +(I'm coining this as a new meme, in contrast to "why not both?") + +I think I want Zisp to support the following: + +* `eq?`: As in Scheme, but maybe allowed to fail on procedures. +* `eqv?`: As in Scheme. +* `equiv?`: An even closer approximation of operational equivalence, + by diving into immutable compound data types that `eqv?` doesn't + handle due to trying to be constant-time. I believe this is the + same as `equal-always?` in Racket. +* `equal?`: As in Scheme, but also dives into records because why not? + +The reason `eq?` is allowed to fail on procedures is an optimization +strategy that duplicates/inlines procedures. (citation needed) + +And which one should make use of the `identity` field of user types? +Only `equiv?`, because the identity object may be a string or other +immutable but compound object. (Thinking back to my Java programs, +I've had an extremely common type that was identified by a pair of +integers, for example: document and line number. One can also +conceive of N-dimensional coordinates and other such examples.) + +We want `eqv?` to remain "pretty much constant time" with the only +exception being numbers, which seem highly unlikely to reach such a +large size that it would matter. Recursive data structures, and +uninterned strings, on the other hand, could become quite large and +costly to compare. 
diff --git a/html/notes/immutable.md b/html/notes/immutable.md new file mode 100644 index 0000000..78652e9 --- /dev/null +++ b/html/notes/immutable.md @@ -0,0 +1,56 @@ +# More immutability + +I see no reason to have mutable variables in the language. + +Usually, code is analyzed to distinguish between mutable and immutable +variables, because this aids in optimization. This means you end up +with two types of variables, and whether a variable is of one or the +other type is determined solely from how it's used. Ugly! + +An explicit box data structure can trivially replicate the features of +a mutable variable, so let's just use that instead. + +Our `set!` can assume a box when a plain identifier is used. But to +get the value, we call `get`. + +```scheme + +(let ((x (box 0))) + (while foo + (set! x (+ x 1))) + (get x)) + +``` + +I've not yet made up my mind on whether pairs should be immutable by +default, but they probably should. Strings, as also mentioned in +[symbols](symbols.html), will be immutable, since string constants +will be the same thing as symbols. + +## Late additions + +It now occurs to me that, if you want your explicitly boxed value to +not be heap-allocated, your compiler will need to analyze its use and +potentially unbox it. + +So, in terms of code analysis complexity, it may not actually make a +difference, but I still like the more explicit demarcation of mutable +variables. Perhaps the syntax and semantics could be changed to: + +```scheme + +(let ((x (mutable 0))) + (while foo + (set! x (+ x 1))) + x) + +``` + +This is different in that passing around `x` will not actually pass +around a box whose contents can be mutated; rather, it's a regular +variable like in Scheme, but mutable unlike normal Zisp variables. +The `mutable` identifier would be part of the `let` syntax and not +possible to use anywhere else. (Probably not even with `define`.) + +It's really just to make code more explicit and easier to grasp, +without any effects on compiler complexity, probably. diff --git a/html/notes/let.md b/html/notes/let.md new file mode 100644 index 0000000..4af41bd --- /dev/null +++ b/html/notes/let.md @@ -0,0 +1,41 @@ +# No shadowing (shock!) and a reduced set of `let` forms + +This may be shocking for Schemers but I believe shadowing variables is +evil. I've already written a bug once that would have been prevented +had it not been possible to shadow variables, and I don't even write +that much Scheme code. + +And let's face it: The presence of four different forms by the name of +`let`, `let*`, `letrec`, and `letrec*` is daunting when you're coming +from another language. Lack of shadowing allows us to reduce this +without losing out much functionality. + +In the absence of shadowing, `let` becomes nearly useless because you +can always use `letrec` to fulfill the same need; it's strictly more +powerful. (The only thing `let` can do that `letrec` can't, is to +refer to the previous binding of a variable before shadowing it.) + +Further, the value of vanilla `letrec` is dubious when `letrec*` is +strictly more powerful. So, in Zisp, `let` is the ultimate binding +form that does what `letrec*` does in Scheme. + +Except it does more! We haven't looked at the whole `let-values` +family yet. In Zisp, these are also merged into the ultimate `let`, +using the SRFI 71 syntax: + +```scheme + +(let ((a (one-value)) + (b c (two-values)) + ((values d e . 
rest) (arbitrary-values))) + (do-things)) + +``` + +You may be wondering whether it also supports the "let loop" syntax of +vanilla Scheme `let` and the answer is no, because I hate that syntax. +It has too high a risk of leading to absolute spaghetti code with no +clear indication as to how the loop variables are being updated. + +If you want to loop, use a dedicated looping syntax! Even `do` is +better than "let loop" shenanigans if you ask me. diff --git a/html/notes/nan.md b/html/notes/nan.md new file mode 100644 index 0000000..c56609c --- /dev/null +++ b/html/notes/nan.md @@ -0,0 +1,384 @@ +# NaN-Packing + +NaN-packing (also called NaN-boxing) is a strategy involving the use of NaN +bit patterns, that are otherwise unused, to store various values in them. + +In the implementation of a dynamically typed language, this can be used to +ensure that all types in the language can be represented by a single 64-bit +value, which is either a valid double, an actual NaN value, or one of the +other NaN bit patterns that represent some other type, potentially in the +form of a pointer to a heap object containing further data. + +This works because pointers only need 48 bits in practice, and the range of +unused NaN bit patterns contains an astounding `2^53 - 4` different values. + +IMPORTANT NOTE: All illustrations of data structures and bit patterns use +big-endian. When implementing the strategies described herein, it may be +necessary to reorder the elements. For example, the elements of packed +structs in Zig are ordered least to most significant. + + +## The double format + +The IEEE 754 double-precision binary floating-point aka binary64 format is: + + { sign: u1, exponent: u11, fraction: u52 } + +Possible types of values a double can encode include: + + { sign == any, exponent != 0x7ff, fraction == any } :: Real (finite) + { sign == any, exponent == 0x7ff, fraction == 0x0 } :: Infinity + { sign == any, exponent == 0x7ff, fraction != 0x0 } :: NaN + +Note: + + 0x7ff = u11 with all bits set (0b11111111111) + +In other words: + + all exponent bits set, fraction bits all zero :: Infinity + all exponent bits set, fraction part non-zero :: NaN + + +## Details of NaN values + +There are two different NaN types: signaling and quiet. Quiet NaN may be +returned by FP operations to denote invalid results, whereas signaling NaN +are never returned by FP operations and serve other purposes. + +Modern hardware sets the MSB of the fraction to indicate that the NaN is a +quiet one, so let's refine our definition for denoting NaN values: + + { sign: u1, exp: u11, quiet: u1, rest: u51 } + +Variants of NaN: + + { sign == any, exp == 0x7ff, quiet == 0, rest >= 0x1 } :: sNaN + { sign == any, exp == 0x7ff, quiet == 1, rest == any } :: qNaN + +Note that in case of the signaling NaN, the rest of the fraction must be +non-zero, since otherwise the entire fraction part would be zero and thus +denote an infinity rather than a NaN. + +Most systems have a "canonical" quiet NaN that they use: + + { sign == any, exp == 0x7ff, quiet == 1, rest == 0x0 } :: cqNaN + +The sign bit of the canonical quiet NaN is undefined and may differ from +operation to operation or depending on the platform. 
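To see these fields on a concrete value, here is a small standalone C sketch (separate from this repository's sources; the helper name `double_bits` is made up for illustration) that bit-casts a quiet NaN obtained from libm and extracts the sign, exponent, quiet bit, and remaining fraction bits exactly as in the `{ sign, exp, quiet, rest }` layout above.

```c
#include <assert.h>
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a double as its raw 64-bit pattern. */
static uint64_t double_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

int main(void) {
    uint64_t u = double_bits(nan(""));              /* a quiet NaN from libm */

    uint64_t sign  = u >> 63;                       /* sign: u1             */
    uint64_t exp   = (u >> 52) & 0x7ff;             /* exp: u11             */
    uint64_t quiet = (u >> 51) & 1;                 /* MSB of the fraction  */
    uint64_t rest  = u & UINT64_C(0x7ffffffffffff); /* low 51 fraction bits */

    assert(exp == 0x7ff);            /* all exponent bits set: Inf or NaN   */
    assert(quiet == 1 || rest != 0); /* fraction non-zero, so NaN, not Inf  */
    printf("0x%016" PRIx64 " sign=%" PRIu64 " quiet=%" PRIu64 "\n",
           u, sign, quiet);
    return 0;
}
```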
+ +It's useful to see a few common examples expressed in hex: + + 0x7ff8000000000000 :: cqNaN, sign bit nil + 0xfff8000000000000 :: cqNaN, sign bit set + + 0x7ff8000000000001 :: smallest non-canon qNaN, sign bit nil + 0xfff8000000000001 :: smallest non-canon qNaN, sign bit set + + 0x7fffffffffffffff :: largest non-canon qNaN, sign bit nil + 0xffffffffffffffff :: largest non-canon qNaN, sign bit set + + 0x7ff0000000000001 :: smallest sNaN, sign bit nil + 0xfff0000000000001 :: smallest sNaN, sign bit set + + 0x7ff7ffffffffffff :: largest sNaN, sign bit nil + 0xfff7ffffffffffff :: largest sNaN, sign bit set + + +## Unused NaN bit patterns + +Let's start with the quiet NaN values. + +Theoretically, there only needs to be one canonical quiet NaN, so we would +have `2^52 - 1` unused bit patterns in the quiet NaN range. In practice, +however, the sign bit may differ from one operation to the next. + +For example, the fabs function may simply clear the sign of the argument, +without minding it being a NaN. In that case, if the platform's regular +canonical NaN is the one with the sign bit set, we would end up getting +another, "semi-canonical" quiet NaN bit pattern, with the sign bit nil. + +So, both variants of the canonical quiet NaN are in use. + +This leaves `2^52 - 2` definitely-unused quiet NaN bit patterns: + + { sign == any, exp == 0x7ff, quiet == 1, rest >= 0x1 } :: Unused qNaN + +Remember that signaling NaN are defined in a very similar way: + + { sign == any, exp == 0x7ff, quiet == 0, rest >= 0x1 } :: sNaN + +Since none of those can be returned by FP operations, they could all be seen +as unused, giving us another `2^52 - 2` bit patterns. + +In total, this gives us `2^53 - 4` definitely-unused NaN bit patterns. + + +## Representing Zisp values and pointers as unused NaN bit patterns + +Zisp wants to store two different things in unused NaN patterns: + +1. Pointers (to anything in principle) + +2. Non-double primitive aka "immediate" values + +It may seem intuitive to use signaling NaN for one, and quiet NaN for the +other. However, this would fragment our "payload" bits, since we would be +using the sign bit as its MSB and the remaining 51 bits of the fraction as +the rest of the payload. + +Further, we want to use as many bit patterns as possible for fixnums, so we +can have a nice large fixnum range. To this end, it would be nice if we +could, for example, use all bit patterns where the sign bit is set for our +representation of fixnums, and then the range of bit patterns with the sign +bit unset can be shared among the remaining values, and pointers. + +Then let's do exactly that, and use the sign as the first major distinction +between fixnums and other values, using it as a sort of `is_int` flag: + + { sign == 0x0, exp == 0x7ff, payload == ??? } :: Non-Fixnum + { sign == 0x1, exp == 0x7ff, payload == ??? } :: Fixnum + +It will become apparent in a moment why we haven't defined the payload yet. + +Given that our payload is the entire fraction part of the IEEE 754 double +format, we must be careful not to use the following two payload values +regardless of the sign bit: + +1. Zero: This would make the bit pattern represent an infinity, since the +payload is the entire fraction and a zero fraction indicates infinity. + +2. `0x8000000000000` (aka only the MSB is set): This would make the bit +pattern a canonical quiet NaN, since the payload MSB is the quiet bit. 
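+
+As a quick sanity check that these really are the infinity and canonical
+quiet NaN encodings, here is a throwaway Zig test (not part of the actual
+sources), using the two payloads above with sign 0 and exponent 0x7ff:
+
+```zig
+const std = @import("std");
+
+test "forbidden payloads collide with infinity and the canonical quiet NaN" {
+    // sign = 0, exp = 0x7ff, payload = 0x0000000000000 -> +infinity
+    const inf: f64 = @bitCast(@as(u64, 0x7ff0000000000000));
+    try std.testing.expect(std.math.isPositiveInf(inf));
+
+    // sign = 0, exp = 0x7ff, payload = 0x8000000000000 (only the MSB set)
+    // -> the canonical quiet NaN
+    const cqnan: f64 = @bitCast(@as(u64, 0x7ff8000000000000));
+    try std.testing.expect(std.math.isNan(cqnan));
+}
+```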
+
+This means that in each category (sign bit set, or nil) we have `2^52 - 2`
+possible bit patterns, and the payload has a rather strange definition:
+
+    0x0 < payload < 0x8000000000000   or   0x8000000000000 < payload <= 0xfffffffffffff
+
+Can we really fit a continuous range of fixnum values into that payload
+without significantly complicating things? Yes, we can! Observe.
+
+
+## Fixnum representation
+
+We will store positive and negative fixnums as separate value ranges, using
+the quiet bit to differentiate between them.
+
+Let's go back to considering the quiet bit a separate field:
+
+    { sign == 0x1, exp == 0x7ff, quiet == 0x0, rest >= 0x1 } :: Positive
+    { sign == 0x1, exp == 0x7ff, quiet == 0x1, rest >= 0x1 } :: Negative
+
+But, I hear you say, the positive range is missing zero! Worry not, for
+maths is wizardry. We will actually store positive values as their one's
+complement (bitwise NOT), meaning that all bits being set is our zero, and
+only the LSB being set is the highest possible value.
+
+The complement must be confined to the 51 bits of `rest`, and the 12
+highest bits (sign and exponent) must be set while the quiet bit stays
+clear, so that the result has the correct starting bit pattern: a
+signaling NaN with the sign bit set. Unpacking is just as simple: take
+the bitwise NOT and then use an AND mask to keep only the 51 lowest bits:
+
+    POS_INT_PACK(x) = (~x & 0x0007ffffffffffff) | 0xfff0000000000000
+
+    POS_INT_UNPACK(x) = ~x & 0x0007ffffffffffff
+
+If you've been paying very close attention, you may notice something: Given
+that we know the 13 highest bits must always have a certain respective value
+in the packed and unpacked representation (the 12 highest set and the quiet
+bit clear when packed, none set when unpacked), we can use an XOR to flip
+between the two, and the same XOR can take care of flipping the remaining
+51 bits at the same time!
+
+This also means packing and unpacking is the same operation:
+
+    POS_INT_PACK(x) = x ^ 0xfff7ffffffffffff
+
+    POS_INT_UNPACK(x) = x ^ 0xfff7ffffffffffff
+
+There we go; packing and unpacking 51-bit positive fixnums with one XOR!
+Amazing, isn't it?
+
+As for the negative values, it's even simpler. This time, the correct NaN
+starting pattern has all 13 bits set, since the quiet bit being set is what
+we use to determine the number being negative. And would you believe it:
+this means the packed negative fixnum already represents itself!
+
+    NEG_INT_PACK(x) = x
+
+    NEG_INT_UNPACK(x) = x
+
+Isn't that unbelievable? I need to verify this strategy further, but based
+on all information I can find about NaN values, it should work just fine.
+
+The only disappointing thing is that it's positive integers that need an XOR
+to pack and unpack, rather than negative ones. One would expect positive
+values to occur much more frequently in typical code. But I think we can
+live with a single XOR instruction!
+
+
+## Pointers & Others
+
+We still want to represent the following, which must share space within the
+`2^52 - 2` bit patterns that can be packed into a NaN with the sign bit
+unset:
+
+- Pointers of various kinds
+- Unicode code points (21-bit values)
+- False, true, null, end-of-file, and maybe a few more singletons
+
+It seems sensible to split this into two main categories: pointers and other
+values. 
Let's use the quiet bit as a `pointer` flag: + + { sign == 0x0, exp == 0x7ff, quiet == 0x0, rest >= 0x1 } :: Other + { sign == 0x0, exp == 0x7ff, quiet == 0x1, rest >= 0x1 } :: Pointer + +Note how neither type is allowed to have a zero payload, since in case of an +unset quiet bit, this would make our value an infinity, and in case of a set +quiet bit it would give us a canonical quiet NaN. Each of them is allowed +any other payload than zero. + +### Pointers + +It would seem that we have 51 bits left to represent a pointer (though we +need to avoid the value zero). But we only need 48 bits... or even less! +Since allocations happen at 8-byte boundaries on 64-bit machines, we only +really need 45 of the 48 bits, given the least significant 3 will never be +set. This gives us a whole 6 free bits to tag pointers with! If we have +that much play room, we can do some crazy things. + +#### Foreign pointers + +Firstly, let's introduce the concept of a "foreign" pointer. This means the +pointer doesn't necessarily point to a Zisp object, and may not be 8-byte +aligned. As it may point to anything, there's no point in defining further +bits as tagging additional information, so we have all 50 bits available. + +Let's cut out the 12 high bits of our double since we already know what they +must contain, and look at the definition of our 52-bit payload. + +I will also mix up the notation a bit, to indicate that some fields are only +defined if a previous field has a given value. + + { pointer == 0x1, foreign: u1, rest: u50 } + +(The `pointer` field is the `quiet` bit i.e. MSB of the 52-bit fraction.) + +If the foreign bit is set, then the entire `rest` field shall be seen as +opaque and may contain any value. Another way to look at this is that we +essentially defined another fixnum range of 50 bits. This can include the +value zero, since the foreign bit being set ensures we don't step on the +forbidden all-zero payload value. + +#### Zisp pointers + +Now let's look at what we can do with "native" Zisp pointers. + +Wouldn't it be nice if our language had an explicit "pointer" data type and +it didn't require any additional heap allocation? So let's decide that one +bit is dedicated to distinguishing between an explicit pointer object, and +regular pointers that stand in for the object being pointed to. + +Perhaps it would be good to show some Zisp pseudo-code to explain what that +means, since it may be a strange concept: + + ;; In memory, vec is represented as a regular/direct vector pointer. + (define vec (vector 1 2 3)) + + ;; We can of course use this variable as a vector. + (vector? vec) ;=> #t + (vector-ref vec 0) ;=> 1 + + ;; Now we create an explicit pointer object pointing to that vector. + ;; Distinguished by a special bit in the in-memory value of vec-ptr. + (define vec-ptr (pointer vec)) + + ;; This variable is *not* a vector; it's a vector-pointer. + (vector? vec-ptr) ;=> #f + (vector-ref vec-ptr 0) ;ERROR + (pointer? vec-ptr) ;=> #t + (pointer-ref vec-ptr) ;=> #(1 2 3) + +This is *not* the same thing as a box, because it can *only* refer to heap +allocated objects, not immediates, whereas a box would be able to hold an +immediate value like an integer or double as well. + + (pointer 42) ;ERROR + (box 42) ;=> #<box:42> + +A box would necessarily need heap allocation, whereas a pointer doesn't. + +It's *also not* the same thing as a foreign pointer, because those can be +anything, whereas these pointer objects definitely point to Zisp objects. 
+ +Pointers may or may not be mutable; I've not made up my mind yet. It may +seem like a pointless data type, but it adds a little bit of expressive +strength to our language. For example, when working with an FFI. And +there's really not much else we can do with all our bits. + +Let's use the term "indirect" for this tag, since "pointer" is already used: + + { pointer == 0x1, foreign == 0x0, indirect: u1, rest: u49 } + +Direct or indirect makes no difference to the fact that the pointer value +will be 8-byte aligned, so we still have 4 bits for more information about +what's being pointed to. Also, since the actual pointer value can never be +zero (all non-foreign pointers must point to a valid Zisp object), we avoid +the forbidden zero pattern. Thus, we can indicate 16 different values with +our 4 remaining bits. + +It would have been nice to avoid fragmentation of these remaining tag bits. +However, we want to avoid shifting, so let's go with this definition for the +remaining 49 bits: + + { tag_high: u1, pointer_value: u45, tag_low: u3 } + +The pointer value is extracted by masking the entire bit sequence, so it +actually becomes a 48-bit value without further shifting. + +The tag can be used to tell us what we're pointing to, so that type checks +often don't require following the pointer. The memory location that's being +pointed to may duplicate this information, since we may want to ensure that +any Zisp object on the heap carries its type information within itself, but +I'm not yet decided on that. + +In any case, let's list some common heap data types that our 4-bit tag can +represent, making sure to have an "other" wildcard for future extensions. + +The right side shows the value of the type tag when it's acquired by masking +the 49-bit Zisp pointer payload. + + 0. String (Symbol) .... 0x0000000000000 + 1. Pair (List) 0x0000000000001 + 2. Vector ............. 0x0000000000002 + 3. Map (Hash-table) 0x0000000000003 + 4. Box ................ 0x0000000000004 + 5. Record 0x0000000000005 + 6. Class .............. 0x0000000000006 + 7. Instance 0x0000000000007 + 8. Text ............... 0x1000000000000 + 9. Byte-vector 0x1000000000001 + 10. Procedure ......... 0x1000000000002 + 11. Continuation 0x1000000000003 + 12. Port .............. 0x1000000000004 + 13. Error 0x1000000000005 + 14. Enum .............. 0x1000000000006 + 15. Other 0x1000000000007 + +This list is likely to change; for example: errors should probably be class +instances, continuations could be merged with procedures, and so on. But +this gives us a rough picture and demonstrates that 16 distinct values is +quite sufficient for avoiding a pointer de-reference in type checking. + +(Why is it so important to avoid following a pointer when checking a type? +Who knows? Did I say it was important? Why look at me like that??) + +### Other values, including Unicode + +WIP + + +<!-- +;; Local Variables: +;; fill-column: 77 +;; End: +--> diff --git a/html/notes/oop.md b/html/notes/oop.md new file mode 100644 index 0000000..0821b42 --- /dev/null +++ b/html/notes/oop.md @@ -0,0 +1,3 @@ +# Object-oriented programming isn't that bad + +WIP diff --git a/html/notes/records.md b/html/notes/records.md new file mode 100644 index 0000000..b93e0c3 --- /dev/null +++ b/html/notes/records.md @@ -0,0 +1,54 @@ +# Subtyping of record types + +It's a serious limitation of SRFI 9 that it doesn't allow creating +subtypes with additional fields. 
This is an invaluable strategy for
+representing a hierarchy of types, which are ubiquitous in real life
+and thus in programming.
+
+Sadly, this brings with it some significant complications if records
+are to be initialized with default values to ensure invariants. The
+R6RS solves this with an incredibly sophisticated system, which we
+might need to adopt. (Search for "protocol" and "record-constructor
+descriptor" in the R6RS.)
+
+However, we may be able to get away with a simpler approach...
+
+UNDER CONSTRUCTION
+
+Are constructor protocols really that important? Consider that all we
+can do is add additional fields in the subtype. What if we separated
+allocation from initialization:
+
+```scheme
+(define-record r1
+  (parent #f)
+  (fields a b))
+
+(define (init-r1! r a b)
+  (set-r1-a! r a)
+  (set-r1-b! r b))
+
+
+(define-record r2
+  (parent r1)
+  (fields c d))
+
+(define (init-r2! r a b c d)
+  (init-r1! r a b)
+  (set-r2-c! r c)
+  (set-r2-d! r d))
+
+
+(define-record r3
+  (parent r2)
+  (fields e f))
+
+(define (init-r3! r a b c d e f)
+  (init-r2! r a b c d)
+  (set-r3-e! r e)
+  (set-r3-f! r f))
+
+
+(define r (make-r3))
+(init-r3! r 1 2 3 4 5 6)
+```
diff --git a/html/notes/serialize.md b/html/notes/serialize.md
new file mode 100644
index 0000000..e35177e
--- /dev/null
+++ b/html/notes/serialize.md
@@ -0,0 +1,67 @@
+# Everything can be serialized
+
+Let's look at the code mentioned in [compilation](compilation.html)
+again:
+
+```scheme
+
+;; The fictional file `lookup-table.dat` would be in the source code
+;; repository, and the fictional procedure `process-data` would read
+;; it and return a data structure.
+(define lookup-table (process-data "lookup-table.dat"))
+
+```
+
+(In Zisp, this would be executed at compile-time, so the lookup table
+becomes part of the compiled program.)
+
+If you're familiar with Guile (and I suspect most implementations of
+Scheme have a similar limitation), then you may have noticed an issue.
+
+Not all Scheme objects can be serialized. This not only applies to
+the `write` procedure, but also to the compiler's ability to put
+objects into the binary representation of a compiled program. (For
+example, the .data section of an ELF file in the case of Guile.)
+
+This can be demonstrated as follows:
+
+```scheme
+
+(define-syntax process-data
+  (lambda (stx)
+    (syntax-case stx ()
+      ((_ file)
+       ;; Ignore `file`; this is just an example!
+       (let ((ht (make-eqv-hashtable)))
+         (hashtable-set! ht 1 2)
+         ht)))))
+
+(define lookup-table (process-data "lookup-table.dat"))
+
+```
+
+Compiling this with `guild` will yield an error, complaining about an
+"unhandled constant" represented as #<r6rs:hashtable ...> in the error
+message. What it's actually trying to say is that hash tables aren't
+constants, and the compiler doesn't know how to put them into the ELF
+file it's writing.
+
+(At least, this is the case as of February 2025, using Guile 3.0.10;
+who knows what the future will provide!)
+
+In Zisp, I want absolutely everything to be serializable, and the
+compiler should simply be using this capability of the language to
+write out compiled binaries.
+
+For example, given that any Zisp program has to declare a `main` entry
+point function, all the compiler would do is execute the file and then
+call `(write main)`. How elegant!
+
+This serialization of a function would, of course, involve traversing
+all references therein, and including them in the output somehow. The
+same will apply to writing out any data structure. 
This means that +serializing a module is *not* a matter of invoking `write` on each of +its exported definitions. This would lead to lots of duplicate data +between the outputs, and `eq?` relations would be lost after reading +them back in. We probably want a first-class `module` type that can +be serialized as one object. diff --git a/html/notes/strict-mode.md b/html/notes/strict-mode.md new file mode 100644 index 0000000..5b99386 --- /dev/null +++ b/html/notes/strict-mode.md @@ -0,0 +1,16 @@ +# Strict mode to disallow ignoring returned values + +This ties in to the last point. In Scheme, a non-tail expression in a +body can return an arbitrary number of values, which will be silently +ignored. + +This can lead to bugs, where a procedure actually returns some kind of +success or failure indicator (instead of raising an error) and the +programmer forgets to handle it. + +Though it may be too inefficient to enable globally, there should at +least be a mode of compilation that emits code which checks at every +single function return whether there are any values that are being +ignored, and raises an error if so. + +There would of course be a form to explicitly ignore values. diff --git a/html/notes/sugar.md b/html/notes/sugar.md new file mode 100644 index 0000000..217b0d3 --- /dev/null +++ b/html/notes/sugar.md @@ -0,0 +1,85 @@ +# A little bit of syntax sugar never hurt anyone + +## Lambda shorthands + +We could benefit from a more minimal syntax to express lambda, as an +alternative to the written-out form: + +```scheme + +;; {x ... expr} = (lambda (x ...) expr) + +(map {x y (+ x y)} xs ys) + +(for-each {x (display x)} objects) + +``` + +## Vector and map references + +Furthermore, `foo[bar]` could be sugar for `(vector-ref foo bar)` and +`foo{bar}` could be sugar for `(map-ref foo bar)` or the like. (Not +sure yet whether to use the word "map" for a data type, due to the +overlap with the `map` function. Perhaps `mapping`. In any case +there must be a map kind of data type in the standard library.) + +## Records + +Why not make `foo.bar` reader syntax for `(record-ref foo bar)` where +`record-ref` is a macro that interprets `bar` as a constant? + +Admittedly, this loses us some elegance compared to the widely used +SRFI 9 (standardized in R7RS-small) where field getters only exist as +procedures, so making a field "private" is simply a matter of leaving +the getter out of your module exports. + +A general `record-ref` that accepts field names would need to be aware +of the concept of private fields and block access to them except if +invoked in a context where the private fields are accessible. + +I think the nice syntax is worth the added complexity. It doesn't +seem terribly difficult to expand the concept of "lexical scope" to +include field accessibility information. + +Alternatively, we could just not have private fields, like in Python. +Fields that are meant to be private could be named in a special way by +convention, such as `%foo` or whatever. I have to check whether real +encapsulation would provide us with any substantial benefits, such as +in optimization. + +Oh, speaking of performance, of course `(record-ref x y)` has a big +problem: It would require dynamic dispatch since the type of x is not +statically known. (And no we don't want to write `person.person-age`, +we want `person.age` where `age` is not treated as an identifier but +merely as a symbol for lookup.) + +It may be that we want to add static typing to Zisp! 
+
+We may also add OOP-style objects and only use the dot notation for
+their methods, but not fields. The reader syntax for `foo.bar` may
+then expand to `(method-dispatch foo bar)`. It would also work for
+fields if we really did add static typing, of course. The reason it
+would only work on methods is that those need dynamic dispatch anyway.
+
+THIS POINT NEEDS A LOT MORE CONSIDERATION!
+
+## Built-in SRFI 17
+
+The functionality of SRFI 17 should be a core aspect of the language,
+so the following all work:
+
+```scheme
+
+;; Vector
+(define vec (vector 1 2 3))
+(set! vec[n] value)
+
+;; Some kind of mapping
+(define table (make-table))
+(set! table{key} value)
+
+;; Record type
+(define rec (make-foo))
+(set! rec.field value)
+
+```
diff --git a/html/notes/symbols.md b/html/notes/symbols.md
new file mode 100644
index 0000000..8aa666f
--- /dev/null
+++ b/html/notes/symbols.md
@@ -0,0 +1,19 @@
+# Symbols are strings are symbols
+
+In Scheme, symbols are literally just interned and immutable strings.
+They can contain any character a string can, and can be constructed
+either via `string->symbol` or the modern `|foo bar baz|` syntax for
+quoted symbols. Why not just embrace the fact that they are strings?
+
+Scheme strings are mutable, but they are a terrible choice for text
+manipulation, because they are constant-length. They are literally
+just vectors of characters. If you wanted a vector of characters,
+well, use a vector of characters!
+
+Zisp won't differentiate between symbols and strings. All strings
+will be immutable, string constants will be automatically interned,
+and bare symbols will just be reader syntax for a string constant.
+
+Instead of `string->symbol` we will have `string-intern`, which
+basically does the same thing. Dynamically generated strings that
+aren't passed to this function will be uninterned.
diff --git a/html/notes/zero-values.md b/html/notes/zero-values.md
new file mode 100644
index 0000000..3a1eecb
--- /dev/null
+++ b/html/notes/zero-values.md
@@ -0,0 +1,11 @@
+# Return zero values when there's nothing to return
+
+This is only a minor point:
+
+It's a long-running pet peeve of mine that R5RS Scheme specifies "an
+unspecified value" to be returned when there's nothing meaningful to
+return. It's a remnant from before we had the ability to return
+multiple values, and should be eliminated.
+
+Any operation that has nothing meaningful to return will return zero
+values in Zisp, and no more. 
diff --git a/html/prelude.html b/html/prelude.html new file mode 100644 index 0000000..14a155b --- /dev/null +++ b/html/prelude.html @@ -0,0 +1,13 @@ +<!doctype html> +<head> + <meta charset="utf-8"/> + <title>__TITLE__</title> + <meta name="viewport" content="width=device-width, initial-scale=1"/> + <link rel="stylesheet" type="text/css" href="/zisp/style.css"/> + <link rel="stylesheet" type="text/css" href="/zisp/highlightjs/styles/default.min.css"/> + <script src="/zisp/highlightjs/highlight.min.js"></script> + <script> + hljs.configure({languages:[]}); + hljs.initHighlightingOnLoad(); + </script> +</head> diff --git a/html/style.css b/html/style.css new file mode 100644 index 0000000..f1b474b --- /dev/null +++ b/html/style.css @@ -0,0 +1,23 @@ +body { + margin: 20px auto; + padding: 0 20px; + max-width: 80ch; + + background: #eee; + color: #333; + + font-family: sans; +} + +h1, h2, h3 { + line-height: 1.2; +} + +p { + line-height: 1.6; + text-align: justify; +} + +code { + font-size: 1.2em; +} diff --git a/src/main.zig b/src/main.zig new file mode 100644 index 0000000..d40f09a --- /dev/null +++ b/src/main.zig @@ -0,0 +1,73 @@ +//! By convention, main.zig is where your main function lives in the case that +//! you are building an executable. If you are making a library, the convention +//! is to delete this file and start with root.zig instead. + +const std = @import("std"); + +/// This imports the separate module containing `root.zig`. Take a look in `build.zig` for details. +const zisp = @import("zisp_lib"); + +pub fn main() !void { + const T = extern union { + bits: u64, + double: f64, + }; + var f1: *volatile f64 = undefined; + var f2: *volatile f64 = undefined; + var x: *volatile T = undefined; + + var _gpa: std.heap.GeneralPurposeAllocator(.{}) = .init; + const gpa = _gpa.allocator(); + + f1 = try gpa.create(f64); + f2 = try gpa.create(f64); + x = try gpa.create(T); + + f1.* = 0.0; + f2.* = 0.0; + x.*.double = f1.* / f2.*; + std.debug.print(" 0/0: {x}\n", .{x.*.bits}); + + f1.* = -0.0; + f2.* = 0.0; + x.*.double = f1.* / f2.*; + std.debug.print("-0/0: {x}\n", .{x.*.bits}); + + // // Prints to stderr (it's a shortcut based on `std.io.getStdErr()`) + // std.debug.print("All your {s} are belong to us.\n", .{"codebase"}); + + // // stdout is for the actual output of your application, for example if you + // // are implementing gzip, then only the compressed bytes should be sent to + // // stdout, not any debugging messages. + // const stdout_file = std.io.getStdOut().writer(); + // var bw = std.io.bufferedWriter(stdout_file); + // const stdout = bw.writer(); + + // try stdout.print("Run `zig build test` to run the tests.\n", .{}); + + // try bw.flush(); // Don't forget to flush! +} + +test "simple test" { + var list = std.ArrayList(i32).init(std.testing.allocator); + defer list.deinit(); // Try commenting this out and see if zig detects the memory leak! + try list.append(42); + try std.testing.expectEqual(@as(i32, 42), list.pop()); +} + +test "use other module" { + //try std.testing.expectEqual(@as(i32, 150), lib.add(100, 50)); +} + +test "fuzz example" { + const Context = struct { + fn testOne(context: @This(), input: []const u8) anyerror!void { + _ = context; + // Try passing `--fuzz` to `zig build test` and see if it manages to fail this test case! 
+ try std.testing.expect(!std.mem.eql(u8, "canyoufindme", input)); + } + }; + try std.testing.fuzz(Context{}, Context.testOne, .{}); +} + +test "nan" {} diff --git a/src/root.zig b/src/root.zig new file mode 100644 index 0000000..05ad381 --- /dev/null +++ b/src/root.zig @@ -0,0 +1,191 @@ +//! By convention, root.zig is the root source file when making a library. If +//! you are making an executable, the convention is to delete this file and +//! start with main.zig instead. +const std = @import("std"); +const builtin = @import("builtin"); +const testing = std.testing; + +// Read the following article to understand the NaN-packing strategy: +// +// https://tkammer.de/zisp/notes/nan.html +// +// Note: Packed structs are least-to-most significant, so the order of fields +// must be reversed relative to a typical big-endian illustration of the bit +// patterns of IEEE 754 double-precision floating point numbers. + +const Value = packed union { + double: f64, + nan: packed struct { + rest: u51, + quiet: u1, + exp: u11, + sign: u1, + }, + int: packed struct { + code: u51, + neg: bool, + exp: u11, + is_int: bool, + }, + pointer: packed struct { + value: u48, + type: u3, + _zo: u1, + _qnan: u12, + }, +}; + +// Helpers + +inline fn zisp_dump(v: Value) void { + std.debug.dumpHex(std.mem.asBytes(&v)); +} + +///! Checks for any IEEE 754 NaN. +inline fn zisp_is_nan(v: Value) bool { + return v.nan.exp == std.math.maxInt(u11); +} + +///! Checks for a Zisp value packed into a NaN. +inline fn zisp_is_packed(v: Value) bool { + return zisp_is_nan(v) and v.nan.rest != 0; +} + +///! Checks for a regular double including infinity or canonical NaN +inline fn zisp_is_double(v: Value) bool { + return !zisp_is_packed(v); +} + +inline fn zisp_assert_double(v: Value) void { + if (!zisp_is_double(v)) { + zisp_dump(v); + @panic("not double"); + } +} + +inline fn zisp_is_int(v: Value) bool { + return zisp_is_packed(v) and v.int.is_int; +} + +inline fn zisp_assert_int(v: Value) void { + if (!zisp_is_int(v)) { + zisp_dump(v); + @panic("not int"); + } +} + +// See detailed NaN packing docs for why the +/- 1. 
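+// In short: minInt(i52) itself would pack to the canonical quiet NaN bit
+// pattern with the sign bit set, and maxInt(i52) would pack to the negative
+// infinity bit pattern, both of which are reserved, hence the +/- 1 below.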
+const zisp_int_min = std.math.minInt(i52) + 1; +const zisp_int_max = std.math.maxInt(i52) - 1; + +inline fn zisp_assert_int_range(int: i64) void { + if (int < zisp_int_min) { + std.debug.print("int to pack is too small: {}", .{int}); + @panic("int to pack is too small"); + } + if (int > zisp_int_max) { + std.debug.print("int to pack is too large: {}", .{int}); + @panic("int to pack is too large"); + } +} + +inline fn zisp_int_pack_neg(int: i64) Value { + return @bitCast(int); +} + +inline fn zisp_int_unpack_neg(v: Value) i64 { + return @bitCast(v); +} + +const zisp_int_pos_mask: u64 = 0xfff7ffffffffffff; + +inline fn zisp_int_pack_pos(int: i64) Value { + const uint: u64 = @bitCast(int); + return @bitCast(uint ^ zisp_int_pos_mask); +} + +inline fn zisp_int_unpack_pos(v: Value) i64 { + const uint: u64 = @bitCast(v); + return @bitCast(uint ^ zisp_int_pos_mask); +} + +inline fn zisp_int_pack(int: i64) Value { + zisp_assert_int_range(int); + if (int < 0) { + return zisp_int_pack_neg(int); + } else { + return zisp_int_pack_pos(int); + } +} + +inline fn zisp_int_unpack(v: Value) i64 { + zisp_assert_int(v); + if (v.int.neg) { + return zisp_int_unpack_neg(v); + } else { + return zisp_int_unpack_pos(v); + } +} + +// Doubles + +pub fn zisp_double(d: f64) Value { + return @bitCast(d); +} + +// pub fn zisp_double_p(v: Value) Value { +// return zisp_bool(zisp_is_double(v)); +// } + +pub fn zisp_double_get(v: Value) f64 { + zisp_assert_double(v); + return v.double; +} + +pub fn zisp_double_add(v1: Value, v2: Value) Value { + const d1 = zisp_double_get(v1); + const d2 = zisp_double_get(v2); + return zisp_double(d1 + d2); +} + +// Ints + +pub fn zisp_int(int: i64) Value { + return zisp_int_pack(int); +} + +// pub fn zisp_int_p(v: Value) Value { +// return zisp_bool(zisp_is_int(v)); +// } + +pub fn zisp_int_get(v: Value) i64 { + return zisp_int_unpack(v); +} + +pub fn zisp_int_add(v1: Value, v2: Value) Value { + const int1 = zisp_int_get(v1); + const int2 = zisp_int_get(v2); + return zisp_int(int1 + int2); +} + +// Tests + +test "double add functionality" { + const d1: f64 = 0.123456789; + const d2: f64 = -0.987654321; + const v1 = zisp_double(d1); + const v2 = zisp_double(d2); + const v3 = zisp_double_add(v1, v2); + const result = zisp_double_get(v3); + try std.testing.expect(result == d1 + d2); +} + +test "int add functionality" { + const int1: i64 = 123456789; + const int2: i64 = -987654321; + const v1 = zisp_int(int1); + const v2 = zisp_int(int2); + const v3 = zisp_int_add(v1, v2); + const result = zisp_int_get(v3); + try std.testing.expect(result == int1 + int2); +} diff --git a/src/test.c b/src/test.c new file mode 100644 index 0000000..0b0917e --- /dev/null +++ b/src/test.c @@ -0,0 +1,119 @@ +#include <stdio.h> +#include <stdint.h> +#include <string.h> +#include <math.h> + +union test { + double d; + uint64_t u; +}; + +int main(int argc, char** argv) { + + volatile uint64_t mask; + volatile uint64_t min; + volatile uint64_t max; + + volatile double d1; + volatile double d2; + + volatile uint64_t pd1; + volatile uint64_t pd2; + + + // 0 .. 
2^51 + + // SE__________QP__ + mask = 0b1111111111110111111111111111111111111111111111111111111111111111; + min = 0b1111111111110111111111111111111111111111111111111111111111111111; + max = 0b1111111111110000000000000000000000000000000000000000000000000001; + + memcpy(&d1, &min, 8); + memcpy(&d2, &max, 8); + + printf("%lf\n", d1); + printf("%lf\n", d2); + + memcpy(&pd1, &d1, 8); + memcpy(&pd2, &d2, 8); + pd1 ^= mask; + pd2 ^= mask; + + printf("%ld\n", pd1); + printf("%ld\n", pd2); + + printf("\n"); + + // -2^51 + 1 .. -1 + + // SE__________QP__ + min = 0b1111111111111000000000000000000000000000000000000000000000000001; + max = 0b1111111111111111111111111111111111111111111111111111111111111111; + + memcpy(&d1, &min, 8); + memcpy(&d2, &max, 8); + + printf("%lf\n", d1); + printf("%lf\n", d2); + + memcpy(&pd1, &d1, 8); + memcpy(&pd2, &d2, 8); + + printf("%ld\n", pd1); + printf("%ld\n", pd2); + + printf("\n"); + + return 0; + + // -2^50 + 1 .. -1 + + // SE__________QFP__ + mask = 0b1111111111111100000000000000000000000000000000000000000000000000; + min = 0b0000000000000000000000000000000000000000000000000000000000000001; + max = 0b0000000000000011111111111111111111111111111111111111111111111111; + + printf("%ld\n", (int64_t) (min | mask)); + printf("%ld\n", (int64_t) (max | mask)); + + + // 0 .. 2^50 + + // SE__________QFP__ + mask = 0b0000000000000011111111111111111111111111111111111111111111111111; + min = 0b0000000000000100000000000000000000000000000000000000000000000000; + max = 0b0000000000000111111111111111111111111111111111111111111111111111; + + printf("%ld\n", (int64_t) (min & mask)); + printf("%ld\n", (int64_t) (max & mask)); + + + + + /* volatile union test x; */ + /* volatile double f1; */ + /* volatile double f2; */ + + /* f1 = 0.0; */ + /* f2 = 0.0; */ + /* x.d = f1 / f2; */ + /* printf(" 0/0: %lx\n", x.u); */ + + /* f1 = -0.0; */ + /* f2 = +0.0; */ + /* x.d = f1 / f2; */ + /* printf("-0/0: %lx\n", x.u); */ + + /* x.d = sqrt(-1); */ + /* printf("sqrt(-1): %lx\n", x.u); */ + + + /* double nan_value = fabs(sqrt(-1.0)); // Standard way to generate NaN */ + + /* uint64_t nan_bits; */ + /* memcpy(&nan_bits, &nan_value, sizeof(nan_bits)); */ + + /* printf("NaN in hex: 0x%016lx\n", nan_bits); */ + + /* return 0; */ +} diff --git a/test.scm b/test.scm new file mode 100644 index 0000000..d893c9f --- /dev/null +++ b/test.scm @@ -0,0 +1,89 @@ +(import + (rnrs eval) + (rnrs hashtables)) + +(define-syntax eval-when-compile + (lambda (stx) + (syntax-case stx () + ((_ imports body ...) + (eval + (syntax->datum #'(begin body ...)) + (apply environment (syntax->datum #'imports))))))) + +(eval-when-compile + ((rnrs)) + (display "foo\n")) + +(define-syntax process-data + (lambda (stx) + (syntax-case stx () + ((_ file) + (let ((ht (make-eqv-hashtable))) + (hashtable-set! ht 1 2) + ht))))) + +(define lookup-table (process-data "lookup-table.dat")) + +(define seconds-per-day (number->string (* 24 60 60))) + +(define (foo arg1:Num arg2:Record) + (do-stuff arg2.field )) + + + +(define-class Person + (fields + name date-of-birth sex + (age person-age set-person-age!)) + (methods + ((jump height) + (if (string=? name "lebron") + (perform-jump height))))) + + + +(define-record (r1 a b)) + +(define-record (r1 a b) + (fields a b) + (set-r1-a! a) + (set-r1-b! b)) + +(define my-r1 (r1 1 2)) + + +(define-record (r2 a b c d) + (parent (r1 a b))) + +(define-record (r2 a b c d) + (parent (r1 a b)) + (fields c d) + (set-r2-c! c) + (set-r2-d! 
d))
+
+(define-record (r2 a b c d)
+  (parent (r1 (* 2 c) (* 4 d)))
+  (fields x y z)
+  (set-r2-x! a)
+  (set-r2-y! b)
+  (set-r2-z! (/ a b)))
+
+(define my-r2 (r2 1 2 3 4))
+
+
+(define-record (r3 a b c d e f)
+  (parent r2))
+
+(define-record r3
+  (parent r2)
+  (fields e f))
+
+(define (init-r3! r a b c d e f)
+  (init-r2! r a b c d)
+  (set-r3-e! r e)
+  (set-r3-f! r f))
+
+
+(define r (make-r3))
+(init-r3! r 1 2 3 4 5 6)
