From 8012e3fe177069a709f30d2ab4a18ff11025c86f Mon Sep 17 00:00:00 2001
From: Taylan Kammer
Date: Thu, 8 Jan 2026 14:54:20 +0100
Subject: Draft a Manual front-page.

---
 docs/c1/1-parse.md  | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 docs/c1/2-decode.md |  44 ++++++++++++++++++++
 docs/decoder.md     |  44 --------------------
 docs/index.md       |  36 ++++++++++++++++
 docs/parser.md      | 117 ----------------------------------------------------
 5 files changed, 197 insertions(+), 161 deletions(-)
 create mode 100644 docs/c1/1-parse.md
 create mode 100644 docs/c1/2-decode.md
 delete mode 100644 docs/decoder.md
 create mode 100644 docs/index.md
 delete mode 100644 docs/parser.md

diff --git a/docs/c1/1-parse.md b/docs/c1/1-parse.md
new file mode 100644
index 0000000..a23ebbc
--- /dev/null
+++ b/docs/c1/1-parse.md
@@ -0,0 +1,117 @@
+# Parser for Code & Data
+
+Zisp s-expressions are defined in terms of an extremely minimal set of data
+types; only that which is necessary to build representations of more complex
+expressions and data types:
+
+    +--------+-----------------+--------+----------+------+
+    | TYPE   | String          | Rune   | Pair     | Nil  |
+    +--------+-----------------+--------+----------+------+
+    | E.G.   | foo, |foo bar|  | #name  | (X & Y)  | ()   |
+    +--------+-----------------+--------+----------+------+
+
+Note that the ampersand replaces the period in pair notation. This simplifies
+the grammar: periods are a regular constituent of strings, while the ampersand
+cannot appear in unquoted strings.
+
+The parser can also output non-negative integers, but this is only used for
+datum labels; number literals are handled by the *decoder*.
+
+The parser recognizes various "syntax sugar" and transforms it into uses of the
+above data types. The most ubiquitous example is of course the list:
+
+    (datum1 datum2 ...) -> (datum1 & (datum2 & (... & ())))
+
+The following table summarizes the other supported transformations:
+
+    "xyz"  -> (#QUOTE & |xyz|)    #datum      -> (#HASH & datum)
+
+    [...]  -> (#SQUARE ...)       #rune(...)  -> (#rune ...)
+
+    {...}  -> (#BRACE ...)        dat1dat2    -> (#JOIN dat1 & dat2)
+
+    'datum -> (#QUOTE & datum)    dat1.dat2   -> (#DOT dat1 & dat2)
+
+    `datum -> (#GRAVE & datum)    dat1:dat2   -> (#COLON dat1 & dat2)
+
+    ,datum -> (#COMMA & datum)    #%hex%      -> (#LABEL & hex)
+
+                                  #%hex=datum -> (#LABEL hex & datum)
+
+A separate process called *decoding* can transform such data into more complex
+types. For example, `(#HASH x y z)` could be decoded into a vector, so the
+expression `#(x y z)` works just like in Scheme.
+
+Decoding also resolves datum labels, goes over strings to find ones that are
+actually a number literal, and takes care of a number of other transformations.
+This offloads complexity, allowing the parser to remain extremely simple. See
+the dedicated documentation of the decoder for more.
+
+Further notes about the syntax sugar table and examples above:
+
+* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
+  means zero or more data; hex is a hexadecimal number of up to 12 digits.
+
+* The `#datum` form only applies when the datum following the hash sign is a
+  list, quoted string, quote expression, another expression starting with the
+  hash sign, or a pipe-quoted string (see next). A bare string can follow the
+  hash sign by separating the two with a backslash: `#\string`
+
+* Strings can be quoted with pipes, like symbols in Scheme:
+
+      |foo bar baz|
+
+* Though not represented in the table due to notational difficulty, the form
+  `#rune(...)` doesn't require a list in the second position; any datum that
+  works with the `#datum` syntax also works with `#rune`.
+
+      #rune1#rune2 -> (#rune1 & #rune2)
+
+      #rune"text"  -> (#rune & "text")
+
+      #rune\string -> (#rune & string)
+
+      #rune'string -> (#rune #QUOTE & string)
+
+  As a counter-example, following a rune immediately with a bare string isn't
+  possible without the delimiting backslash, since that would be ambiguous:
+
+      #abcdefgh  ;Could be (#abcdef & gh) or (#abcde & fgh) or ...
+
+* Syntax sugar can combine arbitrarily; some examples follow:
+
+      #{...}           -> (#HASH #BRACE ...)
+
+      #'foo            -> (#HASH #QUOTE & foo)
+
+      ##'[...]         -> (#HASH #HASH #QUOTE #SQUARE ...)
+
+      {x y}[i j]       -> (#JOIN (#BRACE x y) #SQUARE i j)
+
+      foo.bar.baz{x y} -> (#JOIN (#DOT (#DOT foo & bar) & baz) #BRACE x y)
+
+* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses
+  as `(#QUOTE & foo)` instead; the operand of `#QUOTE` is the entire cdr.
+
+  The same principle is used when parsing other sugar; some examples follow:
+
+      Incorrect                          Correct
+
+      #(x y z) -> (#HASH (x y z))        #(x y z) -> (#HASH x y z)
+
+      [x y z]  -> (#SQUARE (x y z))      [x y z]  -> (#SQUARE x y z)
+
+      #{x}     -> (#HASH (#BRACE (x)))   #{x}     -> (#HASH #BRACE x)
+
+      foo(x y) -> (#JOIN foo (x y))      foo(x y) -> (#JOIN foo x y)
+
+* Runes are case-sensitive, and the parser always emits runes using upper-case
+  letters when expressing syntax sugar. Uppercase rune names are reserved for
+  Zisp's internal use and standard library; users can use lowercase runes with
+  custom meaning without worrying about clashes.
+
+
diff --git a/docs/c1/2-decode.md b/docs/c1/2-decode.md
new file mode 100644
index 0000000..0b34204
--- /dev/null
+++ b/docs/c1/2-decode.md
@@ -0,0 +1,44 @@
+# Decoding
+
+A separate process called "decoding" can transform simple data structures,
+consisting of only the datum types, into a richer set of Zisp types.
+
+For example, the decoder may turn `(#HASH ...)` into a vector, as one would
+expect a vector literal like `#(...)` to work in Scheme. Bytevector syntax
+could use a custom rune as a list prefix, like: `#u8(...)`
+
+Runes may be decoded in isolation as well, rather than transforming a list
+whose head they appear in. This can implement Boolean constants as `#true`
+and `#false` or `#t` and `#f`.
+
+The decoder recognizes `(#QUOTE ...)` to aid in implementing the traditional
+quoting mechanism of Lisp/Scheme, but with a significant difference:
+
+Traditional quote is "unhygienic" in Scheme terms. An expression such as
+`'(foo bar)` will always be read as `(quote (foo bar))` regardless of what
+lexical context it appears in, so the semantics will depend on whatever the
+identifier `quote` is bound to, meaning that the expression may end up
+evaluating to something other than the list `(foo bar)`.
+
+The Zisp decoder, which transforms not datum to datum, but object to object,
+can turn `#QUOTE` into an object which encapsulates the notion of quoting,
+which the Zisp evaluator can recognize and act upon, ensuring that an
+expression like `'(foo bar)` always turns into the list `(foo bar)`.
+
+One way to think about this, in Scheme (R6RS / syntax-case) terms, is that
+expressions like `'(foo bar)` turn directly into a syntax object when read,
+and the created syntax object begins with an identifier bound to `quote` in
+the standard library.
+
+The decoder is, of course, configurable and extensible. The transformations
+mentioned above would be performed only when it's told to decode data which
+represents Zisp code. The decoder may be given a different configuration,
+telling it to decode, for example, data which represents a different kind of
+domain-specific data, such as application settings, build system commands,
+complex data records with non-standard data types, and so on.
+
+
diff --git a/docs/decoder.md b/docs/decoder.md
deleted file mode 100644
index 0b34204..0000000
--- a/docs/decoder.md
+++ /dev/null
@@ -1,44 +0,0 @@
-# Decoding
-
-A separate process called "decoding" can transform simple data structures,
-consisting of only the datum types, into a richer set of Zisp types.
-
-For example, the decoder may turn `(#HASH ...)` into a vector, as one would
-expect a vector literal like `#(...)` to work in Scheme. Bytevector syntax
-could use a custom rune as a list prefix, like: `#u8(...)`
-
-Runes may be decoded in isolation as well, rather than transforming a list
-whose head they appear in. This can implement Boolean constants as `#true`
-and `#false` or `#t` and `#f`.
-
-The decoder recognizes `(#QUOTE ...)` to aid in implementing the traditional
-quoting mechanism of Lisp/Scheme, but with a significant difference:
-
-Traditional quote is "unhygienic" in Scheme terms. An expression such as
-`'(foo bar)` will always be read as `(quote (foo bar))` regardless of what
-lexical context it appears in, so the semantics will depend on whatever the
-identifier `quote` is bound to, meaning that the expression may end up
-evaluating to something other than the list `(foo bar)`.
-
-The Zisp decoder, which transforms not datum to datum, but object to object,
-can turn `#QUOTE` into an object which encapsulates the notion of quoting,
-which the Zisp evaluator can recognize and act upon, ensuring that an
-expression like `'(foo bar)` always turns into the list `(foo bar)`.
-
-One way to think about this, in Scheme (R6RS / syntax-case) terms, is that
-expressions like `'(foo bar)` turn directly into a syntax object when read,
-and the created syntax object begins with an identifier bound to `quote` in
-the standard library.
-
-The decoder is, of course, configurable and extensible. The transformations
-mentioned above would be performed only when it's told to decode data which
-represents Zisp code. The decoder may be given a different configuration,
-telling it to decode, for example, data which represents a different kind of
-domain-specific data, such as application settings, build system commands,
-complex data records with non-standard data types, and so on.
-
-
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..ca8d814
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,36 @@
+# Zisp Manual
+
+This document explains the Zisp language and its implementation.
+
+Zisp intentionally blurs the line between developers and users of the
+language. After all, Zisp is software, and its users are software
+developers; the easiest way to explain *why* Zisp does certain things
+is often to explain *how* it does them.
+
+That doesn't mean this manual will walk you through the source code
+line by line. Instead, consider it documentation of the code base
+at large, doubling as a reference to the language implemented by the
+code base.
+
+## Table of Contents
+
+1. [Chapter 1: Genesis](c1/)
+
+   This chapter goes through the processes involved in reading source
+   files and ultimately producing binaries from them.
+
+   1. [Parse](c1/1-parse.html)
+   2. [Decode](c1/2-decode.html)
+   3. [Expand](c1/3-expand.html)
+   4. [Execute](c1/4-execute.html)
+   5. [Compile](c1/5-compile.html)
+
+2. [Chapter 2: Types](c2/)
+
+   Following is an enumeration of the standard data types, and the
+   methods Zisp offers for generating new types.
+
+   1. ...
+   2. ...
+
+3. [Chapter 3: ...]
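
The decoder text above describes a configurable transformation driven by runes. One hypothetical way to structure it is sketched below in Python: a `Decoder` holds one rule table for runes at the head of a list and another for runes standing on their own, and a second configuration reuses the same machinery for non-code data. `Decoder`, `Quote`, `Vector`, `decode_vector`, `ZISP_CODE` and `SETTINGS` are invented for illustration, as is the assumption that only plain decimal integers count as number literals; this is not Zisp's actual implementation.

```python
# Hypothetical, table-driven decoder; all names here are illustrative assumptions.
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Rune:
    name: str  # stored without the leading '#'


@dataclass(frozen=True)
class Pair:
    car: Any
    cdr: Any


class Nil:
    """The empty list ()."""


NIL = Nil()


@dataclass(frozen=True)
class Quote:
    """Means 'this datum, quoted', independent of any binding named quote."""
    datum: Any


@dataclass(frozen=True)
class Vector:
    items: tuple


# Assumption: only plain decimal integers are treated as number literals.
INTEGER = re.compile(r"[+-]?[0-9]+")
ListRule = Callable[[Any, "Decoder"], Any]


@dataclass
class Decoder:
    list_rules: Dict[str, ListRule]  # rules for a rune at the head of a list
    rune_rules: Dict[str, Any]       # rules for a rune on its own

    def decode(self, d: Any) -> Any:
        if isinstance(d, str):
            return int(d) if INTEGER.fullmatch(d) else d
        if isinstance(d, Rune):
            return self.rune_rules.get(d.name, d)
        if isinstance(d, Pair):
            head = d.car
            if isinstance(head, Rune) and head.name in self.list_rules:
                # The rule receives the whole cdr as its operand.
                return self.list_rules[head.name](d.cdr, self)
            return Pair(self.decode(d.car), self.decode(d.cdr))
        return d  # NIL decodes to itself


def decode_vector(args: Any, dec: Decoder) -> Vector:
    """(#HASH x y z) -> a vector of the decoded elements."""
    items = []
    while isinstance(args, Pair):
        items.append(dec.decode(args.car))
        args = args.cdr
    return Vector(tuple(items))


# Configuration for decoding Zisp code.
ZISP_CODE = Decoder(
    list_rules={
        "HASH": decode_vector,
        # Keep the quoted datum undecoded, wrapped in a Quote object.
        "QUOTE": lambda operand, dec: Quote(operand),
    },
    rune_rules={"true": True, "false": False, "t": True, "f": False},
)

# A different configuration, e.g. for an application settings file.
SETTINGS = Decoder(list_rules={}, rune_rules={"on": True, "off": False})

vec = Pair(Rune("HASH"), Pair("x", Pair("42", NIL)))  # #(x 42)
print(ZISP_CODE.decode(vec))  # Vector(items=('x', 42))
```

Because `#QUOTE` decodes to a `Quote` object rather than to a list headed by a rebindable symbol, nothing in the surrounding scope can change what `'(foo bar)` means, which is the hygiene property the decoder text describes. Under `SETTINGS`, `#true` is simply left as a rune.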
diff --git a/docs/parser.md b/docs/parser.md
deleted file mode 100644
index a23ebbc..0000000
--- a/docs/parser.md
+++ /dev/null
@@ -1,117 +0,0 @@
-# Parser for Code & Data
-
-Zisp s-expressions are defined in terms of an extremely minimal set of data
-types; only that which is necessary to build representations of more complex
-expressions and data types:
-
-    +--------+-----------------+--------+----------+------+
-    | TYPE   | String          | Rune   | Pair     | Nil  |
-    +--------+-----------------+--------+----------+------+
-    | E.G.   | foo, |foo bar|  | #name  | (X & Y)  | ()   |
-    +--------+-----------------+--------+----------+------+
-
-Note that the ampersand replaces the period in pair notation. This simplifies
-the grammar: periods are a regular constituent of strings, while the ampersand
-cannot appear in unquoted strings.
-
-The parser can also output non-negative integers, but this is only used for
-datum labels; number literals are handled by the *decoder*.
-
-The parser recognizes various "syntax sugar" and transforms it into uses of the
-above data types. The most ubiquitous example is of course the list:
-
-    (datum1 datum2 ...) -> (datum1 & (datum2 & (... & ())))
-
-The following table summarizes the other supported transformations:
-
-    "xyz"  -> (#QUOTE & |xyz|)    #datum      -> (#HASH & datum)
-
-    [...]  -> (#SQUARE ...)       #rune(...)  -> (#rune ...)
-
-    {...}  -> (#BRACE ...)        dat1dat2    -> (#JOIN dat1 & dat2)
-
-    'datum -> (#QUOTE & datum)    dat1.dat2   -> (#DOT dat1 & dat2)
-
-    `datum -> (#GRAVE & datum)    dat1:dat2   -> (#COLON dat1 & dat2)
-
-    ,datum -> (#COMMA & datum)    #%hex%      -> (#LABEL & hex)
-
-                                  #%hex=datum -> (#LABEL hex & datum)
-
-A separate process called *decoding* can transform such data into more complex
-types. For example, `(#HASH x y z)` could be decoded into a vector, so the
-expression `#(x y z)` works just like in Scheme.
-
-Decoding also resolves datum labels, goes over strings to find ones that are
-actually a number literal, and takes care of a number of other transformations.
-This offloads complexity, allowing the parser to remain extremely simple. See
-the dedicated documentation of the decoder for more.
-
-Further notes about the syntax sugar table and examples above:
-
-* The terms datum, dat1, and dat2 each refer to an arbitrary datum; ellipsis
-  means zero or more data; hex is a hexadecimal number of up to 12 digits.
-
-* The `#datum` form only applies when the datum following the hash sign is a
-  list, quoted string, quote expression, another expression starting with the
-  hash sign, or a pipe-quoted string (see next). A bare string can follow the
-  hash sign by separating the two with a backslash: `#\string`
-
-* Strings can be quoted with pipes, like symbols in Scheme:
-
-      |foo bar baz|
-
-* Though not represented in the table due to notational difficulty, the form
-  `#rune(...)` doesn't require a list in the second position; any datum that
-  works with the `#datum` syntax also works with `#rune`.
-
-      #rune1#rune2 -> (#rune1 & #rune2)
-
-      #rune"text"  -> (#rune & "text")
-
-      #rune\string -> (rune & string)
-
-      #rune'string -> (#rune #QUOTE & string)
-
-  As a counter-example, following a rune immediately with a bare string isn't
-  possible without the delimiting backslash, since that would be ambiguous:
-
-      #abcdefgh  ;Could be (#abcdef & gh) or (#abcde & fgh) or ...
-
-* Syntax sugar can combine arbitrarily; some examples follow:
-
-      #{...}           -> (#HASH #BRACE ...)
-
-      #'foo            -> (#HASH #QUOTE & foo)
-
-      ##'[...]         -> (#HASH #HASH #QUOTE #SQUARE ...)
-
-      {x y}[i j]       -> (#JOIN (#BRACE x y) #SQUARE i j)
-
-      foo.bar.baz{x y} -> (#JOIN (#DOT (#DOT foo & bar) & baz) #BRACE x y)
-
-* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses
-  as `(#QUOTE & foo)` instead; the operand of `#QUOTE` is the entire cdr.
-
-  The same principle is used when parsing other sugar; some examples follow:
-
-      Incorrect                          Correct
-
-      #(x y z) -> (#HASH (x y z))        #(x y z) -> (#HASH x y z)
-
-      [x y z]  -> (#SQUARE (x y z))      [x y z]  -> (#SQUARE x y z)
-
-      #{x}     -> (#HASH (#BRACE (x)))   #{x}     -> (#HASH #BRACE x)
-
-      foo(x y) -> (#JOIN foo (x y))      foo(x y) -> (#JOIN foo x y)
-
-* Runes are case-sensitive, and the parser always emits runes using upper-case
-  letters when expressing syntax sugar. Uppercase rune names are reserved for
-  Zisp's internal use and standard library; users can use lowercase runes with
-  custom meaning without worrying about clashes.
-
-
-- 
cgit v1.2.3
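
The parser desugarings documented above are pure data construction, so they can be checked mechanically. The Python sketch below is a hypothetical model, not Zisp's parser: `Rune`, `Pair`, `Nil` and the helpers `lst`, `prefix`, `infix`, `show` and `show_raw` are names invented for illustration, and Zisp strings are modelled as plain Python `str`. It builds a few of the documented examples by hand and prints them in the `&` pair notation.

```python
# Hypothetical model of the four datum types; not part of Zisp itself.
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rune:
    name: str  # stored without the leading '#'


@dataclass(frozen=True)
class Pair:
    car: "Datum"
    cdr: "Datum"


class Nil:
    """The empty list ()."""


NIL = Nil()
Datum = Union[str, Rune, Pair, Nil]


def lst(*items: Datum) -> Datum:
    """(a b c) sugar: (a & (b & (c & ())))."""
    result: Datum = NIL
    for item in reversed(items):
        result = Pair(item, result)
    return result


def prefix(rune: str, datum: Datum) -> Pair:
    """Prefix sugar such as 'x or #x: (#RUNE & datum)."""
    return Pair(Rune(rune), datum)


def infix(rune: str, left: Datum, right: Datum) -> Pair:
    """Infix sugar such as a.b, a:b or ab: (#RUNE left & right)."""
    return Pair(Rune(rune), Pair(left, right))


def show(d: Datum) -> str:
    """Print a datum, abbreviating pair chains as lists with an '& tail'."""
    if isinstance(d, Nil):
        return "()"
    if isinstance(d, Rune):
        return "#" + d.name
    if isinstance(d, str):
        # Pipe-quote strings that contain whitespace (or are empty).
        return d if d and not any(c.isspace() for c in d) else f"|{d}|"
    parts = []
    while isinstance(d, Pair):
        parts.append(show(d.car))
        d = d.cdr
    if not isinstance(d, Nil):
        parts += ["&", show(d)]
    return "(" + " ".join(parts) + ")"


def show_raw(d: Datum) -> str:
    """Print every pair explicitly as (car & cdr)."""
    if isinstance(d, Pair):
        return f"({show_raw(d.car)} & {show_raw(d.cdr)})"
    return show(d)


# (a b c) is nothing but nested pairs.
print(show_raw(lst("a", "b", "c")))  # (a & (b & (c & ())))
# #(x y z): the operand of #HASH is the whole cdr, not a nested list.
print(show(prefix("HASH", lst("x", "y", "z"))))  # (#HASH x y z)
# foo.bar.baz{x y}
print(show(infix("JOIN",
                 infix("DOT", infix("DOT", "foo", "bar"), "baz"),
                 prefix("BRACE", lst("x", "y")))))
# (#JOIN (#DOT (#DOT foo & bar) & baz) #BRACE x y)
```

Note that `prefix` never wraps its operand in an extra list; that single choice is what makes `#(x y z)` come out as `(#HASH x y z)` rather than `(#HASH (x y z))`, matching the "Correct" column above.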