Diffstat (limited to 'doc')
| -rw-r--r-- | doc/0/0-value.md | 199 |
| -rw-r--r-- | doc/0/1-parse.md (renamed from doc/c1/1-parse.md) | 219 |
| -rw-r--r-- | doc/0/2-decode.md (renamed from doc/c1/2-decode.md) | 2 |
| -rw-r--r-- | doc/0/grammar/abnf.txt (renamed from doc/c1/grammar/abnf.txt) | 6 |
| -rw-r--r-- | doc/0/grammar/index.md (renamed from doc/c1/grammar/index.md) | 0 |
| -rw-r--r-- | doc/0/grammar/peg.txt (renamed from doc/c1/grammar/peg.txt) | 10 |
| -rw-r--r-- | doc/0/grammar/zbnf.txt (renamed from doc/c1/grammar/zbnf.txt) | 10 |
| -rw-r--r-- | doc/0/index.md (renamed from doc/c1/index.md) | 24 |
| -rw-r--r-- | doc/index.md | 42 |
9 files changed, 350 insertions, 162 deletions
diff --git a/doc/0/0-value.md b/doc/0/0-value.md new file mode 100644 index 0000000..4bb7e0c --- /dev/null +++ b/doc/0/0-value.md @@ -0,0 +1,199 @@ +# NaN-packed Value representation + +The format of a binary64 floating-point number, in big-endian notation: + + { sign: 1 bit, exponent: 11 bits, fraction: 52 bits } + +When the 11 exponent bits are all set, it's either a NaN or an Infinity. + +For value packing, the remaining 53 bits are available, giving us `2^53` values +minus the following four bit patterns: + + *** FORBIDDEN BIT-PATTERNS *** + + 1. Negative cqNaN :: { sign = 1, exponent = MAX, fraction = 10000... } + + 2. Negative Infinity :: { sign = 1, exponent = MAX, fraction = 00000... } + + 3. Positive cqNaN :: { sign = 0, exponent = MAX, fraction = 10000... } + + 4. Positive Infinity :: { sign = 0, exponent = MAX, fraction = 00000... } + +The abbreviation "cqNaN" stands for canonical quiet NaN. + +The MSb of the fraction is also called the `is_quiet` flag, because it marks a +NaN as being "quiet" rather than signaling. The rest of the fraction being all +zero makes it the *canonical* quiet NaN for the given sign value. + +The positive and negative cqNaN are the *only* NaN values that can actually be +returned by FP operations. This is convenient, because it means we can simply +use them to represent themselves in Zisp. + +Infinity values may also be returned by FP operations, and we want them to also +exist in Zisp, so they also represent themselves. + +Beyond those four bit patterns, all values with a maximum exponent (all bits +set) are fair game for representing other values, so `2^53 - 4` possibilities. + +We split those `2^53 - 4` available values into four groups, each allowing for +`2^51 - 1` different values to be encoded. (51-bit values excluding zero.)
+ + sign = 1, quiet = 1 :: Negative Fixnum from -1 to -2^51+1 + + sign = 1, quiet = 0 :: Positive Fixnum from 0 to 2^51-2 + + sign = 0, quiet = 1 :: Pointers and various immediates + + sign = 0, quiet = 0 :: Internal use by interpreter + + +## Fixnums + +Negative fixnums actually represent themselves, without needing to go through +any transformation. Only the smallest 52-bit signed negative, `-2^51`, cannot +be represented, as it would step on Forbidden Value #1, Negative cqNaN. + +Positive fixnums go through a bitwise NOT (which can be implemented as an XOR +mask combining it with removal of NaN-related high bits) to avoid the all-zero +payload value, which would step on Forbidden Value #2, Negative Infinity. + + +## Pointers and immediates + +This region of 51-bit values is divided as follows, based on the three highest +bits, providing a payload value of 48 bits for each. + + 000 :: Pointer to heap object (type-tagged) + + 001 :: Pointer to list values (length-tagged) + + 010 :: Pointer to istr object + + 011 :: Immediate short string + + 100 :: Immediate small rational (sign bit 0) + + 101 :: Immediate small rational (sign bit 1) + + 110 :: Undefined + + 111 :: Immediate types further subdivided as follows: + + 0....... 0....... 0....... (etc.) :: Rune + + 1....... :: 128 40-bit types + + 0....... 1....... :: 16384 32-bit types + + 0....... 0....... 1....... :: 2097152 24-bit types + + (etc.) + +Forbidden Value #3, Positive cqNaN, is avoided by not using 0 as a valid type +tag value for heap pointers. + +### Type-tagged heap pointers + +Regular heap objects are allocated with 16-byte alignment, meaning the lowest +four bits are naturally zero. We exploit this by shifting down the address by +four bits, making room for more tag bits immediately following the 16 high bits +that mark the value as a heap pointer.
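The shifting scheme for type-tagged heap pointers can be illustrated with a small Python sketch. This is not the Zisp implementation: the function names are hypothetical, the prefix `0x7FF8` is derived from the layout described above (sign 0, exponent all ones, quiet 1, category `000`), and the 48-bit address limit is an assumption made for the sketch.

```python
# Sketch only; names are hypothetical, field positions follow the text above.
HEAP_PREFIX = 0x7FF8   # sign=0, exponent all ones, quiet=1, category bits 000
ADDR_BITS = 44         # 48-bit payload minus the 4-bit heap-type tag


def pack_heap_pointer(addr: int, type_tag: int) -> int:
    """Pack a 16-byte-aligned address and a 4-bit type tag into 64 bits."""
    if addr % 16 != 0:
        raise ValueError("heap objects are assumed to be 16-byte aligned")
    if not 0 <= addr < (1 << 48):
        raise ValueError("this sketch assumes addresses fit in 48 bits")
    if not 1 <= type_tag <= 15:
        raise ValueError("tag 0 is excluded to stay clear of the positive cqNaN")
    # Shift the address down by four bits; the tag goes right below the prefix.
    return (HEAP_PREFIX << 48) | (type_tag << ADDR_BITS) | (addr >> 4)


def unpack_heap_address(bits: int) -> int:
    """Recover the original address by shifting the low 44 bits back up."""
    return (bits & ((1 << ADDR_BITS) - 1)) << 4


def has_heap_type(bits: int, type_tag: int) -> bool:
    """Compare the 20 high bits against prefix + tag in one step."""
    return (bits >> ADDR_BITS) == ((HEAP_PREFIX << 4) | type_tag)
```

With these definitions, `pack_heap_pointer(0x7F00DEADBEE0, 3)` yields `0x7FF837F00DEADBEE`, and a type check needs only a shift and one comparison.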
+ +Thus, a value can be checked against a specific heap object type by comparing +the 20 high bits to a combined constant value: the 16 highest bits indicating +that it's a heap pointer, and the 4 bits after that denoting the heap type. + +### Length-tagged list pointers + +Lists are arrays of Value objects, allocated without padding (8-byte alignment) +for efficient source code representation and traversal. Since the lowest three +bits are naturally zero, we use them as a 3-bit length information tag. + +A length tag value of 1 to 7 means there are exactly that many Value objects +starting at the address, while 0 means there is an array of at least eight, +terminated with a special 64-bit sentinel bit-pattern that is not otherwise +valid as a Zisp value. + +### Interned string pointers + +Interned string (istr) objects may be unaligned, so the low bits of the pointer +are not used for any special purpose. Having a separate category for this type +of pointer also streamlines the interpreter implementation. + +### Short strings + +This 48-bit range is used for strings of zero to six bytes. These are NUL +terminated unless exactly six bytes, meaning that a literal NUL byte cannot +appear in them, but otherwise they allow arbitrary byte values. + +### Small rationals + +We use a 49-bit space for small rational numbers, with a signed 25-bit two's +complement integer numerator, and 24-bit unsigned integer denominator. + +### Runes and other small values + +Runes are symbols of up to 6 ASCII characters in length, used to implement +extensible reader syntax. (See Zisp decoder.) They cannot contain the NUL +byte, as they are NUL-terminated unless exactly six ASCII bytes in length. + +NOTE: The order in which the characters of the rune are encoded depends on +endianness. On little-endian systems (i.e. most modern architectures) the +characters will be in "reverse" order, with the first character in lowest +position, so the terminating NUL has to be searched from low to high.
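The little-endian rune layout from the NOTE above could look roughly like the following Python sketch. It assumes the little-endian case only, and the helper names and the `0x7FFF` prefix (sign 0, exponent all ones, quiet 1, category `111`) are derived for illustration rather than taken from the Zisp source.

```python
# Sketch only; hypothetical helpers for the little-endian rune layout.
RUNE_PREFIX = 0x7FFF   # sign=0, exponent all ones, quiet=1, category bits 111


def pack_rune(name: str) -> int:
    """Pack 1 to 6 ASCII characters, first character in the lowest byte."""
    raw = name.encode("ascii")          # non-ASCII input raises here
    if not 1 <= len(raw) <= 6 or 0 in raw:
        raise ValueError("runes are 1 to 6 ASCII bytes without NUL")
    # Pad with NUL bytes; shorter runes are NUL-terminated.
    payload = int.from_bytes(raw.ljust(6, b"\0"), "little")
    return (RUNE_PREFIX << 48) | payload


def unpack_rune(bits: int) -> str:
    """Search for the terminating NUL from the low byte to the high byte."""
    raw = (bits & ((1 << 48) - 1)).to_bytes(6, "little")
    end = raw.find(b"\0")
    return (raw if end < 0 else raw[:end]).decode("ascii")
```

Because every ASCII byte has its MSb unset, a packed rune never sets any of the bits in `0x808080808080`, which is what leaves room for the other immediate types in the same range.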
+ +The fact that runes are limited to ASCII bytes, whose MSb is unset, opens up +some space for other small values to co-inhabit the same 48-bit value range. + +We divide this space into increasingly many potential types, with smaller and +smaller payloads, where the highest byte with a non-zero MSb determines which +size category we're in: If the highest byte has its MSb set, then the other +seven bits are a type tag, and each type has a 40-bit payload; if the second +highest byte has its MSb set, then the 14 non-MSb bits of the two high bytes +define the type, and each has a 32-bit payload; and so on. + +Unicode code points need 21 bits, so we use a 24-bit type for the Character +type. Miscellaneous values like True, False, EOF, etc. are placed in an 8-bit +type, since there will never be that many of them. + +A virtually unlimited number of user-defined enum types can fit into the types +with small payload values here: There is room for over 268 Million 16-bit types +(28-bit type tag) and over 34 Billion 8-bit types (35-bit type tag). + + +## Internal use values + +The final 51-bit range is used for various internal purposes by the Zisp +interpreter, mostly related to transparent code optimization. + + 000 :: Pointer to heap object as constant + + 001 :: Pointer to list values as constant + + 010 :: Pointer to istr object as constant + + 011 :: Immediate short string as constant + + 100 :: Local variable reference by index + + 101 :: Pointer to constant function-call expression + + 110 :: Pointer to variable function-call expression + + 111 :: Pointer to special-form or macro-call expression + +Forbidden Value #4, Positive Infinity, is avoided thanks to the fact that heap +pointers always have a non-zero heap-type tag. (See further above.) + +The first four categories simply mirror those of the previous 51-bit range, but +mark the values as being constants rather than code to evaluate. + +The remaining four categories are somewhat similar to VM instructions. 
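To make the size categories from "Runes and other small values" concrete, here is a hedged Python sketch that classifies a 48-bit payload from the `111` region. The function name and return shape are invented for illustration; the rule it applies (the highest byte with its MSb set selects the category) is the one stated in that section.

```python
# Sketch only; classify_immediate is a hypothetical helper.
def classify_immediate(payload48: int):
    """Return (tag_bits, payload_bits, type_tag, payload), or None for a rune."""
    raw = payload48.to_bytes(6, "big")
    for i, byte in enumerate(raw):
        if byte & 0x80:
            # The type tag is made of the 7 non-MSb bits of bytes 0..i.
            tag = 0
            for b in raw[: i + 1]:
                tag = (tag << 7) | (b & 0x7F)
            # Everything below byte i is payload: 40, 32, 24, 16, 8 bits.
            # (A marker in the lowest byte would leave 0 payload bits; the
            # text does not say whether that category is used.)
            payload_bits = 8 * (5 - i)
            payload = payload48 & ((1 << payload_bits) - 1)
            return 7 * (i + 1), payload_bits, tag, payload
    return None  # no byte has its MSb set, so this is a rune
```

For example, `classify_immediate(0x00008110FFFF)` returns `(21, 24, 1, 0x10FFFF)`: a 24-bit type with tag 1 whose payload is large enough for any Unicode code point. The counts quoted above follow directly from the tag widths, since `2**28` is 268435456 and `2**35` is 34359738368.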
+ + + +<!-- +;; Local Variables: +;; fill-column: 80 +;; End: +--> diff --git a/doc/c1/1-parse.md b/doc/0/1-parse.md index d4c4c2e..101a3b6 100644 --- a/doc/c1/1-parse.md +++ b/doc/0/1-parse.md @@ -5,31 +5,16 @@ Zisp s-expressions represent an extremely minimal set of data types; only that which is necessary to strategically construct more complex values: - +-------+---------+--------+----------+------+ - | TYPE | String | Rune | Pair | Nil | - +-------+---------+--------+----------+------+ - | E.G. | foobar | #name | (X & Y) | () | - +-------+---------+--------+----------+------+ + +---------+--------+----------+------+ + | String | Rune | List | Nil | + +---------+--------+----------+------+ + | foobar | #name | (X ...) | () | + +---------+--------+----------+------+ The parser recognizes various *syntax sugar* which abbreviates verbose syntax, -and may result in special data structures (typically, a pair with a rune in its -first, and payload in its second position) which another Zisp component called -the *decoder* can transform into a rich set of value types. - -The most ubiquitous syntax sugar is the list, which abbreviates a sequence of -tail-linked pairs, terminated with a special nil value represented as `()`: - - (x) -> (x & ()) - - (x y) -> (x & (y & ())) - - (x y z) -> (x & (y & (z & ()))) - -The following are so-called *improper lists*, ending in a non-nil value: - - (x y & z) -> (x & (y & z)) - - (x y z & t) -> (x & (y & (z & t))) +and may result in special data structures (typically, a list with a rune in its +first position) which another Zisp component called the *decoder* can transform +into a rich set of value types. More details about syntax sugar, and the decoder, are explained later. @@ -121,8 +106,8 @@ to have an external representation. After parsing, it needs to be *decoded* to actually become the expected value. One may speak of an *external representation of a value* where the value is not -itself a datum, but can be encoded as one. 
The more strictly correct term for -this is: "The external representation of a datum that encodes the value." +itself a datum, but has an encoding as one. The more strictly correct term for +this is: "The external representation of the datum encoding the value." ### Syntax sugar @@ -137,7 +122,7 @@ their memory address. An example of syntax sugar that is not a mere abbreviation is a quoted string which contains bytes that could not appear in a *bare* string: - "foo bar" -> (#DQUOTE & <STRING>) + "foo bar" -> (#DQUOTE <STRING>) In this example, the visual token `<STRING>` represents the actual string value in program memory, which has no direct external representation in bytes because @@ -175,23 +160,23 @@ is only a *datum* if it adheres to additional constraints as explained below. ### String Strings can appear *bare* or be quoted in various ways. A quoted string is in -fact parsed into a pair value with a rune in the first position to identify the +fact parsed into a list value with a rune in the first position to identify the quotation variant that was parsed, and the string value in the second position; or, in case of at-quoted strings, a special construct we will look at later. 
- +-----------+-----------------------------+ - | Syntax | Parse output | - +-----------+-----------------------------+ - | |bytes| | (#PQSTR & <STRING>) | - +-----------+-----------------------------+ - | "bytes" | (#DQSTR & <STRING>) | - +-----------+-----------------------------+ - | @_bytes_ | (#ATSTR <BYTE> & <STRING>) | - +-----------+-----------------------------+ + +-----------+-------------------------------+ + | Syntax | Parse output | + +-----------+-------------------------------+ + | |bytes| | (#PQSTR <STRING>) | + +-----------+-------------------------------+ + | "bytes" | (#DQSTR <STRING>) | + +-----------+-------------------------------+ + | @_bytes_ | (#ATSTR <SENTINEL> <STRING>) | + +-----------+-------------------------------+ The visual token `<STRING>` denotes the actual string, as a Zisp value, in the -second position of the pair. The visual token `<BYTE>` stands for an integer -Zisp value between 0 and 255. +second position of the list. The visual token `<SENTINEL>` stands for a Zisp +integer value between 0 and 254. These external representations of strings will be explained in more detail further below, including backslash escape sequences allowed within, and how @@ -206,8 +191,10 @@ occurrence of the same string -- equal in length and containing the same byte values -- ends up being represented by the same bit-pattern; either a memory address, or an immediate representation within a CPU word for short strings. The quotation method is inconsequential to this process; for example, while -`|foobar|` and `"foobar"` will parse into different pair values, the actual -string they hold will be the same one in program memory. +`|foo bar|` and `"foo bar"` will parse into different list values, the actual +string they hold a reference to will be the same one in program memory. This +behavior is however configurable and can be disabled entirely for cases where +large numbers of arbitrary binary strings are being parsed. 
Strings of length greater than 255 bytes are stored separately in memory, even if they are equal in length and content. @@ -232,30 +219,32 @@ default decoder settings and documented explicitly as such. Runes are always stored directly in a CPU word; never by memory address. -### Pair +### List -A pair is a tuple of two values: the first value and the second value. In Lisp -tradition, these are also called the `car` and `cdr` of the pair, respectively. +A list is a contiguous array of one or more values in memory, whose length may +be encoded directly within the pointer to the head of the array, or else the +array is terminated with a special sentinel bit-pattern that is not otherwise +valid as a Zisp value. -The parser allocates a unique two-word cell in program memory for every pair, -and represents that pair through the memory address of the cell. +The parser allocates a unique array in program memory for every list, and the +list as a value is then represented by the memory address of that array, with +either an exact length tag or a tag indicating that it's sentinel-terminated. -Pairs are valid data if one of the following holds true: +Lists are valid data if one of the following holds true: -* The pair encodes a quoted string, datum label, or shebang line. +* The list encodes a quoted string, datum label, or shebang line. -* Both the first and second value in the pair is a valid datum. +* All values in the list are a valid datum. -Further, a structure of nested pair values may not contain cyclic references +Further, a structure of nested list values may not contain cyclic references back up in the structure (which would make the above definition diverge into -infinity). Such cycles must be broken up with datum labels, or else the pair +infinity). Such cycles must be broken up with datum labels, or else the list cannot be considered a datum, since it cannot be printed or parsed. ### Nil -The Zisp nil value is a singleton and a datum. 
There is exactly one nil value -and it is used to terminate a chain of pairs representing a list of values; it -has the external representation `()`. +The Zisp nil value is a singleton and a datum. There is exactly one nil value, +used in lieu of a list of zero length; it has the external representation `()`. ## Quoted strings @@ -266,26 +255,25 @@ This section goes into the details of each variant. ### Pipe-quoted Strings can be quoted with pipes, like symbols in R7RS Scheme, which triggers -the parser to generate a pair with the structure: +the parser to generate a list with the structure: - (#PQSTR & <STRING>) ;; <STRING> is visual aid, not syntax + (#PQSTR <STRING>) ;; <STRING> is visual aid, not syntax The decoder, using default settings, would emit this string verbatim as a value. Then, during code evaluation, this would be seen as an identifier. In this way, pipe-quoted strings are equivalent to bare strings in functionality. It is important to understand that the decoder sits between the parser and the -[evaluator](3-execute.html), and in opposition to Lisp and Scheme tradition, it -is common for the evaluator to receive values that are not valid as a datum; in -this case, a string unto itself that may not be a valid datum, due to not being -possible to be represented as a bare string. Yet, it is valid as an identifier -for the purposes of the evaluator, since it is a string *value* like any other. +[evaluator](3-eval.html), and in opposition to Lisp and Scheme tradition, it is +common for the evaluator to receive values that are not valid as a datum; here, +a string unto itself that may not be a valid datum. Yet, it is valid as an +identifier for the purposes of the evaluator. 
### Double-quoted Strings wrapped in the double-quote symbol parse into: - (#DQSTR & <STRING>) ;; <STRING> is visual aid, not syntax + (#DQSTR <STRING>) ;; <STRING> is visual aid, not syntax Under default settings, the decoder would transform this into a value which, when evaluated as code, simply yields the contained string as a value. @@ -297,14 +285,15 @@ escapes nor any other kind of escape sequence are recognized within them. The syntax begins with an at sign, followed by any byte. That byte becomes a termination marker, and the string cannot contain an occurrence of it, since -there are no escape sequences. +there are no escape sequences. The byte value 255 has a special meaning; see +further below. - @"foo \ bar" -> (#ATSTR <BYTE> & <STRING>) + @"foo \ bar" -> (#ATSTR <SENTINEL> <STRING>) -In the above, the visual tokens `<BYTE>` and `<STRING>` represent an integer -value and a string value, respectively. In this example, the integer value -would be 34; the ASCII value for the double-quote sign. The string value -contains a literal backslash, since there is no backslash escape parsing. +The visual tokens `<SENTINEL>` and `<STRING>` represent an integer and string +value, respectively. Here, the integer would be 34, which is the ASCII value +for a double-quote sign. The string contains a literal backslash, since there +is no backslash escape parsing. This style of quoting can be useful, for instance, when representing regular expressions as strings in code: @@ -325,6 +314,19 @@ valid ASCII byte; it can be absolutely any byte value, even NUL. This can be useful to easily encode binary data which is known to not contain a specific byte; an example would be C strings which cannot contain NUL. +If however the byte value is 255, then it does not stand for a sentinel, but +rather indicates that 6 more bytes follow, interpreted as a big-endian 48-bit +integer, which is the count of bytes making up the contents of the string. 
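The two at-quoted forms (sentinel-terminated and length-prefixed) can be sketched in Python as follows. This is an illustrative reading of the description above, not the actual Zisp parser: the function name is hypothetical, and the tuples with a `"#ATSTR"` string merely stand in for the parser's list output.

```python
# Sketch only; parse_at_string is a hypothetical stand-in for the parser.
def parse_at_string(data: bytes):
    """Parse one at-quoted string at the start of data.

    Returns (datum, consumed), where datum mimics the parse output.
    """
    if len(data) < 2 or data[0] != ord("@"):
        raise ValueError("not an at-quoted string")
    marker = data[1]
    if marker == 255:
        # Length-prefixed form: 6 bytes, big-endian 48-bit byte count.
        if len(data) < 8:
            raise ValueError("truncated length prefix")
        length = int.from_bytes(data[2:8], "big")
        end = 8 + length
        if end > len(data):
            raise ValueError("truncated string contents")
        return ("#ATSTR", data[8:end]), end
    # Sentinel form: contents run up to the next occurrence of the marker.
    end = data.find(bytes([marker]), 2)
    if end < 0:
        raise ValueError("unterminated at-quoted string")
    return ("#ATSTR", marker, data[2:end]), end + 1
```

Under these assumptions, `parse_at_string(b'@"foo \\ bar"')` yields the sentinel 34 and a string containing a literal backslash, while the prefix bytes `255 0 0 0 0 2 100` announce 612 bytes of content, since 2 * 256 + 100 = 612.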
+ +Example sequence of bytes, represented as a mixture of ASCII and raw integers: + + '@' 255 0 0 0 0 2 100 <612 bytes> -> (#ATSTR <STRING>) + +One may ask why the length is not included in the list. This is unnecessary, +since strings in Zisp already carry length information in their own metadata +structure. + + ### Backslash escapes In pipe-quoted and double-quoted strings, the following ASCII characters may @@ -360,7 +362,7 @@ follow a backslash to insert a certain character. In words: -* A backslash followed by a backslash, pipe, or double-quote character is +* A backslash, followed by a backslash, pipe, or double-quote character, is substituted with a literal occurrence of that character. * The characters 0, a, b, t, n, v, f, r, and e have the same meanings as in the @@ -481,15 +483,15 @@ need to use it. The following table summarizes commonly useful syntax abbreviations: - [...] -> (#SQUARE ...) #datum -> (#HASH & datum) + [...] -> (#SQUARE ...) #datum -> (#HASH datum) - {...} -> (#BRACE ...) #rune(...) -> (#rune ...) + {...} -> (#BRACE ...) #rune(...) -> (#rune ...) - 'datum -> (#QUOTE & datum) dat1dat2 -> (#JOIN dat1 & dat2) + 'datum -> (#QUOTE datum) dat1dat2 -> (#JOIN dat1 dat2) - `datum -> (#GRAVE & datum) dat1.dat2 -> (#DOT dat1 & dat2) + `datum -> (#GRAVE datum) dat1.dat2 -> (#DOT dat1 dat2) - ,datum -> (#COMMA & datum) dat1:dat2 -> (#COLON dat1 & dat2) + ,datum -> (#COMMA datum) dat1:dat2 -> (#COLON dat1 dat2) Notes: @@ -501,53 +503,38 @@ Notes: with a rune literal. A bare string can nevertheless follow the hash sign by separating the two with a backslash: - #\string -> (#HASH & string) + #\string -> (#HASH string) * Though not represented in the table due to notational difficulty, the form `#rune(...)` doesn't require a list in the second position; any datum that works with the `#datum` syntax also works with `#rune<DATUM>`. 
- #rune1#rune2 -> (#rune1 & #rune2) + #rune1#rune2 -> (#rune1 #rune2) - #rune\string -> (rune & string) + #rune\string -> (#rune string) - #rune'string -> (#rune #QUOTE & string) + #rune'string -> (#rune (#QUOTE string)) - #rune"string" -> (#rune #DQSTR & |string|) + #rune"string" -> (#rune (#DQSTR |string|)) As a counter-example, following a rune immediately with a bare string isn't possible without the delimiting backslash, since that would be ambiguous: - #abcdefgh ;Could be (#abcdef & gh) or (#abcde & fgh) or ... + #abcdefgh ;Could be (#abcdef gh) or (#abcde fgh) or ... * Syntax sugar can combine arbitrarily. Some examples follow. Any of these may - or may not actually have a meaning in code; many could simply end up producing + or may not actually have a meaning in code; some might simply end up producing an error during decoding, or later evaluation of code. - #{...} -> (#HASH #BRACE ...) - - #'foo -> (#HASH #QUOTE & foo) - - ##'[...] -> (#HASH #HASH #QUOTE #SQUARE ...) - - {x y}[i j] -> (#JOIN (#BRACE x y) #SQUARE i j) - - foo.bar.baz{x y} -> (#JOIN (#DOT (#DOT foo & bar) & baz) #BRACE x y) - -* While in Lisp and Scheme `'foo` parses as `(quote foo)`, in Zisp it parses as - `(#QUOTE & foo)`; a single pair with the quoted datum in the second position. - - The same principle is used when parsing other sugar; some examples follow: - - Incorrect Correct + #{...} -> (#HASH (#BRACE ...)) - #(x y z) -> (#HASH (x y z)) #(x y z) -> (#HASH x y z) + #'foo -> (#HASH (#QUOTE foo)) - [x y z] -> (#SQUARE (x y z)) [x y z] -> (#SQUARE x y z) + ##'[...] -> (#HASH (#HASH (#QUOTE (#SQUARE ...)))) - #{x} -> (#HASH (#BRACE (x))) #{x} -> (#HASH #BRACE x) + {x y}[i j] -> (#JOIN (#BRACE x y) (#SQUARE i j)) - foo(x y) -> (#JOIN foo (x y)) foo(x y) -> (#JOIN foo x y) + foo.bar.baz{x y} -> (#JOIN (#DOT (#DOT foo bar) baz) (#BRACE x y)) * Those used to thinking in Lisp and Scheme may think that `(#QUOTE ...)` halts further decoding of enclosed data. 
This is not so, since quoting is related @@ -562,13 +549,13 @@ labels in the data encoding of the value. A datum label either wraps another datum to assign a number to it, or contains just a reference to a previous assignment. - +------------------+------------------------------+ - | Syntax | Internal datum structure | - +------------------+------------------------------+ - | #%<HEX>=<DATUM> | (#LABEL <NUMBER> & <DATUM>) | - +------------------+------------------------------+ - | #%<HEX>% | (#LABEL & <NUMBER>) | - +------------------+------------------------------+ + +------------------+----------------------------+ + | Syntax | Internal datum structure | + +------------------+----------------------------+ + | #%<HEX>=<DATUM> | (#LABEL <NUMBER> <DATUM>) | + +------------------+----------------------------+ + | #%<HEX>% | (#LABEL <NUMBER>) | + +------------------+----------------------------+ In this visual, the token `<HEX>` stands for a hexadecimal digit sequence, the token `<DATUM>` stands for any other datum, and `<NUMBER>` is a stand-in for a @@ -576,13 +563,13 @@ number value; that which is represented by `<HEX>`. For clarity, concrete examples follow: - +-------------------+-------------------------------+ - | Byte sequence | Parse result | - +-------------------+-------------------------------+ - | #%1234abcd=(foo) | (#LABEL <0x1234abcd> & (foo)) | - +-------------------+-------------------------------+ - | #%1234abcd% | (#LABEL & <0x1234abcd>) | - +-------------------+-------------------------------+ + +-------------------+------------------------------+ + | Byte sequence | Parse result | + +-------------------+------------------------------+ + | #%1234abcd=(foo) | (#LABEL <0x1234abcd> (foo)) | + +-------------------+------------------------------+ + | #%1234abcd% | (#LABEL <0x1234abcd>) | + +-------------------+------------------------------+ Here, the visual token `<0x1234abcd>` stands for a Zisp value of a numeric type with an integer value. 
Note that the decoder may not accept a bare string here, @@ -593,9 +580,9 @@ meaning this syntax sugar is not merely an abbreviation. Finally, the parser recognizes the Unix *shebang* syntax and outputs a datum to hold the string values found within: - #!interpreter -> (#SHBANG & interpreter) + #!interpreter -> (#SHBANG interpreter) - #!interpreter argline -> (#SHBANG interpreter & argline) + #!interpreter argline -> (#SHBANG interpreter argline) When executing a script file, Zisp simply stores this into a global value that may be inspected if desired. diff --git a/doc/c1/2-decode.md b/doc/0/2-decode.md index 379c74b..1a45824 100644 --- a/doc/c1/2-decode.md +++ b/doc/0/2-decode.md @@ -39,6 +39,6 @@ complex data records with non-standard data types, and so on. <!-- ;; Local Variables: -;; fill-column: 77 +;; fill-column: 80 ;; End: --> diff --git a/doc/c1/grammar/abnf.txt b/doc/0/grammar/abnf.txt index aa67646..5ab3c89 100644 --- a/doc/c1/grammar/abnf.txt +++ b/doc/0/grammar/abnf.txt @@ -90,7 +90,7 @@ JoinExpr = Datum RJoinDatum / NoEndDot "." Datum -BareChar = "!" / "$" / "%" / "*" / "/" / "<" / "=" / ">" +BareChar = "!" / "$" / "%" / "&" / "*" / "/" / "<" / "=" / ">" / "?" 
/ "^" / "_" / "~" / ALPHA Numeric = "+" / "-" / DIGIT @@ -108,9 +108,7 @@ StringEsc = "\" / "|" / DQUOTE / *( HTAB / SP ) LF *( HTAB / SP ) / %s"u" ["0"] 1*5HEXDIG ";" / %s"u" "1" "0" 4HEXDIG ";" -List = [ Unit *( Blank Unit ) ] *Blank [Tail] [SkipUnit] - -Tail = "&" Unit *Blank +List = [ Unit *( Blank Unit ) ] *Blank [SkipUnit] RuneName = ALPHA *5( ALPHA / DIGIT ) diff --git a/doc/c1/grammar/index.md b/doc/0/grammar/index.md index e3716ea..e3716ea 100644 --- a/doc/c1/grammar/index.md +++ b/doc/0/grammar/index.md diff --git a/doc/c1/grammar/peg.txt b/doc/0/grammar/peg.txt index 7b28a99..1541da6 100644 --- a/doc/c1/grammar/peg.txt +++ b/doc/0/grammar/peg.txt @@ -29,7 +29,7 @@ BareString <- SpecBareChar ( BareChar / JoinChar )* SpecBareChar <- '+' / '-' / JoinChar / DIGIT BareChar <- ALPHA / DIGIT - / '!' / '$' / '%' / '*' / '+' / '-' / '/' + / '!' / '$' / '%' / '&' / '*' / '+' / '-' / '/' / '<' / '=' / '>' / '?' / '^' / '_' / '~' @@ -65,11 +65,9 @@ Label <- HEXDIG+ Rune <- ALPHA ( ALPHA / DIGIT )* -ParenList <- '(' ListBody ')' -SquareList <- '[' ListBody ']' -BraceList <- '{' ListBody '}' - -ListBody <- Unit* ( Blank* '&' Unit )? Blank* +ParenList <- '(' Unit* ')' +SquareList <- '[' Unit* ']' +BraceList <- '{' Unit* '}' DIGIT <- [0-9] diff --git a/doc/c1/grammar/zbnf.txt b/doc/0/grammar/zbnf.txt index 923ac83..c04b813 100644 --- a/doc/c1/grammar/zbnf.txt +++ b/doc/0/grammar/zbnf.txt @@ -29,7 +29,7 @@ BareString : SpecBareChar ( BareChar | JoinChar )* SpecBareChar : '+' | '-' | JoinChar | DIGIT BareChar : ALPHA | DIGIT - | '!' | '$' | '%' | '*' | '+' | '-' | '/' + | '!' | '$' | '%' | '&' | '*' | '+' | '-' | '/' | '<' | '=' | '>' | '?' 
| '^' | '_' | '~' @@ -65,11 +65,9 @@ Label : HEXDIG+ Rune : ALPHA ( ALPHA | DIGIT )* -ParenList : '(' ListBody ')' -SquareList : '[' ListBody ']' -BraceList : '{' ListBody '}' - -ListBody : Unit* [ Blank* '&' Unit ] Blank* +ParenList : '(' Unit* ')' +SquareList : '[' Unit* ']' +BraceList : '{' Unit* '}' ;; Local Variables: diff --git a/doc/c1/index.md b/doc/0/index.md index 6cec369..f0da216 100644 --- a/doc/c1/index.md +++ b/doc/0/index.md @@ -1,7 +1,13 @@ -# Chapter 1: Genesis +# Chapter 0: Genesis -This chapter goes through the processes involved in reading source -code, running it, and optionally compiling it. +This chapter explains the core value representation of Zisp, and goes +through the processes of parsing, decoding, running, and optionally +compiling code. + +0. [Value](0-value.html) + + Zisp uses a uniform 64-bit representation for all values, densely + packed into signaling or non-canonical NaN values. 1. [Parse](1-parse.html) (see also [grammar](grammar/)) @@ -12,15 +18,13 @@ code, running it, and optionally compiling it. The decoder runs configurable and extensible pre-processing steps over data received from the parser, enriching it with more complex - data types, and handling primitive source code transforms. It's - comparable to the C pre-processor or Lisp's `DEFMACRO` mechanism, - with a few more responsibilities, such as number literal parsing. + value types, and handling primitive source code transforms. -3. [Execute](3-execute.html) +3. [Eval](3-eval.html) - Code is executed (or interpreted, or evaluated) in an environment, - also called a module, which may be mutated, and linked with other - modules. Execution is immediate, without any pre-compilation. + Code is evaluated within a mutable module context, on which it can + have side effects such as creating new definitions or establishing + links to other modules. 4. 
[Compile](4-compile.html) diff --git a/doc/index.md b/doc/index.md index beaa78c..51b92fa 100644 --- a/doc/index.md +++ b/doc/index.md @@ -2,31 +2,35 @@ This document explains the Zisp language and its implementation. -Zisp intentionally blurs the line between developers and users of the -language. After all, Zisp is software, and its users are software -developers; the easiest way to explain *why* Zisp does certain things -is often to explain *how* it does them. +Zisp intentionally blurs the line between developers and users of the language. +After all, Zisp is software, and its users are software developers; the easiest +way to explain *why* Zisp does certain things is often to explain *how* it does +them. -That doesn't mean this manual will walk you through the source code -line by line. Instead, consider it a documentation of the code base -at large, doubling as a reference to the language implemented by the -code base. +That doesn't mean this manual will walk you through the source code line by +line. Instead, consider this a documentation of the implementation at large, +doubling as a language reference. ## Table of Contents -1. [Chapter 1: Genesis](c1/) +0. [Chapter 0: Genesis](./0/) - 1. [Parse](c1/1-parse.html) - 2. [Decode](c1/2-decode.html) - 3. [Execute](c1/3-execute.html) - 4. [Compile](c1/4-compile.html) + 0. [Value](./0/0-value.html) + 1. [Parse](./0/1-parse.html) + 2. [Decode](./0/2-decode.html) + 3. [Execute](./0/3-execute.html) + 4. [Compile](./0/4-compile.html) -2. [Chapter 2: Types](c2/) - - This chapter deals with the standard data types, and the methods - Zisp offers for defining new types. +1. [Chapter 1: Taxonomy](./1/) + 0. ... 1. ... - 2. ... -3. [Chapter 3: ...](c3/) +2. [Chapter 2: ...](./2/) + + +<!-- +;; Local Variables: +;; fill-column: 80 +;; End: +--> |
