diff options
Diffstat (limited to 'src')
| -rw-r--r-- | src/zisp/io/Parser.zig | 85 | ||||
| -rw-r--r-- | src/zisp/value.zig | 292 |
2 files changed, 179 insertions, 198 deletions
diff --git a/src/zisp/io/Parser.zig b/src/zisp/io/Parser.zig index ed55a36..dbcd6ad 100644 --- a/src/zisp/io/Parser.zig +++ b/src/zisp/io/Parser.zig @@ -1,55 +1,36 @@ -// === Syntax === -// -// See docs/parser.md and spec/syntax.bnf to understand the syntax. -// -// -// === Trampolining strategy === -// -// Instead of using recursion directly, the parser is written in a "trampoline" -// style, which ensures that parse depth is not limited by stack size. -// -// All state and context is passed around via a struct pointer. The parser has -// a main loop, which calls a function as dictated by parser.context.next, and -// the function may update the state to have another function called next. -// -// If a function wants to call a recursive subroutine, it pushes some of the -// current context onto a stack, including what function the subroutine should -// return to, and then updates the state to instruct the main loop to call one -// of the entry point subroutines. -// -// If a function wants to make the parser return, either from a subroutine or -// from the main loop, it sets the .result field, and tries to pop the saved -// context. If the context stack was empty, the main loop returns. -// -// -// === Buffering === -// -// For efficiency, call the parser on an input stream with implicit buffering. -// -// The parser does not use its own buffer, beyond one character that may be -// written back into the unread_char field, which is checked at the end to -// ensure it's nothing other than a trailing blank or comment. -// -// This lack of buffering is to ensure that the parser never reads more bytes -// from the input than what it needs to parse a datum. Consider the input: -// -// (a b c) (x y z) -// -// If we used proper buffering, like reading up to 4K bytes per read, then the -// whole stream would be consumed at once before it's parsed. Then, the parser -// would return the first datum, and the rest of the stream would be lost. The -// parser would need some way to reset the input stream's read head to the end -// of the first datum, but not all stream types may support this. -// -// Although in a scenario like this it would be wise to create a single Parser -// instance to call multiple times (in which case it could continue using its -// internal buffer from where it left), this may not always be practical. -// -// One could even have an input stream with Zisp s-expressions and other data -// intertwined, in which case this lack of buffering is also crucial, since one -// needs to alternate between calling the Zisp parser and some other parser on -// the same input stream. -// +//! +//! === Syntax === +//! +//! See docs/c1/1-parse.md to understand the implemented syntax. +//! +//! +//! === Trampolining strategy === +//! +//! Instead of using recursion directly, the parser is written in a "trampoline" +//! style, which ensures that parse depth is not limited by stack size. +//! +//! All state and context is passed around via a struct pointer. The parser has +//! a main loop, which calls a function as dictated by parser.context.next, and +//! the function may update the state to have another function called next. +//! +//! If a function wants to call a recursive subroutine, it pushes some of the +//! current context onto a stack, including what function the subroutine should +//! return to, and then updates the state to instruct the main loop to call one +//! of the entry point subroutines. +//! +//! If a function wants to make the parser return, either from a subroutine or +//! from the main loop, it sets the .result field, and tries to pop the saved +//! context. If the context stack was empty, the main loop returns. +//! +//! +//! === Buffering === +//! +//! For efficiency, call the parser on an input stream with implicit buffering. +//! +//! The parser does not use its own buffer, beyond one character that may be +//! written back into the unread_char field, which is checked at the end to +//! ensure it's nothing other than a trailing blank or comment. +//! const builtin = @import("builtin"); const std = @import("std"); diff --git a/src/zisp/value.zig b/src/zisp/value.zig index bcabdf8..e8813eb 100644 --- a/src/zisp/value.zig +++ b/src/zisp/value.zig @@ -1,149 +1,149 @@ -// -// === NaN Packing Strategy === -// -// Format of a double, in most to least significant field order: -// -// { sign: u1, exponent: u11, fraction: u52 } -// -// When the exponent bits are all set, it's either a NaN or an Infinity. -// -// For value packing, almost all remaining 53 bits are available, giving us -// about 2^53 values, except for the four following bit patterns: -// -// *** FORBIDDEN VALUES *** -// -// 1. Negative cqNaN = { sign = 1, exponent = max, fraction = 2^51 } -// -// 2. Negative Infinity = { sign = 1, exponent = max, fraction = 0 } -// -// 3. Positive cqNaN = { sign = 0, exponent = max, fraction = 2^51 } -// -// 4. Positive Infinity = { sign = 0, exponent = max, fraction = 0 } -// -// The abbreviation "cqNaN" stands for canonical quiet NaN. -// -// Note that 2^51 means the MSb of the 52 fraction bits being set, and the rest -// being zero. The fraction MSb is also called the is_quiet flag, because it -// demarcates quiet NaNs. The rest being zero makes it the canonical qNaN. -// -// The positive and negative cqNaN are the *only* NaN values that can actually -// be returned by FP operations, which is convenient because it means we can -// simply use them to represent themselves in Zisp. -// -// Infinity values may also be returned by FP operations, and we want them to -// exist in Zisp as doubles as well, so they also represent themselves. -// -// Beyond those four bit patterns, all values with a maximum exponent (all bits -// set) are fair game for representing other values, so 2^53 - 4 possibilities. -// -// We split those 2^53 - 4 available values into four groups, each allowing for -// 2^51 - 1 different values to be encoded. (51-bit values excluding zero.) -// -// sign = 1, quiet = 1 :: Negative Fixnum from -1 to -2^51+1 -// -// sign = 1, quiet = 0 :: Positive Fixnum from 0 to 2^51-2 -// -// sign = 0, quiet = 1 :: Pointers -// -// sign = 0, quiet = 0 :: Others -// -// -// === Fixnums === -// -// Negative fixnums actually represent themselves without needing to go through -// any transformation. Only the smallest 52-bit signed negative, -2^51, cannot -// be represented, as it would step on Forbidden Value #1, Negative cqNaN. -// -// Positive fixnums go through bitsiwe NOT (implemented via an XOR mask here to -// make it one operation together with the NaN masking) to avoid the all-zero -// payload value, which would step on Forbidden Value #2, Negative Infinity. -// -// -// === Pointers === -// -// Pointers are further subdivided as follows based on the remaining 51 bits, -// with the first three bits used as a sort of tag: -// -// 000 :: Regular pointer to Zisp heap object (string, vector, etc.) -// -// 001 :: Weak pointer to Zisp heap object -// -// 01. :: Undefined -// -// 1.. :: Undefined -// -// This means pointers to the Zisp heap are 48 bits. Of those, we only really -// need 45, since 64-bit platforms are in practice limited to 48-bit addresses, -// and Zisp heap allocations happen at 8-byte boundaries, meaning the lowest 3 -// bits are always unset. Thus, we are able to store yet another 3-bit tag in -// those 48-bit pointers alongside the actual, multiple-of-8, 48-bit address. -// -// Forbidden Value #3, Positive cqNaN, is avoided thanks to the fact that a -// regular Zisp heap pointer can never be null. Weak pointers, which can be -// null, avoid stepping on that forbidden value thanks to bit 49 being set. -// -// -// === Other values === -// -// This 51-bit range is divided as follows: -// -// 000 :: Subdivided as follows: -// -// 0....... 0....... 0....... (etc.) :: Rune -// -// 1....... :: 128 40-bit types -// -// 0....... 1....... :: 16384 32-bit types -// -// 0....... 0....... 1....... :: 2097152 24-bit types -// -// (etc.) -// -// 001 :: Short string -// -// 01. :: Small rational -// -// 1.. :: Undefined -// -// ==== Runes and Small Values ==== -// -// Runes are symbols of 1 to 6 ASCII characters used to implement reader syntax. -// They are NUL-terminated if shorter than six characters, meaning they cannot -// contain the NUL byte in their value. -// -// NOTE: The order in which the characters of the rune are encoded depends on -// endianness. On little-endian systems (i.e. most modern architectures) the -// characters will be in "reverse" order, with the first character in lowest -// position, so the terminating NUL has to be searched from low to high. -// -// Forbidden Value #4, Positive Infinity, would denote a rune of length zero -// (all NUL bytes) which isn't allowed, so we avoid stepping on it. -// -// The fact that runes are limited to ASCII opens up a lot of space for other -// small values to co-inhabit the same 48-bit range. We subdivide this space -// into increasingly many potential types, with smaller and smaller payloads, -// where the highest byte with a non-zero MSb determines which size category -// we're in: If the highest byte has its MSb set, then its seven non-MSb bits -// define a type and each type has a 40-bit value range; if the second highest -// byte has its MSb set, then the 14 non-MSb bits of the two high bytes define -// the type and each has a 32-bit value range; and so on. -// -// Unicode code points need 21 bits, so we use a 24-bit type for Characters. -// Miscellaneous values like true, false, nil, eof, etc. are placed into an -// 8-bit type, since there will never be that many of them. -// -// ==== Strings ==== -// -// Another 48-bit space is used for strings of zero to six bytes. Like runes, -// these are NUL-terminated if shorter than six bytes, meaning that NUL cannot -// appear in them, but otherwise they allow arbitrary bytes. -// -// ==== Small rationals ==== -// -// We use a 49-bit space for small exact rational numbers, with the numerator -// being a two's complement 25-bit signed integer, and denominator a 24-bit -// unsigned integer. -// +//! +//! === NaN Packing Strategy === +//! +//! Format of a double, in most to least significant field order: +//! +//! { sign: u1, exponent: u11, fraction: u52 } +//! +//! When the exponent bits are all set, it's either a NaN or an Infinity. +//! +//! For value packing, almost all remaining 53 bits are available, giving us +//! about 2^53 values, except for the four following bit patterns: +//! +//! *** FORBIDDEN VALUES *** +//! +//! 1. Negative cqNaN = { sign = 1, exponent = max, fraction = 2^51 } +//! +//! 2. Negative Infinity = { sign = 1, exponent = max, fraction = 0 } +//! +//! 3. Positive cqNaN = { sign = 0, exponent = max, fraction = 2^51 } +//! +//! 4. Positive Infinity = { sign = 0, exponent = max, fraction = 0 } +//! +//! The abbreviation "cqNaN" stands for canonical quiet NaN. +//! +//! Note that 2^51 means the MSb of the 52 fraction bits being set, and the rest +//! being zero. The fraction MSb is also called the is_quiet flag, because it +//! demarcates quiet NaNs. The rest being zero makes it the canonical qNaN. +//! +//! The positive and negative cqNaN are the *only* NaN values that can actually +//! be returned by FP operations, which is convenient because it means we can +//! simply use them to represent themselves in Zisp. +//! +//! Infinity values may also be returned by FP operations, and we want them to +//! exist in Zisp as doubles as well, so they also represent themselves. +//! +//! Beyond those four bit patterns, all values with a maximum exponent (all bits +//! set) are fair game for representing other values, so 2^53 - 4 possibilities. +//! +//! We split those 2^53 - 4 available values into four groups, each allowing for +//! 2^51 - 1 different values to be encoded. (51-bit values excluding zero.) +//! +//! sign = 1, quiet = 1 :: Negative Fixnum from -1 to -2^51+1 +//! +//! sign = 1, quiet = 0 :: Positive Fixnum from 0 to 2^51-2 +//! +//! sign = 0, quiet = 1 :: Pointers +//! +//! sign = 0, quiet = 0 :: Others +//! +//! +//! === Fixnums === +//! +//! Negative fixnums actually represent themselves without needing to go through +//! any transformation. Only the smallest 52-bit signed negative, -2^51, cannot +//! be represented, as it would step on Forbidden Value #1, Negative cqNaN. +//! +//! Positive fixnums go through bitsiwe NOT (implemented via an XOR mask here to +//! make it one operation together with the NaN masking) to avoid the all-zero +//! payload value, which would step on Forbidden Value #2, Negative Infinity. +//! +//! +//! === Pointers === +//! +//! Pointers are further subdivided as follows based on the remaining 51 bits, +//! with the first three bits used as a sort of tag: +//! +//! 000 :: Regular pointer to Zisp heap object (string, vector, etc.) +//! +//! 001 :: Weak pointer to Zisp heap object +//! +//! 01. :: Undefined +//! +//! 1.. :: Undefined +//! +//! This means pointers to the Zisp heap are 48 bits. Of those, we only really +//! need 45, since 64-bit platforms are in practice limited to 48-bit addresses, +//! and Zisp heap allocations happen at 8-byte boundaries, meaning the lowest 3 +//! bits are always unset. Thus, we are able to store yet another 3-bit tag in +//! those 48-bit pointers alongside the actual, multiple-of-8, 48-bit address. +//! +//! Forbidden Value #3, Positive cqNaN, is avoided thanks to the fact that a +//! regular Zisp heap pointer can never be null. Weak pointers, which can be +//! null, avoid stepping on that forbidden value thanks to bit 49 being set. +//! +//! +//! === Other values === +//! +//! This 51-bit range is divided as follows: +//! +//! 000 :: Subdivided as follows: +//! +//! 0....... 0....... 0....... (etc.) :: Rune +//! +//! 1....... :: 128 40-bit types +//! +//! 0....... 1....... :: 16384 32-bit types +//! +//! 0....... 0....... 1....... :: 2097152 24-bit types +//! +//! (etc.) +//! +//! 001 :: Short string +//! +//! 01. :: Small rational +//! +//! 1.. :: Undefined +//! +//! ==== Runes and Small Values ==== +//! +//! Runes are symbols up to 6 ASCII characters used to implement reader syntax. +//! They are NUL-terminated if shorter than six characters, meaning they cannot +//! contain the NUL byte in their value. +//! +//! NOTE: The order in which the characters of the rune are encoded depends on +//! endianness. On little-endian systems (i.e. most modern architectures) the +//! characters will be in "reverse" order, with the first character in lowest +//! position, so the terminating NUL has to be searched from low to high. +//! +//! Forbidden Value #4, Positive Infinity, would denote a rune of length zero +//! (all NUL bytes) which isn't allowed, so we avoid stepping on it. +//! +//! The fact that runes are limited to ASCII opens up a lot of space for other +//! small values to co-inhabit the same 48-bit range. We subdivide this space +//! into increasingly many potential types, with smaller and smaller payloads, +//! where the highest byte with a non-zero MSb determines which size category +//! we're in: If the highest byte has its MSb set, then its seven non-MSb bits +//! define a type and each type has a 40-bit value range; if the second highest +//! byte has its MSb set, then the 14 non-MSb bits of the two high bytes define +//! the type and each has a 32-bit value range; and so on. +//! +//! Unicode code points need 21 bits, so we use a 24-bit type for Characters. +//! Miscellaneous values like true, false, nil, eof, etc. are placed into an +//! 8-bit type, since there will never be that many of them. +//! +//! ==== Strings ==== +//! +//! Another 48-bit space is used for strings of zero to six bytes. Like runes, +//! these are NUL-terminated if shorter than six bytes, meaning that NUL cannot +//! appear in them, but otherwise they allow arbitrary bytes. +//! +//! ==== Small rationals ==== +//! +//! We use a 49-bit space for small exact rational numbers, with the numerator +//! being a two's complement 25-bit signed integer, and denominator a 24-bit +//! unsigned integer. +//! const builtin = @import("builtin"); const std = @import("std"); |
