summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--src/zisp/io/Parser.zig85
-rw-r--r--src/zisp/value.zig292
2 files changed, 179 insertions, 198 deletions
diff --git a/src/zisp/io/Parser.zig b/src/zisp/io/Parser.zig
index ed55a36..dbcd6ad 100644
--- a/src/zisp/io/Parser.zig
+++ b/src/zisp/io/Parser.zig
@@ -1,55 +1,36 @@
-// === Syntax ===
-//
-// See docs/parser.md and spec/syntax.bnf to understand the syntax.
-//
-//
-// === Trampolining strategy ===
-//
-// Instead of using recursion directly, the parser is written in a "trampoline"
-// style, which ensures that parse depth is not limited by stack size.
-//
-// All state and context is passed around via a struct pointer. The parser has
-// a main loop, which calls a function as dictated by parser.context.next, and
-// the function may update the state to have another function called next.
-//
-// If a function wants to call a recursive subroutine, it pushes some of the
-// current context onto a stack, including what function the subroutine should
-// return to, and then updates the state to instruct the main loop to call one
-// of the entry point subroutines.
-//
-// If a function wants to make the parser return, either from a subroutine or
-// from the main loop, it sets the .result field, and tries to pop the saved
-// context. If the context stack was empty, the main loop returns.
-//
-//
-// === Buffering ===
-//
-// For efficiency, call the parser on an input stream with implicit buffering.
-//
-// The parser does not use its own buffer, beyond one character that may be
-// written back into the unread_char field, which is checked at the end to
-// ensure it's nothing other than a trailing blank or comment.
-//
-// This lack of buffering is to ensure that the parser never reads more bytes
-// from the input than what it needs to parse a datum. Consider the input:
-//
-// (a b c) (x y z)
-//
-// If we used proper buffering, like reading up to 4K bytes per read, then the
-// whole stream would be consumed at once before it's parsed. Then, the parser
-// would return the first datum, and the rest of the stream would be lost. The
-// parser would need some way to reset the input stream's read head to the end
-// of the first datum, but not all stream types may support this.
-//
-// Although in a scenario like this it would be wise to create a single Parser
-// instance to call multiple times (in which case it could continue using its
-// internal buffer from where it left), this may not always be practical.
-//
-// One could even have an input stream with Zisp s-expressions and other data
-// intertwined, in which case this lack of buffering is also crucial, since one
-// needs to alternate between calling the Zisp parser and some other parser on
-// the same input stream.
-//
+//!
+//! === Syntax ===
+//!
+//! See docs/c1/1-parse.md to understand the implemented syntax.
+//!
+//!
+//! === Trampolining strategy ===
+//!
+//! Instead of using recursion directly, the parser is written in a "trampoline"
+//! style, which ensures that parse depth is not limited by stack size.
+//!
+//! All state and context is passed around via a struct pointer. The parser has
+//! a main loop, which calls a function as dictated by parser.context.next, and
+//! the function may update the state to have another function called next.
+//!
+//! If a function wants to call a recursive subroutine, it pushes some of the
+//! current context onto a stack, including what function the subroutine should
+//! return to, and then updates the state to instruct the main loop to call one
+//! of the entry point subroutines.
+//!
+//! If a function wants to make the parser return, either from a subroutine or
+//! from the main loop, it sets the .result field, and tries to pop the saved
+//! context. If the context stack was empty, the main loop returns.
+//!
+//!
+//! === Buffering ===
+//!
+//! For efficiency, call the parser on an input stream with implicit buffering.
+//!
+//! The parser does not use its own buffer, beyond one character that may be
+//! written back into the unread_char field, which is checked at the end to
+//! ensure it's nothing other than a trailing blank or comment.
+//!
const builtin = @import("builtin");
const std = @import("std");
diff --git a/src/zisp/value.zig b/src/zisp/value.zig
index bcabdf8..e8813eb 100644
--- a/src/zisp/value.zig
+++ b/src/zisp/value.zig
@@ -1,149 +1,149 @@
-//
-// === NaN Packing Strategy ===
-//
-// Format of a double, in most to least significant field order:
-//
-// { sign: u1, exponent: u11, fraction: u52 }
-//
-// When the exponent bits are all set, it's either a NaN or an Infinity.
-//
-// For value packing, almost all remaining 53 bits are available, giving us
-// about 2^53 values, except for the four following bit patterns:
-//
-// *** FORBIDDEN VALUES ***
-//
-// 1. Negative cqNaN = { sign = 1, exponent = max, fraction = 2^51 }
-//
-// 2. Negative Infinity = { sign = 1, exponent = max, fraction = 0 }
-//
-// 3. Positive cqNaN = { sign = 0, exponent = max, fraction = 2^51 }
-//
-// 4. Positive Infinity = { sign = 0, exponent = max, fraction = 0 }
-//
-// The abbreviation "cqNaN" stands for canonical quiet NaN.
-//
-// Note that 2^51 means the MSb of the 52 fraction bits being set, and the rest
-// being zero. The fraction MSb is also called the is_quiet flag, because it
-// demarcates quiet NaNs. The rest being zero makes it the canonical qNaN.
-//
-// The positive and negative cqNaN are the *only* NaN values that can actually
-// be returned by FP operations, which is convenient because it means we can
-// simply use them to represent themselves in Zisp.
-//
-// Infinity values may also be returned by FP operations, and we want them to
-// exist in Zisp as doubles as well, so they also represent themselves.
-//
-// Beyond those four bit patterns, all values with a maximum exponent (all bits
-// set) are fair game for representing other values, so 2^53 - 4 possibilities.
-//
-// We split those 2^53 - 4 available values into four groups, each allowing for
-// 2^51 - 1 different values to be encoded. (51-bit values excluding zero.)
-//
-// sign = 1, quiet = 1 :: Negative Fixnum from -1 to -2^51+1
-//
-// sign = 1, quiet = 0 :: Positive Fixnum from 0 to 2^51-2
-//
-// sign = 0, quiet = 1 :: Pointers
-//
-// sign = 0, quiet = 0 :: Others
-//
-//
-// === Fixnums ===
-//
-// Negative fixnums actually represent themselves without needing to go through
-// any transformation. Only the smallest 52-bit signed negative, -2^51, cannot
-// be represented, as it would step on Forbidden Value #1, Negative cqNaN.
-//
-// Positive fixnums go through bitsiwe NOT (implemented via an XOR mask here to
-// make it one operation together with the NaN masking) to avoid the all-zero
-// payload value, which would step on Forbidden Value #2, Negative Infinity.
-//
-//
-// === Pointers ===
-//
-// Pointers are further subdivided as follows based on the remaining 51 bits,
-// with the first three bits used as a sort of tag:
-//
-// 000 :: Regular pointer to Zisp heap object (string, vector, etc.)
-//
-// 001 :: Weak pointer to Zisp heap object
-//
-// 01. :: Undefined
-//
-// 1.. :: Undefined
-//
-// This means pointers to the Zisp heap are 48 bits. Of those, we only really
-// need 45, since 64-bit platforms are in practice limited to 48-bit addresses,
-// and Zisp heap allocations happen at 8-byte boundaries, meaning the lowest 3
-// bits are always unset. Thus, we are able to store yet another 3-bit tag in
-// those 48-bit pointers alongside the actual, multiple-of-8, 48-bit address.
-//
-// Forbidden Value #3, Positive cqNaN, is avoided thanks to the fact that a
-// regular Zisp heap pointer can never be null. Weak pointers, which can be
-// null, avoid stepping on that forbidden value thanks to bit 49 being set.
-//
-//
-// === Other values ===
-//
-// This 51-bit range is divided as follows:
-//
-// 000 :: Subdivided as follows:
-//
-// 0....... 0....... 0....... (etc.) :: Rune
-//
-// 1....... :: 128 40-bit types
-//
-// 0....... 1....... :: 16384 32-bit types
-//
-// 0....... 0....... 1....... :: 2097152 24-bit types
-//
-// (etc.)
-//
-// 001 :: Short string
-//
-// 01. :: Small rational
-//
-// 1.. :: Undefined
-//
-// ==== Runes and Small Values ====
-//
-// Runes are symbols of 1 to 6 ASCII characters used to implement reader syntax.
-// They are NUL-terminated if shorter than six characters, meaning they cannot
-// contain the NUL byte in their value.
-//
-// NOTE: The order in which the characters of the rune are encoded depends on
-// endianness. On little-endian systems (i.e. most modern architectures) the
-// characters will be in "reverse" order, with the first character in lowest
-// position, so the terminating NUL has to be searched from low to high.
-//
-// Forbidden Value #4, Positive Infinity, would denote a rune of length zero
-// (all NUL bytes) which isn't allowed, so we avoid stepping on it.
-//
-// The fact that runes are limited to ASCII opens up a lot of space for other
-// small values to co-inhabit the same 48-bit range. We subdivide this space
-// into increasingly many potential types, with smaller and smaller payloads,
-// where the highest byte with a non-zero MSb determines which size category
-// we're in: If the highest byte has its MSb set, then its seven non-MSb bits
-// define a type and each type has a 40-bit value range; if the second highest
-// byte has its MSb set, then the 14 non-MSb bits of the two high bytes define
-// the type and each has a 32-bit value range; and so on.
-//
-// Unicode code points need 21 bits, so we use a 24-bit type for Characters.
-// Miscellaneous values like true, false, nil, eof, etc. are placed into an
-// 8-bit type, since there will never be that many of them.
-//
-// ==== Strings ====
-//
-// Another 48-bit space is used for strings of zero to six bytes. Like runes,
-// these are NUL-terminated if shorter than six bytes, meaning that NUL cannot
-// appear in them, but otherwise they allow arbitrary bytes.
-//
-// ==== Small rationals ====
-//
-// We use a 49-bit space for small exact rational numbers, with the numerator
-// being a two's complement 25-bit signed integer, and denominator a 24-bit
-// unsigned integer.
-//
+//!
+//! === NaN Packing Strategy ===
+//!
+//! Format of a double, in most to least significant field order:
+//!
+//! { sign: u1, exponent: u11, fraction: u52 }
+//!
+//! When the exponent bits are all set, it's either a NaN or an Infinity.
+//!
+//! For value packing, almost all remaining 53 bits are available, giving us
+//! about 2^53 values, except for the four following bit patterns:
+//!
+//! *** FORBIDDEN VALUES ***
+//!
+//! 1. Negative cqNaN = { sign = 1, exponent = max, fraction = 2^51 }
+//!
+//! 2. Negative Infinity = { sign = 1, exponent = max, fraction = 0 }
+//!
+//! 3. Positive cqNaN = { sign = 0, exponent = max, fraction = 2^51 }
+//!
+//! 4. Positive Infinity = { sign = 0, exponent = max, fraction = 0 }
+//!
+//! The abbreviation "cqNaN" stands for canonical quiet NaN.
+//!
+//! Note that 2^51 means the MSb of the 52 fraction bits being set, and the rest
+//! being zero. The fraction MSb is also called the is_quiet flag, because it
+//! demarcates quiet NaNs. The rest being zero makes it the canonical qNaN.
+//!
+//! The positive and negative cqNaN are the *only* NaN values that can actually
+//! be returned by FP operations, which is convenient because it means we can
+//! simply use them to represent themselves in Zisp.
+//!
+//! Infinity values may also be returned by FP operations, and we want them to
+//! exist in Zisp as doubles as well, so they also represent themselves.
+//!
+//! Beyond those four bit patterns, all values with a maximum exponent (all bits
+//! set) are fair game for representing other values, so 2^53 - 4 possibilities.
+//!
+//! We split those 2^53 - 4 available values into four groups, each allowing for
+//! 2^51 - 1 different values to be encoded. (51-bit values excluding zero.)
+//!
+//! sign = 1, quiet = 1 :: Negative Fixnum from -1 to -2^51+1
+//!
+//! sign = 1, quiet = 0 :: Positive Fixnum from 0 to 2^51-2
+//!
+//! sign = 0, quiet = 1 :: Pointers
+//!
+//! sign = 0, quiet = 0 :: Others
+//!
+//!
+//! === Fixnums ===
+//!
+//! Negative fixnums actually represent themselves without needing to go through
+//! any transformation. Only the smallest 52-bit signed negative, -2^51, cannot
+//! be represented, as it would step on Forbidden Value #1, Negative cqNaN.
+//!
+//! Positive fixnums go through bitsiwe NOT (implemented via an XOR mask here to
+//! make it one operation together with the NaN masking) to avoid the all-zero
+//! payload value, which would step on Forbidden Value #2, Negative Infinity.
+//!
+//!
+//! === Pointers ===
+//!
+//! Pointers are further subdivided as follows based on the remaining 51 bits,
+//! with the first three bits used as a sort of tag:
+//!
+//! 000 :: Regular pointer to Zisp heap object (string, vector, etc.)
+//!
+//! 001 :: Weak pointer to Zisp heap object
+//!
+//! 01. :: Undefined
+//!
+//! 1.. :: Undefined
+//!
+//! This means pointers to the Zisp heap are 48 bits. Of those, we only really
+//! need 45, since 64-bit platforms are in practice limited to 48-bit addresses,
+//! and Zisp heap allocations happen at 8-byte boundaries, meaning the lowest 3
+//! bits are always unset. Thus, we are able to store yet another 3-bit tag in
+//! those 48-bit pointers alongside the actual, multiple-of-8, 48-bit address.
+//!
+//! Forbidden Value #3, Positive cqNaN, is avoided thanks to the fact that a
+//! regular Zisp heap pointer can never be null. Weak pointers, which can be
+//! null, avoid stepping on that forbidden value thanks to bit 49 being set.
+//!
+//!
+//! === Other values ===
+//!
+//! This 51-bit range is divided as follows:
+//!
+//! 000 :: Subdivided as follows:
+//!
+//! 0....... 0....... 0....... (etc.) :: Rune
+//!
+//! 1....... :: 128 40-bit types
+//!
+//! 0....... 1....... :: 16384 32-bit types
+//!
+//! 0....... 0....... 1....... :: 2097152 24-bit types
+//!
+//! (etc.)
+//!
+//! 001 :: Short string
+//!
+//! 01. :: Small rational
+//!
+//! 1.. :: Undefined
+//!
+//! ==== Runes and Small Values ====
+//!
+//! Runes are symbols up to 6 ASCII characters used to implement reader syntax.
+//! They are NUL-terminated if shorter than six characters, meaning they cannot
+//! contain the NUL byte in their value.
+//!
+//! NOTE: The order in which the characters of the rune are encoded depends on
+//! endianness. On little-endian systems (i.e. most modern architectures) the
+//! characters will be in "reverse" order, with the first character in lowest
+//! position, so the terminating NUL has to be searched from low to high.
+//!
+//! Forbidden Value #4, Positive Infinity, would denote a rune of length zero
+//! (all NUL bytes) which isn't allowed, so we avoid stepping on it.
+//!
+//! The fact that runes are limited to ASCII opens up a lot of space for other
+//! small values to co-inhabit the same 48-bit range. We subdivide this space
+//! into increasingly many potential types, with smaller and smaller payloads,
+//! where the highest byte with a non-zero MSb determines which size category
+//! we're in: If the highest byte has its MSb set, then its seven non-MSb bits
+//! define a type and each type has a 40-bit value range; if the second highest
+//! byte has its MSb set, then the 14 non-MSb bits of the two high bytes define
+//! the type and each has a 32-bit value range; and so on.
+//!
+//! Unicode code points need 21 bits, so we use a 24-bit type for Characters.
+//! Miscellaneous values like true, false, nil, eof, etc. are placed into an
+//! 8-bit type, since there will never be that many of them.
+//!
+//! ==== Strings ====
+//!
+//! Another 48-bit space is used for strings of zero to six bytes. Like runes,
+//! these are NUL-terminated if shorter than six bytes, meaning that NUL cannot
+//! appear in them, but otherwise they allow arbitrary bytes.
+//!
+//! ==== Small rationals ====
+//!
+//! We use a 49-bit space for small exact rational numbers, with the numerator
+//! being a two's complement 25-bit signed integer, and denominator a 24-bit
+//! unsigned integer.
+//!
const builtin = @import("builtin");
const std = @import("std");