| author | Taylan Kammer <taylan.kammer@gmail.com> | 2025-02-28 14:38:57 +0100 |
|---|---|---|
| committer | Taylan Kammer <taylan.kammer@gmail.com> | 2025-02-28 14:38:57 +0100 |
| commit | 472f3e89a61ec51218cefe65305ec6f0a0d95fbf (patch) | |
| tree | a64ef16a6b23a822ab09e02b9d967f3b8bb3d17e /html | |
| parent | 34de389fe744018e808f2c8b301648d504ab610d (diff) | |
update
Diffstat (limited to 'html')
| -rw-r--r-- | html/index.md | 19 | ||||
| -rw-r--r-- | html/notes/macros.md | 151 | ||||
| -rw-r--r-- | html/notes/sr.md | 368 |
3 files changed, 532 insertions, 6 deletions
diff --git a/html/index.md b/html/index.md
index 37565f1..e7c5ff2 100644
--- a/html/index.md
+++ b/html/index.md
@@ -6,9 +6,17 @@ been invented today, and had it been designed with pragmatic use as a
 primary concern in its design.
 
 This language doesn't actually exist yet. You are merely reading the
-ramblings of a madman.
+ramblings of a madman. A little bit of code is here already though:
 
-* [Compilation is execution](notes/compilation.html)
+[Zisp on GitHub](https://github.com/TaylanUB/zisp/)
+
+Some of the following articles are quite insightful. Others are VERY
+rambly; you've been warned.
+
+Some are outdated with regards to the actual implementation of Zisp,
+because writing the code often gives you yet another perspective.
+
+* [Compilation is execution](notes/compile.html)
 * [Everything can be serialized](notes/serialize.html)
 * [Symbols are strings](notes/symbols.html)
 * [Stop the "cons" madness!](notes/cons.html)
@@ -22,7 +30,6 @@ ramblings of a madman.
 * [Object-oriented programming](notes/oop.html)
 * [Equality and equivalence semantics](notes/equal.html)
 * [NaN-packing](notes/nan.html)
-
-Temporary source repo before I set up my own git server:
-
-[Zisp on GitHub](https://github.com/TaylanUB/zisp/)
+* [Reader? Decoder? I barely know 'er!](notes/reader.html)
+* [Does the decoder implement macros?](notes/macros.html)
+* [Better syntax-rules?](notes/sr.html)
diff --git a/html/notes/macros.md b/html/notes/macros.md
new file mode 100644
index 0000000..3169c49
--- /dev/null
+++ b/html/notes/macros.md
@@ -0,0 +1,151 @@
+# Does the decoder implement macros?
+
+I've written about the [parser/decoder dualism](reader.html) in a
+previous article. Long story short, the parser takes care of syntax
+sugar, like turning `#(...)` into `(#HASH ...)`, and the decoder takes
+care of turning that into a vector or whatever.
+
+Now, since the job of the decoder seems superficially quite similar to
+that of a macro expander, I've been agonizing for the past two days or
+so whether it *is* the macro expander.
+
+(Warning: This post is probably going to be very rambly, as I'm trying
+to gather my thoughts by writing it.)
+
+On one hand, sure:
+
+    (define-syntax #HASH
+      (syntax-rules ()
+        (#HASH <element> ...)
+        (vector '<element> ...)))
+
+Or something like that. You know what I mean? I mean, in Scheme you
+can't return a vector from a macro, but in Zisp the idea is that you
+can very well do that if you want, because why not.
+
+It's very much possible that I will eventually realize that this is a
+bad idea in some way, but we'll see. So far I really like the idea of
+a macro just returning objects, like a procedure, rather than having
+to return a syntax object that has a binding to that procedure.
+
+This may be similar to John Shutt's "vau calculus" from his language
+Kernel. Maybe Zisp will even end up being an implementation of the
+vau calculus. But I don't know; I've never fully grokked the vau
+calculus, so if I end up implementing it, it will be by accident.
+
+In any case, I want the user to be able to bind transformers to runes,
+and doing so feels like it's pretty much the same thing as defining a
+macro, so maybe the decoder should also be the macro expander.
+
+But then there's an issue with quoting. Consider the following:
+
+    (define stuff '(foo #(0 1 2)))
+
+In Zisp, this would first of all be parsed into:
+
+    (define stuff (#QUOTE foo (#HASH 0 1 2)))
+
+Now, if #QUOTE didn't decode its operand, we'd end up seeing #HASH in
+the result, never creating the vector we meant to create.
+
+But if #QUOTE calls decode on its operand, and the decoder is also the
+macro expander, whoops:
+
+    (let-syntax ((foo (syntax-rules () ((_ x) (bar x)))))
+      '(foo #(0 1 2)))
+
+    ;; => (bar #(0 1 2))
+
+I mean... MAYBE that should happen, actually?! Probably not, though.
+What Scheme does isn't gospel; Zisp isn't Scheme and it will do some
+things differently, but we *probably* don't want anything inside a
+quoted expression to be macro expanded. Probably.
+
+The thought that I might actually want that to happen sent me down a
+whole rabbit whole, and made me question "runes" altogether. If they
+just make the decoder invoke a predefined macro, well, why not ditch
+runes and have the parser emit macro calls?
+
+So instead of:
+
+    #(x y z) -> (#HASH x y z)
+
+(Which is then "decoded" into a vector...) Why not just:
+
+    #(x y z) -> (VECTOR x y z)
+
+And then `VECTOR` is, I don't know, a macro in the standard library I
+guess. If the decoder is the macro expander, then sure, it will know
+about the standard library; it will have a full-blown environment that
+it uses to macro expand, to look up macro names.
+
+But no, I think this conflates everything too much. Even just on the
+level of comprehensibility of code containing literals, I think it's
+good for there to be something that you just know will turn into an
+object of some type, no matter what; that's what a literal is.
+
+(In Zisp, it's not the reader that immediately turns the literal into
+an object of the correct type, but the decoder still runs before the
+evaluator so it's almost the same.)
+
+Then again, maybe this intuition just comes from having worked with
+Scheme for such a long time, and maybe it's not good. Perhaps it's
+more elegant if everything is a macro. Don't pile feature on top of
+feature, remember?
+
+Booleans, by the way, would just be identifier syntax then. Just
+`true` and `false` without the hash sign. In Zisp, you can't shadow
+identifiers anyway, so now they're like keywords in other languages,
+also a bit like `t` and `nil` in CL and Elisp.
+
+IF we are fine with the quote issue described above, then I *think*
+everything being a macro would be the right thing to do. Although
+I've said the decoder could be used for things other than code, like
+for configuration files containing user-defined data types, you could
+still do that by defining macros and calling the macro expander on the
+config file.
+
+It's just that you would either not be able to have stuff like vectors
+in a quoted list (you'd just get a list like `(VECTOR ...)` in it if
+you tried), or you'd have to be expanding any macros encountered
+within the quoted list. Either both, or neither.
+
+Not getting a choice, you say... That's not very expressive. That
+seems like a limitation in the language. Remember: remove the
+limitations that make additional features seem necessary.
+
+Next thing we will have two variants of quote: One which quotes for
+real, and one that expands macros. Or maybe some mechanism to mark
+macros as being meant to be run inside a quote or not, but then we
+re-invented runes in a different way.
+
+Which brings me back to runes, and how `#QUOTE` could handle them,
+even if the decoder is the macro expander.
+
+Encountering `#QUOTE` could tell the decoder that while decoding the
+operand, it should only honor runes, not macros bound to identifiers.
+
+That would probably be a fine way to solve the quote problem, should
+the decoder also be the macro expander: Macros are bound to runes or
+identifiers, and the rune-bound macros are those that are expanded
+even inside a quote.
+
+I think that would be the same as having completely separate decode
+and macro-expand phases.
+
+(The reason we would want them merged, by the way, is that it would
+presumably prevent duplication of code, since what they do is so
+similar.)
+
+It's possible that I'm agonizing for no reason at all because maybe
+the decoder cannot be the macro expander anyway.
+
+We will see.
+
+For now, I think it's best to proceed by implementing the decoder, and
+once I've come to the macro expander I can see if it makes sense to
+merge the two or not.
+
+But I'll probably keep runes one way or another, since they're a nice
+way of marking things that should be processed "no matter what" such
+that they can function as object literals within code.
diff --git a/html/notes/sr.md b/html/notes/sr.md
new file mode 100644
index 0000000..0fa9e06
--- /dev/null
+++ b/html/notes/sr.md
@@ -0,0 +1,368 @@
+# Better syntax-rules?
+
+Yesterday, someone on IRC asked for help in improving the following
+syntax-rules (s-r) macro:
+
+```scheme
+
+(define-syntax alist-let*
+  (syntax-rules ()
+
+    ;; uses subpattern to avoid fender
+    ;; alist-expr is evaluated only once
+    ((_ alist-expr ((key alias) ...) body body* ...)
+     (let ((alist alist-expr))
+       (let ((alias (assq-ref alist 'key)) ...)
+         body body* ...)))
+
+    ((_ alist-expr (key ...) body body* ...)
+     (let ((alist alist-expr))
+       (let ((key (assq-ref alist 'key)) ...)
+         body body* ...)))
+
+))
+
+;; Example uses:
+
+(define alist '((foo . 1) (bar . 2)))
+
+(alist-let alist (foo bar)
+  (+ foo bar)) ;=> 3
+
+(alist-let alist ((foo x) (bar y))
+  (+ x y)) ;=> 3
+
+;; Problem: Can't mix plain key with (key alias) forms:
+
+(alist-let alist ((foo x) bar)
+  (+ x bar)) ;ERROR
+
+```
+
+How do we make it accept a mix of plain keys and `(key alias)` pairs?
+Oh boy, it's more difficult than you may think if you're new to s-r
+macros. Basically, there's no "obvious" solution, and all we have is
+various hacks we can apply.
+
+Let's look at two fairly straightforward hacks, and their problems.
+
+## Option 1
+
+```scheme
+
+;; Solution 1: Internal helper patterns using a dummy constant.
+
+(define-syntax alist-let*
+  (syntax-rules ()
+
+    ((_ "1" alist ((key alias) rest ...) body body* ...)
+     (let ((alias (assq-ref alist 'key)))
+       (alist-let* "1" alist (rest ...) body body* ...)))
+
+    ((_ "1" alist (key rest ...) body body* ...)
+     (let ((key (assq-ref alist 'key)))
+       (alist-let* "1" alist (rest ...) body body* ...)))
+
+    ((_ "1" alist () body body* ...)
+     (begin body body* ...))
+
+    ;; dispatch, ensuring alist-expr only eval'd once
+    ((_ <alist> <bindings> <body> <body*> ...)
+     (let ((alist <alist>))
+       (alist-let* "1" alist <bindings> <body> <body*> ...)))
+
+))
+
+```
+
+(I've switched to my `<foo>` notation for pattern variables in the
+"dispatcher" part. Don't let it distract you. I strongly endorse
+that convention for s-r pattern variables, to make it clear that
+they're like "empty slots" where *any* expression can match, but
+that's a topic for another day.)
+
+What the solution above does, is "dispatch" actual uses of the macro,
+which obviously won't have the string literal `"1"` in first position,
+onto internal sub-macros, which can call each other recursively, so
+each layer only handles either a stand-alone `key` or a `(key alias)`
+couple.
+
+There's some nuances to this implementation. First, if you're not
+familiar with s-r macros, you may mistakenly worry that this solution
+could mask a programmer error: What if we accidentally call the macro
+with a variable bound to the string "1"? Would this lead to a very
+annoying bug that's hard to find? No; remember that syntax-rules
+patterns match *unevaluated* operands, so the internal sub-patterns
+are only triggered by the appearance of a literal string constant of
+`"1"` in the first position; a mistake that would be very apparent in
+code you're reading, and is extremely unlikely to occur by accident.
+
+As for a real pitfall of this implementation: The dispatcher pattern
+*must* be in the final position; otherwise it will actually catch our
+recursive calls starting with `"1"` and bind that string literal to
+the `alist` pattern variable! (Kind of the "reverse" of the fake
+problem described in the previous paragraph, in a sense?) If the
+dispatcher pattern is in the first position, it will keep calling
+itself with an increasing number of `"1"`s at the start, in an
+infinite loop, until you forcibly stop it or it crashes.
+
+As a side note, this brings me to a general s-r pitfall, that applies
+to the original implementation as well in this case: Since patterns
+are matched top to bottom, a simple `key` pattern variable *could*
+actually match the form `(key alias)`, so you have to make sure that
+the pattern for matching those key-alias couples comes before the one
+matching plain keys.
+
+Oh, and by the way, if you're questioning whether we even need those
+internal helper patterns at all: Yes, it's the only way to ensure the
+initial `<alist>` expression is only evaluated once, in an outermost
+`let` wrapping everything.
+
+Let's summarize the issues we've faced:
+
+1. It's easy to forget that pattern variables can match arbitrary
+   expressions, not just identifiers, and there's no way to say it
+   should only match identifiers.
+
+2. When an arbitrary expression is matched by the pattern variable,
+   using it means repeating that expression every time, unless you
+   explicitly use `let` to take care of that, which may require
+   dispatching to another pattern immediately if you wanted to use
+   recursive patterns.
+
+3. You may accidentally put a more generic pattern first, causing it
+   to match an input that was meant to be matched by a subsequent
+   pattern with more deeper destructuring.
+
+It may be interesting trying to solve 3 by specifying some way of
+measuring the "specificity" of a pattern, and saying that those with
+the highest specificity match first, but that may prove difficult.
+Besides, solving 1 would basically solve 3 anyway.
+
+Racket has syntax-parse, which solves the first problem through an
+incredibly sophisticated specification of "syntax patterns" that take
+the place of the humble generic pattern variable of syntax-rules.
+It's cool and all, but the charm of s-r is the simplicity. Can't we
+use some of the ideas of syntax-parse patterns and add them to s-r?
+
+In Racket, there's the concept of "syntax classes," and a pattern can
+be a variable with `:syntax-class-id` appended to its name, which is
+how you make it only match inputs of that syntax class, such as for
+example, only identifiers. Trying to find out what syntax class ids
+are supported may send you down a rabbit hole of how you can actually
+define your own syntax classes, but that just seems to be a weak spot
+of the Racket online documentation; looking a bit closer, you should
+find the list of built-in classes that are supported. They are just
+called "library" syntax classes for some reason:
+
+[Library Syntax Classes and Literal Sets -- Racket Documentation](https://docs.racket-lang.org/syntax/Library_Syntax_Classes_and_Literal_Sets.html)
+
+It would be great if there were classes for atoms (anything that's not
+a list) and lists, though; then we could do this:
+
+```scheme
+
+(define-syntax alist-let*
+  (syntax-rules ()
+
+    ((_ <alist>:list bindings body body* ...)
+     (let ((alist <alist>))
+       (alist-let* alist bindings body body* ...)))
+
+    ((_ alist (key:id ...) body body* ...)
+     (let ((key (assq-ref alist 'key)) ...)
+       body body* ...))
+
+    ((_ alist ((key:atom alias:id) ...) body body* ...)
+     (let ((alias (assq-ref alist 'key)) ...)
+       body body* ...))
+
+))
+
+```
+
+(The key could also be a non-symbol immediate value, like a fixnum,
+boolean, etc.; anything that `assq-ref` can compare via `eq?`. One
+could also just not quote the key, and instead let it be an arbitrary
+expression, which would probably make for a more useful macro, but
+that's a different topic.)
+
+Isn't that really neat? But let's go one step further. I believe
+this strategy of binding an expression via `let` to ensure it's only
+evaluated once is probably so common that it warrants a shortcut:
+
+```scheme
+
+(define-syntax alist-let*
+  (syntax-rules ()
+
+    ((_ alist:bind (key:id ...) body body* ...)
+     (let ((key (assq-ref alist 'key)) ...)
+       body body* ...))
+
+    ((_ alist:bind ((key:atom alias:id) ...) body body* ...)
+     (let ((alias (assq-ref alist 'key)) ...)
+       body body* ...))
+
+))
+
+```
+
+The idea here is: All pattern variables marked with `:bind` are first
+collected, and if there is at least one that is not an identifier,
+then the whole template (the part that produces the output of the s-r
+macro) is wrapped in a `let` which binds those expressions to the name
+of the pattern variable, and uses of that pattern variable within the
+template refer to that binding.
+
+I'm not entirely sure yet if this is an ingenious idea, or a hacky fix
+for just one arbitrary issue you can face while using syntax-rules,
+but I suspect it's a common enough pattern to make it desirable.
+
+## Option 2
+
+I said there were various hacks to solve the original problem; here's
+the second variant. It's actually almost the same thing, but we put
+the helper patterns into a separate macro.
+
+```scheme
+
+;; Solution 2: Separate helper macro
+
+(define-syntax alist-let*
+  (syntax-rules ()
+
+    ;; dispatch, ensuring alist-expr only eval'd once
+    ((_ <alist> <bindings> <body> <body*> ...)
+     (let ((alist <alist>))
+       (%alist-let-helper alist <bindings> <body> <body*> ...)))
+
+))
+
+(define-syntax %alist-let-helper
+  (syntax-rules ()
+
+    ;; basically do here what the internal helpers did in solution 1,
+    ;; but without the need for the "1" string literal hack
+
+))
+
+```
+
+That's cleaner in terms of the patterns we have to write, but we had
+to define a second top-level macro, which feels wrong. It should be
+properly encapsulated as part of the first.
+
+This is where another improvement to s-r could come in handy, and
+that's not making it evaluate to a syntax transformer (i.e., lambda)
+directly, but rather making it more like syntax-case in that regard.
+However, the additional lambda wrapping always really annoyed me, so
+the following syntax may be desirable.
+
+```scheme
+
+(define-syntax (alist-let* . s)
+
+  (define-syntax (helper . s)
+    (syntax-rules s ()
+      ((alist ((key alias) rest ...) body body* ...)
+       (let ((alias (assq-ref alist 'key)))
+         (alist-let* "1" alist (rest ...) body body* ...)))
+
+      ((alist (key rest ...) body body* ...)
+       (let ((key (assq-ref alist 'key)))
+         (alist-let* "1" alist (rest ...) body body* ...)))
+
+      ((alist () body body* ...)
+       (begin body body* ...))
+      ))
+
+  (syntax-rules s ()
+    ((<alist> <bindings> <body> <body*> ...)
+     (let ((alist <alist>))
+       (helper alist <bindings> <body> <body*> ...)))))
+
+```
+
+That looks a bit confusing at first sight, but we can actually do
+something a lot better now, since we already get one stand-alone
+pattern at the start, which fits our intention perfectly here:
+
+```scheme
+
+(define-syntax (alist-let* <alist> <bindings> <body> <body*> ...)
+
+  (define-syntax (helper . s)
+    (syntax-rules s ()
+      ((alist ((key alias) rest ...) body body* ...)
+       (let ((alias (assq-ref alist 'key)))
+         (alist-let* "1" alist (rest ...) body body* ...)))
+
+      ((alist (key rest ...) body body* ...)
+       (let ((key (assq-ref alist 'key)))
+         (alist-let* "1" alist (rest ...) body body* ...)))
+
+      ((alist () body body* ...)
+       (begin body body* ...))
+      ))
+
+  #'(let ((alist <alist>))
+      (helper alist <bindings> <body> <body*> ...)))
+
+```
+
+To be honest, I don't like this solution nearly as much as the first,
+and I now realize that there wouldn't be much point in keeping s-r if
+it's going to be so close to syntax-case. (The only difference, at
+this point, would be that s-r implicitly puts `#'` in front of the
+templates. That's literally all it would do, if I'm not mistaken.)
+
+## Or just implement syntax-parse?
+
+Racket can actually give you the implicit lambda when you want it, by
+offering `syntax-parser` as an alternative to `syntax-parse`:
+
+```scheme
+
+;; The following two are equivalent.
+
+(define-syntax foo
+  (lambda (s)
+    (syntax-parse s ...)))
+
+(define-syntax foo
+  (syntax-parser ...))
+
+```
+
+(At least, I'm pretty sure that's how it's supposed to work; the docs
+just bind the result of `syntax-parser` to an identifier via `define`
+and call it as a procedure to showcase it, for whatever reason.)
+
+Yes, syntax-parse is a lot more complex than syntax-rules, but to be
+honest it seems mainly the fault of the documentation that it doesn't
+showcase the simplest ways of using it, which look essentially the
+same as using syntax-rules, so it's not clear why s-r should stay if
+you have syntax-parse.
+
+Maybe I would just make one change, which is to allow the following
+syntax and thus make the additional `syntax-parser` unnecessary:
+
+```scheme
+
+(define-syntax (foo s)
+  (syntax-parse s ...))
+
+```
+
+Note that this is different from my previous idea of making the first
+operand to `define-syntax` a pattern. The only thing I don't like
+about this variant is that there will never be more than one argument,
+but maybe that's fine?
+
+In any case, I guess the only innovation I came up with here is the
+special `:bind` syntax class id, assuming there isn't already a
+similar thing in Racket or elsewhere.
+
+Oh and this made me realize I should add `foo:bar` as reader syntax to
+Zisp, turning it into `(#COLON foo . bar)` or such.
