From 451aa92846b5fd5c8a0739336de3aa26d741d750 Mon Sep 17 00:00:00 2001 From: Taylan Kammer Date: Sat, 29 Mar 2025 11:10:24 +0100 Subject: Relocate MD sources for HTML notes. --- notes/equal.md | 255 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 255 insertions(+) create mode 100644 notes/equal.md (limited to 'notes/equal.md') diff --git a/notes/equal.md b/notes/equal.md new file mode 100644 index 0000000..8c55faa --- /dev/null +++ b/notes/equal.md @@ -0,0 +1,255 @@ +# A novel approach to object equivalence + +## Story time + +In my past 5 years of developing a warehouse management application, +I've frequently found myself implementing equality the following way. + +(Code massively reduced to get to the point. Implementations of the +`hashCode` method, which must be compatible with `equals`, have been +left out for brevity.) + +```java + +class Warehouse { + String code; + String name; + // ... more fields, many mutable + + public boolean equals(Object obj) { + if (obj == this) { + return true; + } + if (obj instanceof Warehouse w) { + return code.equals(w.code); + } + return false; + } +} + +class Product { + int id; + LocalizedString name; + LocalizedString description; + // ... more fields, many mutable + + public boolean equals(Object obj) { + if (obj == this) { + return true; + } + if (obj instanceof Product p) { + return id == p.id; + } + return false; + } +} + +``` + +And so on. A type may have a code that is a String, an id that is an +int, or some other field that uniquely identifies members of it within +our application domain. + +I'm speaking of types and their members, not classes and instances, +because I mean the application domain. The user of our program has a +number of warehouses and a number of products, in reality, which we +represent via these classes, but there may be multiple instances of a +class representing the same real entity. + +Oftentimes, there will be a database that contains unique entries for +these. For example, an SQL table for products where the id is the +primary key, and an SQL table for warehouses where the code is the +primary key. + +This way of implementing `equals()` and `hashCode()` may seem wrong to +some developers. Since the classes have mutable fields, we may end up +with two `Product` instances which are not equal in their state, yet +`equals()` returns true for them since they represent the same real +life entity, just with different state in our program. Perhaps one +has just been fetched from the database, whereas the other has been +modified to represent changes that are yet to be committed. + +I've never found this strategy to lead to any problems. Never have I +had a need to know whether two `Product` instances have the same state +right now. What I care about is which product is being represented. +This is useful in various ways. + +### Map keys, sets, and so on + +First of all, it allows us to create maps where the keys are instances +of Product or Warehouse. For example, the application may want to +create a map of which products each warehouse contains: + +```java + +Map> productsByWarehouse; + +``` + +Given the `equals()` implementation of the Warehouse class, this map +can now be given any instance of Warehouse as a key, and we need not +ensure that we have one canonical instance used as the key. + +There are of course many alternatives to this. One may not want this +map to keep Warehouse instances alive, in which case it would be more +sensible to use the codes (Strings) for the keys. One could also add +a field to Warehouse such as `List products;`. However, in +some circumstances, a map such as the above may be most natural. + +Other data structures and operations depend on equality checks as +well. For example: finding duplicates in a collection, checking +whether a collection already contains a given item, managing a set +data type, and so on. The way we implement equality is also useful +for such purposes. + +### Elegance and encapsulation + +Secondly, we may have subsystems in the application that communicate +by passing Product or Warehouse instances to each other. For example, +the user may be looking at a user interface displaying the products in +a warehouse with their current quantities. The user may then click on +an "update quantities" button which opens a sub-interface where it's +possible to add an arbitrary number of entries (products) with the +desired quantity change for each. This sub-interface may then return +a list of Product instances with the associated quantity change. The +code of the warehouse overview interface can now use `equals()` to +determine which instance of Product it received corresponds to which +Product instance it's already holding on to. + +Again, there are of course many alternative strategies which don't +require such a use of `equals()`. One may explicitly compare the id +fields, or the sub-interface may return ids associated with quantity +changes. However, the `equals()` based implementation may offer the +cleanest possible code: + +```java + +public void receiveQuantityChanges(Map changes) { + for (Row row : this.productRows) { + Integer change = changes.get(row.product); + row.quantity += change; + } +} + +``` + +As far as readability and elegance is concerned, this can hardly be +improved on, as far as one is familiar with Java. Although using a +`Map` would hardly make the code any more verbose, +it immediately causes it to lose out on self-documentation: + +```java + +// Quantity changes of what? Should we rename this to the awfully +// long name receiveProductQuantityChanges to make it clear? +public void receiveQuantityChanges(Map changes) { + for (Row row : this.productRows) { + Integer change = changes.get(row.product.id); + row.quantity += change; + } +} + +``` + +This is a minor change as far as readability is concerned, but it also +decreases encapsulation. The code needs to be aware that products are +represented uniquely by an id that is an integer, and changing this +implementation detail of the class may have far-reaching consequences +throughout the code base. + +The self-documenting property may not apply to a Scheme-based language +without static types, but it makes the encapsulation aspect all the +more significant, since we won't have a compiler or static analyzer +which immediately tells us about a change in the type of `id`. We +must hope that the field `id` has been removed and replaced with a +different one, not reusing the `id` identifier. (For example, this +would mean a procedure like `product-id` stops existing, so a static +analyzer or compiler can immediately warn us about its absence. Had +we reused the field but changed its type, we would have a very hard +time finding all the places in the code we now need to fix.) + +## Get to the point! + +I've explained all this to arrive at one simple conclusion: + +Perhaps it would be a good idea for a language to have a mechanism in +which compound types like records or classes can simply declare that +one of the defined fields is the "unique identifier" of the type. + +Imitating SRFI 9: + +```scheme + +(define-record-type + (make-product id name description ...) + product? + (identity: id) ; <- see here + (id product-id) + (name product-name) + (description product-description) + ...) + +(define-record-type + (make-warehouse code name ...) + warehouse? + (identity: code) ; <- see here + (code warehouse-code) + (name warehouse-name) + ...) + +``` + +Now the million dollar question is whether it should be `equal?` that +makes use of this information, or `eqv?`, or both. + +Although `eqv?` is intended to approximate operational equivalence, +and should therefore not consider separate mutable objects to be the +same, it's not clear to me how useful this is on user-defined types. + +The rationale for defining `eqv?` in terms of operational equivalence +is that this allows implementing memoization. If a memoized function +depends on one of the mutable fields of a record, yet `eqv?` returns +true on separate instances with different state, then the function +will return an incorrect result. But I'm skeptical as to whether a +function like that would see much practical use. It seems to me like +a memoized function that is used to compute some property of an entity +in our application domain should probably depend only on immutable +properties of that object. + +Another way to look at `eqv?` is that it implements "mostly constant +time" equality. (Strictly speaking, it's a linear time operation on +numbers, assuming the implementation offers arbitrary-size numbers. +This rarely matters, as few programs make use of numbers larger than +what fits in a few dozen bytes.) Conversely, `equal?` is responsible +for deep equality testing, which is inherently linear time, though +record types were overlooked in its definition in R7RS-small. + +## "Why not neither?" + +(I'm coining this as a new meme, in contrast to "why not both?") + +I think I want Zisp to support the following: + +* `eq?`: As in Scheme, but maybe allowed to fail on procedures. +* `eqv?`: As in Scheme. +* `equiv?`: An even closer approximation of operational equivalence, + by diving into immutable compound data types that `eqv?` doesn't + handle due to trying to be constant-time. I believe this is the + same as `equal-always?` in Racket. +* `equal?`: As in Scheme, but also dives into records because why not? + +The reason `eq?` is allowed to fail on procedures is an optimization +strategy that duplicates/inlines procedures. (citation needed) + +And which one should make use of the `identity` field of user types? +Only `equiv?`, because the identity object may be a string or other +immutable but compound object. (Thinking back to my Java programs, +I've had an extremely common type that was identified by a pair of +integers, for example: document and line number. One can also +conceive of N-dimensional coordinates and other such examples.) + +We want `eqv?` to remain "pretty much constant time" with the only +exception being numbers, which seem highly unlikely to reach such a +large size that it would matter. Recursive data structures, and +uninterned strings, on the other hand, could become quite large and +costly to compare. -- cgit v1.2.3