diff options
Diffstat (limited to 'html/notes/equal.md')
| -rw-r--r-- | html/notes/equal.md | 255 |
1 files changed, 0 insertions, 255 deletions
diff --git a/html/notes/equal.md b/html/notes/equal.md deleted file mode 100644 index 8c55faa..0000000 --- a/html/notes/equal.md +++ /dev/null @@ -1,255 +0,0 @@ -# A novel approach to object equivalence - -## Story time - -In my past 5 years of developing a warehouse management application, -I've frequently found myself implementing equality the following way. - -(Code massively reduced to get to the point. Implementations of the -`hashCode` method, which must be compatible with `equals`, have been -left out for brevity.) - -```java - -class Warehouse { - String code; - String name; - // ... more fields, many mutable - - public boolean equals(Object obj) { - if (obj == this) { - return true; - } - if (obj instanceof Warehouse w) { - return code.equals(w.code); - } - return false; - } -} - -class Product { - int id; - LocalizedString name; - LocalizedString description; - // ... more fields, many mutable - - public boolean equals(Object obj) { - if (obj == this) { - return true; - } - if (obj instanceof Product p) { - return id == p.id; - } - return false; - } -} - -``` - -And so on. A type may have a code that is a String, an id that is an -int, or some other field that uniquely identifies members of it within -our application domain. - -I'm speaking of types and their members, not classes and instances, -because I mean the application domain. The user of our program has a -number of warehouses and a number of products, in reality, which we -represent via these classes, but there may be multiple instances of a -class representing the same real entity. - -Oftentimes, there will be a database that contains unique entries for -these. For example, an SQL table for products where the id is the -primary key, and an SQL table for warehouses where the code is the -primary key. - -This way of implementing `equals()` and `hashCode()` may seem wrong to -some developers. Since the classes have mutable fields, we may end up -with two `Product` instances which are not equal in their state, yet -`equals()` returns true for them since they represent the same real -life entity, just with different state in our program. Perhaps one -has just been fetched from the database, whereas the other has been -modified to represent changes that are yet to be committed. - -I've never found this strategy to lead to any problems. Never have I -had a need to know whether two `Product` instances have the same state -right now. What I care about is which product is being represented. -This is useful in various ways. - -### Map keys, sets, and so on - -First of all, it allows us to create maps where the keys are instances -of Product or Warehouse. For example, the application may want to -create a map of which products each warehouse contains: - -```java - -Map<Warehouse, List<Product>> productsByWarehouse; - -``` - -Given the `equals()` implementation of the Warehouse class, this map -can now be given any instance of Warehouse as a key, and we need not -ensure that we have one canonical instance used as the key. - -There are of course many alternatives to this. One may not want this -map to keep Warehouse instances alive, in which case it would be more -sensible to use the codes (Strings) for the keys. One could also add -a field to Warehouse such as `List<Product> products;`. However, in -some circumstances, a map such as the above may be most natural. - -Other data structures and operations depend on equality checks as -well. For example: finding duplicates in a collection, checking -whether a collection already contains a given item, managing a set -data type, and so on. The way we implement equality is also useful -for such purposes. - -### Elegance and encapsulation - -Secondly, we may have subsystems in the application that communicate -by passing Product or Warehouse instances to each other. For example, -the user may be looking at a user interface displaying the products in -a warehouse with their current quantities. The user may then click on -an "update quantities" button which opens a sub-interface where it's -possible to add an arbitrary number of entries (products) with the -desired quantity change for each. This sub-interface may then return -a list of Product instances with the associated quantity change. The -code of the warehouse overview interface can now use `equals()` to -determine which instance of Product it received corresponds to which -Product instance it's already holding on to. - -Again, there are of course many alternative strategies which don't -require such a use of `equals()`. One may explicitly compare the id -fields, or the sub-interface may return ids associated with quantity -changes. However, the `equals()` based implementation may offer the -cleanest possible code: - -```java - -public void receiveQuantityChanges(Map<Product, Integer> changes) { - for (Row<Product> row : this.productRows) { - Integer change = changes.get(row.product); - row.quantity += change; - } -} - -``` - -As far as readability and elegance is concerned, this can hardly be -improved on, as far as one is familiar with Java. Although using a -`Map<Integer, Integer>` would hardly make the code any more verbose, -it immediately causes it to lose out on self-documentation: - -```java - -// Quantity changes of what? Should we rename this to the awfully -// long name receiveProductQuantityChanges to make it clear? -public void receiveQuantityChanges(Map<Integer, Integer> changes) { - for (Row<Product> row : this.productRows) { - Integer change = changes.get(row.product.id); - row.quantity += change; - } -} - -``` - -This is a minor change as far as readability is concerned, but it also -decreases encapsulation. The code needs to be aware that products are -represented uniquely by an id that is an integer, and changing this -implementation detail of the class may have far-reaching consequences -throughout the code base. - -The self-documenting property may not apply to a Scheme-based language -without static types, but it makes the encapsulation aspect all the -more significant, since we won't have a compiler or static analyzer -which immediately tells us about a change in the type of `id`. We -must hope that the field `id` has been removed and replaced with a -different one, not reusing the `id` identifier. (For example, this -would mean a procedure like `product-id` stops existing, so a static -analyzer or compiler can immediately warn us about its absence. Had -we reused the field but changed its type, we would have a very hard -time finding all the places in the code we now need to fix.) - -## Get to the point! - -I've explained all this to arrive at one simple conclusion: - -Perhaps it would be a good idea for a language to have a mechanism in -which compound types like records or classes can simply declare that -one of the defined fields is the "unique identifier" of the type. - -Imitating SRFI 9: - -```scheme - -(define-record-type <product> - (make-product id name description ...) - product? - (identity: id) ; <- see here - (id product-id) - (name product-name) - (description product-description) - ...) - -(define-record-type <warehouse> - (make-warehouse code name ...) - warehouse? - (identity: code) ; <- see here - (code warehouse-code) - (name warehouse-name) - ...) - -``` - -Now the million dollar question is whether it should be `equal?` that -makes use of this information, or `eqv?`, or both. - -Although `eqv?` is intended to approximate operational equivalence, -and should therefore not consider separate mutable objects to be the -same, it's not clear to me how useful this is on user-defined types. - -The rationale for defining `eqv?` in terms of operational equivalence -is that this allows implementing memoization. If a memoized function -depends on one of the mutable fields of a record, yet `eqv?` returns -true on separate instances with different state, then the function -will return an incorrect result. But I'm skeptical as to whether a -function like that would see much practical use. It seems to me like -a memoized function that is used to compute some property of an entity -in our application domain should probably depend only on immutable -properties of that object. - -Another way to look at `eqv?` is that it implements "mostly constant -time" equality. (Strictly speaking, it's a linear time operation on -numbers, assuming the implementation offers arbitrary-size numbers. -This rarely matters, as few programs make use of numbers larger than -what fits in a few dozen bytes.) Conversely, `equal?` is responsible -for deep equality testing, which is inherently linear time, though -record types were overlooked in its definition in R7RS-small. - -## "Why not neither?" - -(I'm coining this as a new meme, in contrast to "why not both?") - -I think I want Zisp to support the following: - -* `eq?`: As in Scheme, but maybe allowed to fail on procedures. -* `eqv?`: As in Scheme. -* `equiv?`: An even closer approximation of operational equivalence, - by diving into immutable compound data types that `eqv?` doesn't - handle due to trying to be constant-time. I believe this is the - same as `equal-always?` in Racket. -* `equal?`: As in Scheme, but also dives into records because why not? - -The reason `eq?` is allowed to fail on procedures is an optimization -strategy that duplicates/inlines procedures. (citation needed) - -And which one should make use of the `identity` field of user types? -Only `equiv?`, because the identity object may be a string or other -immutable but compound object. (Thinking back to my Java programs, -I've had an extremely common type that was identified by a pair of -integers, for example: document and line number. One can also -conceive of N-dimensional coordinates and other such examples.) - -We want `eqv?` to remain "pretty much constant time" with the only -exception being numbers, which seem highly unlikely to reach such a -large size that it would matter. Recursive data structures, and -uninterned strings, on the other hand, could become quite large and -costly to compare. |
