1 files changed, 255 insertions, 0 deletions
diff --git a/notes/equal.md b/notes/equal.md
new file mode 100644
index 0000000..8c55faa
--- /dev/null
+++ b/notes/equal.md
@@ -0,0 +1,255 @@
+# A novel approach to object equivalence
+
+## Story time
+
+In my past 5 years of developing a warehouse management application,
+I've frequently found myself implementing equality the following way.
+
+(Code massively reduced to get to the point.  Implementations of the
+`hashCode` method, which must be compatible with `equals`, have been
+left out for brevity.)
+
+```java
+
+class Warehouse {
+  String code;
+  String name;
+  // ... more fields, many mutable
+
+  public boolean equals(Object obj) {
+    if (obj == this) {
+      return true;
+    }
+    if (obj instanceof Warehouse w) {
+      return code.equals(w.code);
+    }
+    return false;
+  }
+}
+
+class Product {
+  int id;
+  LocalizedString name;
+  LocalizedString description;
+  // ... more fields, many mutable
+
+  public boolean equals(Object obj) {
+    if (obj == this) {
+      return true;
+    }
+    if (obj instanceof Product p) {
+      return id == p.id;
+    }
+    return false;
+  }
+}
+
+```
+
+And so on.  A type may have a code that is a String, an id that is an
+int, or some other field that uniquely identifies members of it within
+our application domain.
+
+I'm speaking of types and their members, not classes and instances,
+because I mean the application domain.  The user of our program has a
+number of warehouses and a number of products, in reality, which we
+represent via these classes, but there may be multiple instances of a
+class representing the same real entity.
+
+Oftentimes, there will be a database that contains unique entries for
+these.  For example, an SQL table for products where the id is the
+primary key, and an SQL table for warehouses where the code is the
+primary key.
+
+This way of implementing `equals()` and `hashCode()` may seem wrong to
+some developers.  Since the classes have mutable fields, we may end up
+with two `Product` instances which are not equal in their state, yet
+`equals()` returns true for them since they represent the same real
+life entity, just with different state in our program.  Perhaps one
+has just been fetched from the database, whereas the other has been
+modified to represent changes that are yet to be committed.
+
+I've never found this strategy to lead to any problems.  Never have I
+had a need to know whether two `Product` instances have the same state
+right now.  What I care about is which product is being represented.
+This is useful in various ways.
+
+### Map keys, sets, and so on
+
+First of all, it allows us to create maps where the keys are instances
+of Product or Warehouse.  For example, the application may want to
+create a map of which products each warehouse contains:
+
+```java
+
+Map<Warehouse, List<Product>> productsByWarehouse;
+
+```
+
+Given the `equals()` implementation of the Warehouse class, this map
+can now be given any instance of Warehouse as a key, and we need not
+ensure that we have one canonical instance used as the key.
+
+There are of course many alternatives to this.  One may not want this
+map to keep Warehouse instances alive, in which case it would be more
+sensible to use the codes (Strings) for the keys.  One could also add
+a field to Warehouse such as `List<Product> products;`.  However, in
+some circumstances, a map such as the above may be most natural.
+
+Other data structures and operations depend on equality checks as
+well.  For example: finding duplicates in a collection, checking
+whether a collection already contains a given item, managing a set
+data type, and so on.  The way we implement equality is also useful
+for such purposes.
+
+### Elegance and encapsulation
+
+Secondly, we may have subsystems in the application that communicate
+by passing Product or Warehouse instances to each other.  For example,
+the user may be looking at a user interface displaying the products in
+a warehouse with their current quantities.  The user may then click on
+an "update quantities" button which opens a sub-interface where it's
+possible to add an arbitrary number of entries (products) with the
+desired quantity change for each.  This sub-interface may then return
+a list of Product instances with the associated quantity change.  The
+code of the warehouse overview interface can now use `equals()` to
+determine which instance of Product it received corresponds to which
+Product instance it's already holding on to.
+
+Again, there are of course many alternative strategies which don't
+require such a use of `equals()`.  One may explicitly compare the id
+fields, or the sub-interface may return ids associated with quantity
+changes.  However, the `equals()` based implementation may offer the
+cleanest possible code:
+
+```java
+
+public void receiveQuantityChanges(Map<Product, Integer> changes) {
+  for (Row<Product> row : this.productRows) {
+    Integer change = changes.get(row.product);
+    row.quantity += change;
+  }
+}
+
+```
+
+As far as readability and elegance is concerned, this can hardly be
+improved on, as far as one is familiar with Java.  Although using a
+`Map<Integer, Integer>` would hardly make the code any more verbose,
+it immediately causes it to lose out on self-documentation:
+
+```java
+
+// Quantity changes of what?  Should we rename this to the awfully
+// long name receiveProductQuantityChanges to make it clear?
+public void receiveQuantityChanges(Map<Integer, Integer> changes) {
+  for (Row<Product> row : this.productRows) {
+    Integer change = changes.get(row.product.id);
+    row.quantity += change;
+  }
+}
+
+```
+
+This is a minor change as far as readability is concerned, but it also
+decreases encapsulation.  The code needs to be aware that products are
+represented uniquely by an id that is an integer, and changing this
+implementation detail of the class may have far-reaching consequences
+throughout the code base.
+
+The self-documenting property may not apply to a Scheme-based language
+without static types, but it makes the encapsulation aspect all the
+more significant, since we won't have a compiler or static analyzer
+which immediately tells us about a change in the type of `id`.  We
+must hope that the field `id` has been removed and replaced with a
+different one, not reusing the `id` identifier.  (For example, this
+would mean a procedure like `product-id` stops existing, so a static
+analyzer or compiler can immediately warn us about its absence.  Had
+we reused the field but changed its type, we would have a very hard
+time finding all the places in the code we now need to fix.)
+
+## Get to the point!
+
+I've explained all this to arrive at one simple conclusion:
+
+Perhaps it would be a good idea for a language to have a mechanism in
+which compound types like records or classes can simply declare that
+one of the defined fields is the "unique identifier" of the type.
+
+Imitating SRFI 9:
+
+```scheme
+
+(define-record-type <product>
+  (make-product id name description ...)
+  product?
+  (identity: id)   ; <- see here
+  (id product-id)
+  (name product-name)
+  (description product-description)
+  ...)
+
+(define-record-type <warehouse>
+  (make-warehouse code name ...)
+  warehouse?
+  (identity: code)  ; <- see here
+  (code warehouse-code)
+  (name warehouse-name)
+  ...)
+
+```
+
+Now the million dollar question is whether it should be `equal?` that
+makes use of this information, or `eqv?`, or both.
+
+Although `eqv?` is intended to approximate operational equivalence,
+and should therefore not consider separate mutable objects to be the
+same, it's not clear to me how useful this is on user-defined types.
+
+The rationale for defining `eqv?` in terms of operational equivalence
+is that this allows implementing memoization.  If a memoized function
+depends on one of the mutable fields of a record, yet `eqv?` returns
+true on separate instances with different state, then the function
+will return an incorrect result.  But I'm skeptical as to whether a
+function like that would see much practical use.  It seems to me like
+a memoized function that is used to compute some property of an entity
+in our application domain should probably depend only on immutable
+properties of that object.
+
+Another way to look at `eqv?` is that it implements "mostly constant
+time" equality.  (Strictly speaking, it's a linear time operation on
+numbers, assuming the implementation offers arbitrary-size numbers.
+This rarely matters, as few programs make use of numbers larger than
+what fits in a few dozen bytes.)  Conversely, `equal?` is responsible
+for deep equality testing, which is inherently linear time, though
+record types were overlooked in its definition in R7RS-small.
+
+## "Why not neither?"
+
+(I'm coining this as a new meme, in contrast to "why not both?")
+
+I think I want Zisp to support the following:
+
+* `eq?`: As in Scheme, but maybe allowed to fail on procedures.
+* `eqv?`: As in Scheme.
+* `equiv?`: An even closer approximation of operational equivalence,
+  by diving into immutable compound data types that `eqv?` doesn't
+  handle due to trying to be constant-time.  I believe this is the
+  same as `equal-always?` in Racket.
+* `equal?`: As in Scheme, but also dives into records because why not?
+
+The reason `eq?` is allowed to fail on procedures is an optimization
+strategy that duplicates/inlines procedures.  (citation needed)
+
+And which one should make use of the `identity` field of user types?
+Only `equiv?`, because the identity object may be a string or other
+immutable but compound object.  (Thinking back to my Java programs,
+I've had an extremely common type that was identified by a pair of
+integers, for example: document and line number.  One can also
+conceive of N-dimensional coordinates and other such examples.)
+
+We want `eqv?` to remain "pretty much constant time" with the only
+exception being numbers, which seem highly unlikely to reach such a
+large size that it would matter.  Recursive data structures, and
+uninterned strings, on the other hand, could become quite large and
+costly to compare.