This "Larger Example" I’m going to write about is in chapter 18, starting on page 243. The chapter is (supposed to be) about the object oriented features of OCaml (they would be the "O" part) but this example is more objectionable than objective. It was intended to be an implementation of the Levenshtein distance algorithm: it started with an incomplete description of that algorithm and a figure of a matrix with four mistakes. Then it got worse.
Smith’s plan was to create a base class containing the algorithm and then use subclasses to adapt the algorithm to different data types. But when I looked at the edit_distance
base class I noticed that half of the core algorithm was missing! I found it in the string_edit_distance
subclass, and also in the list_edit_distance
subclass. Martin Fowler in his great "Refactoring" book says: Number one in the stink parade is duplicated code. If you see the same code structure in more than one place, you can be sure that your program will be better if you find a way to unify them.
Following Fowler’s good advice, which Smith should have done, I moved the duplicated code to the base class. There was only a small bit of code that was different for each subclass: the part that compares the elements of the particular data structures (strings and lists in this example) to work out if a changing cost is required. I created a cost_to_change
method to handle that.
Then I tried to simplify the code and make it more readable.
In the update_matrix
method I removed three of the let bindings and passed the calculation results directly the trimin
function I mentioned before.
In the gen_matrix
method I replaced the pattern matching and nested iteration functions with two separate iteration functions, which hopefully makes it clear that we are only initialising the first row and first column; most of the matrix is left untouched.
In the calc
method, which I’d moved from the subclass to the base class, I again replaced the pattern matching and nested iteration functions, this time with nested for loops. Pattern matching and higher-order iteration functions are very powerful, and they could be used instead of loops and conditional statements in all ocaml code; but ocaml includes for loops and if statements because they are easier to use and understand than the more powerful features. Choose the right tool for the job.
As a result of all these changes the size of a subclass has been reduced from 18 lines to 6, only 2 of which need to be changed for each data type (the other 4 are structure). You can have a look at some of Smith’s code and my modified version to see the differences for yourself.