Thursday, August 12, 2010

Clojure Protocols and the Expression Problem

The official release of Clojure 1.2 is just around the corner. One of the most significant changes in this release is the addition of protocols and datatypes. Below, we will explore one way that these features may be used to improve your Clojure programs. Specifically, we will look at how protocols may be used to avoid the Expression Problem.

According to Wikipedia, the term "Expression Problem" was coined by Philip Wadler:

“The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).”

We experience this problem when we want to add functionality to library code that is outside of our control. In the Java world, wrappers are commonly used to adapt a class to an interface. The problem with this approach is that the identity of the thing has changed, and now our system has to deal with two types for the same thing. In the Ruby world, with its open classes, monkey patching may be used to add methods to a class. Let's face it, anything with the word monkey in it can’t be good. Monkey patching changes the class for everyone and can cause unforeseen problems. Both of these approaches add incidental complexity to our code, which is exactly what we are trying to avoid by using Clojure.

In this post we will go over some working code that is intended to simulate a real-life situation where we may use protocols to avoid the Expression Problem. The code is located in the protocol-examples project in the module named expression-problem. If you would like to follow along:

$ git clone git://github.com/brentonashworth/protocol-examples.git
$ cd protocol-examples/expression-problem
$ lein deps && lein javac && lein test com.corp.employee-test
$ open src/clj/com/corp/employee.clj

Simulating an integration problem


Imagine that we are creating a system to work with employees. We have created a library, shown below, which performs various payroll and benefit calculations on employee data. As is the custom of Clojure people, employees are represented as maps so that they gain all of the benefits of being "generically manipulable".

(ns com.corp.employee)

(defn- bonus [years performance]
  (* (+ (* years 500)
        1000)
     (/ performance 10)))

(defn- earned-vacation [years]
  (+ 10 (* years 2)))

(defn- vacation-value [rate years]
  (* rate (* 8 (earned-vacation years))))

(defn payroll [employee hours]
  (* hours
     (:rate employee)))

(defn total-payroll
  [coll hours-map]
  (reduce (fn [total next]
            (let [[{rate :rate} hours] (val next)]
              (+ total (* rate hours))))
          0
          (merge-with conj (group-by :name coll) hours-map)))

(defn employee-bonus [e]
  (apply bonus ((juxt :years :perf) e)))

(defn employee-vacation-value [e]
  (apply vacation-value ((juxt :rate :years) e)))

(defn total-emp-benefits [employee]
  (+ (employee-vacation-value employee)
     (employee-bonus employee)))

(defn total-benefits [coll]
  (reduce + (map total-emp-benefits coll)))

(defn make-employee
  [name years perf rate]
  {:name name :years years :perf perf :rate rate})
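
The least obvious function above is total-payroll. The sketch below walks through its intermediate values for two employees. The original does not show the shape of hours-map; here we assume it maps each employee's name to the hours they worked, which is what the merge-with step implies. The namespace scratch.total-payroll is made up for the example, and the fn destructures each map entry in its parameter list instead of binding it to a name.

```clojure
(ns scratch.total-payroll)

;; Two employee maps in the shape produced by make-employee.
(def employees [{:name "jim" :years 2 :perf 8 :rate 20}
                {:name "sue" :years 15 :perf 6 :rate 60}])

;; Assumed shape of hours-map: employee name -> hours worked.
(def hours-map {"jim" 40 "sue" 35})

;; Step 1: (group-by :name employees) yields {"jim" [jim-map], "sue" [sue-map]}.
;; Step 2: merge-with conj appends each employee's hours to that vector,
;;         yielding {"jim" [jim-map 40], "sue" [sue-map 35]}.
(def merged (merge-with conj (group-by :name employees) hours-map))

;; Step 3: sum rate * hours over every entry. Same computation as
;; total-payroll, with the map entry destructured in the fn parameters.
(defn total-payroll [coll hours-map]
  (reduce (fn [total [_name [{rate :rate} hours]]]
            (+ total (* rate hours)))
          0
          (merge-with conj (group-by :name coll) hours-map)))
```

If an employee in coll has no entry in hours-map, conj is never called for them, hours destructures to nil, and (* rate nil) throws; the function as written in the library behaves the same way.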

The problem


Now imagine that we have a new requirement. The existing payroll system will need to be able to use our new library to calculate benefits. The payroll team happens to be using Clojure as well, but the payroll system itself is written in Java and cannot be changed. Their system has a Java class named com.company.Employee which contains the information that we need but does not conform to our interface. Specifically, they will need to be able to call total-emp-benefits and total-benefits, passing instances of com.company.Employee.

We may start to solve this problem by creating some tests that demonstrate what we would like our system to do (if you are following along, these tests are already included in com.corp.integration-test).

(ns com.corp.integration-test
  (:use (clojure test)
        (com.corp employee))
  (:import com.company.Employee))

(def e1 (make-employee "jim" 2 8 20))
(def e2 {:name "sue" :years 15 :perf 6 :rate 60})
(def e3 (Employee. "james" 2 8 30M))
(def employees [e1 e2 e3])

(deftest test-employee
  (is (= (.calculatePayroll e3 40)
         1200M)))

(deftest test-total-emp-benefits
  (is (= (total-emp-benefits e3)
         4960M)))

(deftest test-total-benefits
  (is (= (total-benefits employees)
         33100M)))

In these tests we create two employee maps and one employee using the closed Employee class from the payroll system. Our first test is a sanity check that calls one of the methods that exist in Employee. The other two tests pass instances of Employee to functions in our system. These last two tests will not pass yet, because our current implementation expects plain Clojure maps.
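
Where do 4960M and 33100M come from? One way to check is to restate the private bonus and vacation-value formulas in a scratch namespace (scratch.expected-values and benefits are made-up names) and evaluate them for the three test employees. Note that (/ performance 10) produces a ratio, so the bonus values stay exact rather than becoming floating point, and the 30M rate makes the last result a BigDecimal.

```clojure
(ns scratch.expected-values)

;; Same formulas as the private helpers in com.corp.employee.
(defn bonus [years performance]
  (* (+ (* years 500) 1000) (/ performance 10)))

(defn vacation-value [rate years]
  (* rate (* 8 (+ 10 (* years 2)))))

(defn benefits [years perf rate]
  (+ (bonus years perf) (vacation-value rate years)))

;; jim:   bonus 2000 * 8/10 = 1600, vacation 20 * 8 * 14  = 2240  -> 3840
;; sue:   bonus 8500 * 6/10 = 5100, vacation 60 * 8 * 40  = 19200 -> 24300
;; james: bonus 2000 * 8/10 = 1600, vacation 30M * 8 * 14 = 3360M -> 4960M
(def jim   (benefits 2 8 20))
(def sue   (benefits 15 6 60))
(def james (benefits 2 8 30M))
```

Summing the three gives 3840 + 24300 + 4960M = 33100M, the value expected by test-total-benefits.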

Whatever our solution is, we want to ensure that we do not change the caller's contract. There is a lot of code in production that uses our library, and we don't want to force our callers to change their code.

What is our contract? We have six reporting functions that are designed to work with maps of employee data. We also have a constructor function that provides a more compact way of creating these maps. Our callers may or may not be using this constructor. The tests that we have in place are testing the caller's contract, so if we can get all of the tests working without making any changes to them, then we have been successful.

Clojure provides two solutions to this problem: multimethods and protocols. Multimethods dispatch on an arbitrary function of their arguments; protocol functions dispatch on the type of their first argument. As long as type-based dispatch is all we need, protocols provide the additional benefits of interface organization and faster dispatch. We will choose to use protocols for our solution.
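
To make the difference concrete, here is a minimal sketch of employee-bonus written both ways. It lives in a made-up scratch.dispatch namespace, and the names mm-employee-bonus, Bonus and p-employee-bonus are invented for the comparison.

```clojure
(ns scratch.dispatch)

(defn- bonus [years performance]
  (* (+ (* years 500) 1000) (/ performance 10)))

;; Multimethod: the dispatch function may compute anything from the
;; arguments. Here it returns ::map for any map and the class otherwise.
(defmulti mm-employee-bonus (fn [e] (if (map? e) ::map (class e))))

(defmethod mm-employee-bonus ::map [e]
  (bonus (:years e) (:perf e)))

;; Protocol: dispatch is always on the type of the first argument, and
;; related functions are grouped under one protocol name.
(defprotocol Bonus
  (p-employee-bonus [this]))

(extend-protocol Bonus
  clojure.lang.IPersistentMap
  (p-employee-bonus [this] (bonus (:years this) (:perf this))))
```

Both give the same answer for a map. Supporting another type would mean one more defmethod on the multimethod side or one more clause in extend-protocol on the protocol side; here we never need anything other than the type to choose an implementation, which is why protocols fit.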

Adding a protocol


We start by defining the Benefits protocol. We choose the functions from our current system that calculate benefits and take an employee as the first argument.

(defprotocol Benefits
 (employee-bonus [this])
 (employee-vacation-value [this]))

Because these functions are now declared in our protocol, we can no longer define them as standalone functions with defn. Instead, we must implement them for each datatype that we would like them to support. To start with, we want to ensure that Clojure maps continue to work, so we use extend-protocol to implement the functions for maps using our existing implementations. If you are following along, add the following form and remove the defn forms for employee-bonus and employee-vacation-value.

(extend-protocol Benefits
  clojure.lang.IPersistentMap
  (employee-bonus [this]
                  (apply bonus ((juxt :years :perf) this)))
  (employee-vacation-value [this]
                           (apply vacation-value ((juxt :rate :years) this))))

After making these changes, our original tests should still work.

$ lein test com.corp.employee-test
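
To see what extend-protocol actually changes, we can ask satisfies? before and after extending. This is a standalone sketch in a made-up scratch.protocol-basics namespace; it redeclares Benefits and inlines the bonus and vacation formulas rather than loading the project.

```clojure
(ns scratch.protocol-basics)

(defprotocol Benefits
  (employee-bonus [this])
  (employee-vacation-value [this]))

(def sample {:years 2 :perf 8 :rate 20})

;; Before any extension, a map does not satisfy the protocol, and calling a
;; protocol function on it throws IllegalArgumentException.
(def satisfied-before? (satisfies? Benefits sample))

(def call-before
  (try (employee-bonus sample)
       (catch IllegalArgumentException _ :no-implementation)))

;; Extend the protocol to every persistent map, inlining the article's
;; bonus and vacation-value formulas.
(extend-protocol Benefits
  clojure.lang.IPersistentMap
  (employee-bonus [this]
    (* (+ (* (:years this) 500) 1000) (/ (:perf this) 10)))
  (employee-vacation-value [this]
    (* (:rate this) (* 8 (+ 10 (* (:years this) 2))))))

(def satisfied-after? (satisfies? Benefits sample))
```

The exception message names the missing method and the class it was called on; its exact wording varies between Clojure versions.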

Next, we solve the problem that we set out to solve by extending the type com.company.Employee with the Benefits protocol. The final extend-protocol form is shown below.

(extend-protocol Benefits
  
  clojure.lang.IPersistentMap
  (employee-bonus [this] (bonus (:years this)
                                (:perf this)))
  (employee-vacation-value [this] (vacation-value (:rate this)
                                                  (:years this)))

  com.company.Employee
  (employee-bonus [this] (bonus (.getYearsWithCompany this)
                                (.getCurrentPerformanceRating this)))
  (employee-vacation-value [this] (vacation-value (.getHourlyRate this)
                                                  (.getYearsWithCompany this))))

Now we may run all of our tests and... Congratulations! Everything works. From the caller's perspective, we have added the functions employee-bonus and employee-vacation-value to the Employee class. Notice that we have not changed the class itself: the functions live in the com.corp.employee namespace, so code that does not refer to that namespace is unaffected.
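
Because com.company.Employee only exists in the example project, the standalone sketch below uses a stand-in: LegacyEmployee, a hypothetical deftype with no protocol of its own, playing the role of the closed Java class (the scratch.closed-type namespace is also made up). After extend-protocol, satisfies? reports true, yet the class still does not implement the interface that defprotocol generated, which is one way to see that the type itself was not modified.

```clojure
(ns scratch.closed-type)

(defn- bonus [years performance]
  (* (+ (* years 500) 1000) (/ performance 10)))

(defn- vacation-value [rate years]
  (* rate (* 8 (+ 10 (* years 2)))))

(defprotocol Benefits
  (employee-bonus [this])
  (employee-vacation-value [this]))

;; Hypothetical stand-in for com.company.Employee: a type defined without
;; any knowledge of the Benefits protocol.
(deftype LegacyEmployee [employeeName yearsWithCompany performanceRating hourlyRate])

;; Extend the protocol to the existing type after the fact, reading its
;; fields with host interop, much as the article calls Employee's getters.
(extend-protocol Benefits
  LegacyEmployee
  (employee-bonus [this]
    (bonus (.-yearsWithCompany this) (.-performanceRating this)))
  (employee-vacation-value [this]
    (vacation-value (.-hourlyRate this) (.-yearsWithCompany this))))

(def james (LegacyEmployee. "james" 2 8 30M))
```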

Improving performance


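Because we have chosen to use protocols, we can make one final improvement to our library. We can create a record that implements our protocol and then update the make-employee function to create instances of this record instead of maps. This should improve performance: protocol functions implemented inline in a record are called through the protocol's Java interface, and the method bodies read the record's fields directly instead of looking up keys in a map. First we create the new record.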

(defrecord StandardBenefits [name years perf rate]
  Benefits
  (employee-bonus [this] (bonus years perf))
  (employee-vacation-value [this] (vacation-value rate years)))

Next, update the make-employee function to create an instance of StandardBenefits instead of a map.

(defn make-employee [name years perf rate]
 (StandardBenefits. name years perf rate))

Run all the tests to confirm that everything works.

Notice that even the functions payroll and total-payroll continue to work when they are passed instances of StandardBenefits, even though they are not part of our protocol and expect plain Clojure maps. They work because defrecord provides a complete implementation of a persistent map, which is the main advantage of using it over deftype.
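
Both halves of that claim can be checked in a standalone sketch (the scratch.records namespace is made up; the helpers, protocol, record and payroll function are restated from above). The record answers keyword lookups like a map, so payroll works unchanged, and it implements the protocol's interface directly, so its inline methods are used even though maps in general are also extended to Benefits.

```clojure
(ns scratch.records)

(defn- bonus [years performance]
  (* (+ (* years 500) 1000) (/ performance 10)))

(defn- vacation-value [rate years]
  (* rate (* 8 (+ 10 (* years 2)))))

(defprotocol Benefits
  (employee-bonus [this])
  (employee-vacation-value [this]))

;; Maps in general get the keyword-lookup implementation...
(extend-protocol Benefits
  clojure.lang.IPersistentMap
  (employee-bonus [this] (bonus (:years this) (:perf this)))
  (employee-vacation-value [this] (vacation-value (:rate this) (:years this))))

;; ...while the record implements the protocol inline, using its fields.
(defrecord StandardBenefits [name years perf rate]
  Benefits
  (employee-bonus [this] (bonus years perf))
  (employee-vacation-value [this] (vacation-value rate years)))

;; payroll knows nothing about records; it only uses keyword lookup.
(defn payroll [employee hours]
  (* hours (:rate employee)))

(def jim (StandardBenefits. "jim" 2 8 20))
```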

If you were not following along, you may want to have a look at the finished version.

Conclusion


The most important thing about what we have done here is what we didn't do. We made no change to com.company.Employee. We didn't change our reporting functions, not even payroll and total-payroll, which depend on the persistent map abstraction. And we made no change to the caller's contract.

For more information about Clojure's protocols see the datatypes and protocols pages on the Clojure web site. There is also a very informative presentation, "Clojure 1.2 Protocols", by Stuart Halloway.

3 comments:

  1. Great article. What is the difference between defn and defn-?

  2. Thank you.

    defn- creates a private function. In this example these functions don't need to be private. I wanted to emphasize the public contract. Normally, any function that might be useful to the outside world is left as public.

    In Clojure, private functions are not visible to the library user by default, but you can still get to them if you need to. For example, if there is a private function in com.company.namespace named private-function, we can map it into our current namespace like this:

    (def private-function
      (ns-resolve 'com.company.namespace
                  'private-function))
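
    A minimal standalone sketch of the same trick, next to the #' (var) reader form that reaches the var directly. The namespaces scratch.hidden (standing in for com.company.namespace) and scratch.caller, and the function body, are made up for illustration.

```clojure
;; Hypothetical library namespace with a private function.
(ns scratch.hidden)

(defn- private-function [x]
  (* x 2))

;; Hypothetical caller namespace.
(ns scratch.caller)

;; Option 1: look the var up by name at runtime with ns-resolve.
(def private-function
  (ns-resolve 'scratch.hidden 'private-function))

;; Option 2: refer to the var directly with the #' reader syntax.
(def private-function-2 #'scratch.hidden/private-function)

;; Either way the value is a Var; calling a Var calls the function it holds.
```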
