Tuesday, December 8, 2009

The Power of First-Class Functions

Many languages now have support for first-class functions, some fully embrace them. Those that embrace them are the functional languages. That's why their called functional. There is great power when first-class functions are truly first-class.

As with any power that we are given, it is one thing to know it's there, it is another thing to understand it and yet another to use it effectively in your programs without abusing it. In this post I would like to go through a great example of this power and its correct use.

Prelude

The example is taken from the book Structure and Interpretation of Computer Programs (SICP) section 1.3.3 Procedures as General Methods. The example code is written in Clojure instead of Scheme. All of my example code is a Clojure version of the examples in that section of the book so that you will be able to compare implementations. This whole exercise is not meant to be practical but illustrative. I am not trying to write the most efficient and robust code. The main point of this post is to show a style of programming that is enabled by the skilled use of first-class functions which allows the developer to work at the correct level of abstraction and create the maximum amount of reusable code.

Before we dive in, we will need a few helper functions to simplify the code below.

(def tolerance 0.001)

(defn sin [x]
(Math/sin x))

(defn cos [x]
(Math/cos x))

(defn square [x]
(* x x))

(defn cube [x]
(* x x x))

(defn average [x y]
(/ (+ x y) 2.0))

(defn abs [x]
(if (< x 0) (- x) x))

What Are First-Class Functions?

I wish I had studied SICP as a student. It would have saved me a lot of frustration. In that book, first-class elements of a language are described as elements which may be:

   named by variables
   passed as arguments to procedures
   returned as the results of procedures
   included in data structures

Below we will look at the use of the first three features.

A Function to Find the Square Root of a Number

Imagine that we are creating some software which will need to solve various types of math problems. We may decide that we need to function to calculate the square root of a number and therefore write the following code.

(defn sqrt [x]
(loop [guess 1.0]
(let [good-enough? #(< (abs (- (square %) x)) tolerance)
improve #(average % (/ x %))]
(if (good-enough? guess)
guess
(recur (improve guess))))))

That’s not bad. It’s efficient and fairly clear. We may, at this point, decide that we are done and move on the next problem. If we program in this manner, we are going to end up with a lot more code than we actually need. The problem with this code is that there are hidden patterns which can be extracted, named and reused. Once we do this, we will not only have all of that reusable code, but we will also have a cleaner version of sqrt and more readable code in general.

Finding the Fixed Point of a Function

Since we are writing a program that is going to be solving math problems, we may also need to find the fixed point of a function. From section 1.3.3 of SICP we have the definition of a fixed point of a function: “A number x is called a fixed point of a function f if x satisfies the equation f(x) = x. For some functions f we can locate a fixed point by beginning with an initial guess and applying f repeatedly”

We can write a function that does this...

(defn fixed-point [f first-guess]
(loop [guess first-guess]
(let [next (f guess)]
(if (close-enough? guess next)
next
(recur next)))))

...and try it out by finding solutions to y = cos y and y = sin y + cos y.

; find solution to y = cos y
(fixed-point cos 1.0)
>> 0.7387603198742113

; find solution to y = sin y + cos y
(fixed-point #(+ (sin %) (cos %)) 1.0)
>> 1.259003859740025

This is a useful technique which we are sure to use throughout our software.

What does this have to do with square roots? Well, if we have a number x then we are looking for a number y such that y * y = x. If we divide both sides of this equation by y we get y = x / y. Given an x, we need to find a solution to y = x / y and we may be able to do this with our new fixed-point function.

If we try this...

(defn sqrt [x]
(fixed-point #(/ x %) 1.0))

...it will not work. If we apply that function to a value of x over and over again we just oscillate between two values forever.

Average Damping

To fix the above problem we can do this.

(defn sqrt [x]
(fixed-point #(average % (/ x %)) 1.0))

Instead of using the new value of the function as the new guess, we use the average of the old guess and the new value of the function. This is called average damping. We could again stop here. But average damping is a concept that we may use again in our software. We can extract this into a method...

(defn average-damp [f]
#(average % (f %)))

...and rewrite our sqrt function as shown below.

(defn sqrt
"solve y -> x / y to find the square root"
[x]
(fixed-point (average-damp #(/ x %)) 1.0))

We could also create a cube-root function using the same pattern but finding a solution to y = x / (y * y).

(defn cube-root
"solve y -> x / (y * y) to find the cube root"
[x]
(fixed-point (average-damp #(/ x (square %))) 1.0))

The average-damp function is a function that returns a function. You pass it a function and it returns an average damped version of that function. We now have another powerful concept that we can use anywhere thanks to first-class functions. Also, our implementation of sqrt is much better. As stated in SICP “Notice how this formulation makes explicit the three ideas in the method: fixed-point search, average-damping, and the function y -> x / y.”

Newton’s Method

If we are going to be finding solutions to equations then we will surely be using Newton’s method at some point. From SICP we have this description of Newton’s method.

“If x -> g(x) is a differentiable function, then a solution of the equation g(x) = 0 is a fixed point of the function x -> f(x) where

and Dg(x) is the derivative of g evaluated at x.”

Notice that Newton’s method will use the fixed-point function that we have created. Nice!

To implement Newton’s method we will need a derivative function. Notice that the derivative function will need to be a function that takes a function g and returns a function of x.

Here is my Clojure implementation.

(defn deriv [g]
(let [dx 0.00001]
#(/ (- (g (+ % dx)) (g %)) dx)))

We now use this to create the newton-transform function which is also a high-order function...

(defn newton-transform [g]
#(- % (/ (g %) ((deriv g) %))))

...and finally we create the newtons-method function using fixed-point.

(defn newtons-method [g guess]
(fixed-point (newton-transform g) guess))

Newton’s method can find the solution to the equation g(x) = 0. If we wanted to use Newton’s method to find square roots then we would need to find the y such that (y * y) - x = 0.

(defn sqrt
"find the fixed point of y -> (y * y) - x"
[x]
(newtons-method #(- (square %) x) 1.0))

The Final Abstraction

Notice that our two different implementations of sqrt follow a pattern: find the fixed point of a transformed function based on an initial guess. Let’s extract that pattern for reuse.

(defn fixed-point-of-transform [g transform guess]
(fixed-point (transform g) guess))

We can now use this to implement sqrt using two different functions and function transformations.

(defn sqrt [x]
(fixed-point-of-transform #(/ x %)
average-damp
1.0))

(defn sqrt [x]
(fixed-point-of-transform #(- (square %) x)
newton-transform
1.0))

These are clear implementations of sqrt using high level concepts that are reusable in many areas of our software.

Conclusion

Think about the simplicity, conciseness and maintainability of the software that you could write if you could master this kind of programming.

I have already noticed a big difference in the kind of code that I write as I have started to use Clojure. Many of the most commonly used concepts have already been extracted to functions and are included in the Clojure API. Someone who has internalized this API and knows how to use it correctly can do amazing things with very little code. For exmaple, take a look at this code from a blog post by Tim Bray entitled Idiomatic Clojure.

(defn find-widely
"Return a map of pages to hit counts in filename."
[filename]
(apply merge-with +
(pmap count-lines
(partition-all *batch-size*
(line-seq (reader filename))))))

In another post entitled Eleven Theses on Clojure Tim, who is otherwise inclined to like Clojure, writes that it is a handicap that Clojure is a Lisp and that some human minds may react poorly to the find-widely function above. This is true for a reader who is new to Clojure and not already familiar with the concepts that are being used. But for someone who has internalized the API, this is more readable than most other ways you could code this same functionality. This is made possible by embracing and mastering the use of first-class functions.

No comments:

Post a Comment