15 August 2007

One More Reason to Love Module Systems and Hygiene

I just released an Automatic Differentiation PlaneT Package, and am now converting my numerical relativity code to use it. The package exports procedures like +, -, =, etc, which are modified from the standard ones so it can compute derivatives of numerical procedures which use them. Of course, these modified procedures are slower than their standard cousins.

My numerical relativity code uses all sorts of looping constructs from the excellent SRFI-42. These loops iterate over vectors, ranges of numbers, lists, matrices, and so on. The iteration macros expand to code like the following (this is just an illustration---the real code is more complicated):

(let loop ((vector-index 0))
  (if (< vector-index (vector-length vector))
      (let ((vector-elt (vector-ref vector vector-index)))
        ;; ... do something with vector-elt ...
        (loop (+ vector-index 1)))))
You'll note that this code uses +, >=, etc. It is acceptable for these functions to be the addition and comparison from my PlaneT package, since that package preserves the meaning of the numerical functions operating on pure numbers. But, it would be nicer if the + and >= came from the mzscheme language module. In this case, the compiler could inline the addition and comparison, and even if it didn't there wouldn't be the differentiation overhead.

No problem! PLT Scheme's module system takes care of this automatically: SRFI-42 is defined in the mzscheme language, so all references to + and >= in its macros refer to those bindings from mzscheme. So, even if the module using the SRFI-42 macros is written in a different language (like "deriv-lang.ss"), the expansion of these macros uses the correct procedures.
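To see the hygiene half of this in isolation, here is a minimal sketch (add1* is a made-up macro, not part of SRFI-42): an identifier in a macro's template refers to the binding in scope where the macro was defined, not where it is used.

```scheme
;; add1* is a hypothetical macro; the + in its template refers to
;; the + in scope at the macro's definition site.
(define-syntax add1*
  (syntax-rules ()
    ((_ e) (+ e 1))))

;; Shadow + at the use site; hygiene ignores the shadowing:
(let ((+ (lambda (a b) 'shadowed)))
  (add1* 41))   ; evaluates to 42, not 'shadowed
```

The PLT module system extends this guarantee across module boundaries: the + in a macro's template carries the binding of the language in which the defining module was written.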

This is a combined effect of hygiene and the PLT module system. Neither alone would produce the correct behavior. For example, in Schemes without a module system that handles macros properly, it would be necessary to re-define + and >= at the top level in the deriv.plt package. Then the hygiene of the macro expander would ensure that the + and >= in the macro expansion of the SRFI-42 iterators referred to the top-level bindings (even if there were local bindings for these symbols at the macro use-site), but those would be the deriv-enabled + and >=, not the usual, optimizable mzscheme ones.

In short: I know of no other module system that would handle this so elegantly (in a current Scheme system, anyway---the proposed R6RS module system would do just as well here). (In fact, I think only the module system of Chez Scheme, immortalized in the portable syntax-case expander by Dybvig and Waddell, handles this at all. Unfortunately, it requires you to manually list, alongside the usual exports of the module, the local variables a macro can expand into, which is not nearly as convenient.) Hooray for PLT!

Automatic Differentiation PlaneT Package Released

I just released a PlaneT package which, when used as a PLT Scheme module's initial language import, allows you to compute exact derivatives of functions written in that module. (I've written about the technique here and in the posts which follow.) I'm going to use this code to compute some derivatives of gravitational actions which would be extremely complicated to do by hand. Enjoy!

08 August 2007

Persistency and Lazy Memoization in a Pairing Heap

I just released a pairing heap PlaneT package. Because the datastructure is persistent while still promising amortized time bounds, it requires a neat trick to implement, which I discuss below.

A pairing heap is an adaptive datastructure for a collection of objects which supports the following operations [edit: You can also read about an alternative implementation here in the Scheme Cookbook. This implementation doesn't achieve the amortized bounds on remove-min which mine does in the presence of persistence]:

Operation Time Bound
min O(1)
insert O(1)
merge O(1)
remove-min O(n) worst case, O(log(n)) amortized.

The pairing heaps in my PlaneT package are persistent, which means that the above operations do not change the heap given to them, but rather construct a new heap which shares some structure with the old one. For example, (remove-min h) returns a new heap containing all the elements of h except the smallest.
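Persistence in miniature (using a sorted list as a stand-in heap; these are not the package's actual exports): remove-min returns a new structure, leaves the old one intact, and shares structure with it.

```scheme
;; Toy "heap" as a sorted list, purely to illustrate persistence;
;; insert/find-min/remove-min here are stand-ins, not the package's API.
(define (insert x h)
  (if (or (null? h) (<= x (car h)))
      (cons x h)
      (cons (car h) (insert x (cdr h)))))

(define (find-min h) (car h))
(define (remove-min h) (cdr h))   ; shares h's entire tail

(define h1 (insert 1 (insert 3 (insert 2 '()))))  ; the list (1 2 3)
(define h2 (remove-min h1))                       ; the list (2 3)
;; h1 is unchanged: (find-min h1) => 1, while (find-min h2) => 2
```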

Note that the time bound on remove-min is O(n) worst-case and O(log(n)) amortized. This means that, while a single remove-min operation can take time which is O(n), most will take much less. In fact, the expensive remove-min operations are so rare that, over a long sequence of remove-min operations, the average time per operation is O(log(n)). (This bound is only conjectured at the moment, but it is supported by a lot of experimental evidence. Because the pairing heap adapts its structure to the use pattern it observes---a really neat feature, by the way---it is extremely difficult to rigorously prove bounds for it.)

It turns out to be difficult to ensure that amortized bounds still obtain with a persistent datastructure. The reason is illustrated in the following code snippet, which creates a list of 1000 thunks (imagine each one handed to its own thread), each of which does something with a heap:

(let ((h (make-expensive-remove-min-heap)))
  (list-ec (:range i 1000)
    (lambda ()
      (let ((new-h (remove-min h)))
        (do-something-interesting-with new-h)))))
A persistent datastructure can have multiple futures! But, when proving amortized bounds, one typically exploits the fact that the cost of an expensive operation (remove-min here) can be "accounted" to a sequence of prior cheap operations (basically, make-expensive-remove-min-heap must do a lot of inserts, each of which can "carry" a bit of the cost of the upcoming remove-min operation, so that you know that expensive remove-mins are very rare, and the cost averages out to O(log(n))). That style of accounting (often called the "Banker's Method") works if you know that only one remove-min will occur in the future of a given datastructure, as would be the case if we were using ephemeral datastructures---there, the first remove-min would destroy the datastructure, so we could only perform it once. But we're not using ephemeral datastructures: each of the 1000 thunks created above will re-perform the expensive remove-min on the same h, and we'll be in a pile of trouble, time-bounds-wise.

The solution to this problem is to use lazy evaluation with memoization. When constructing a heap, queue up the remove-min operation using delay. When you need to remove the minimum element, use force. Because delay and force memoize the result, you can guarantee that remove-min operations after the first will complete in O(1) time, which maintains the amortized bounds in the presence of persistence.
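Here is a sketch of the trick as a complete miniature lazy pairing heap (the names and representation are mine, not the package's). Each node stores, alongside its minimum and subheaps, a promise for the heap that results from removing the minimum; delay defers the expensive pairwise merging, and force memoizes its result.

```scheme
;; A node is #(min subheaps rest-promise). The promise memoizes the
;; result of the O(n)-worst-case merging, so every remove-min on this
;; node after the first costs O(1).
(define (make-node min subheaps)
  (vector min subheaps (delay (merge-pairs subheaps))))

(define (find-min h) (vector-ref h 0))
(define (subheaps h) (vector-ref h 1))

(define (merge a b)
  (cond ((null? a) b)
        ((null? b) a)
        ((<= (find-min a) (find-min b))
         (make-node (find-min a) (cons b (subheaps a))))
        (else
         (make-node (find-min b) (cons a (subheaps b))))))

(define (insert x h) (merge (make-node x '()) h))

(define (merge-pairs hs)
  ;; the expensive step: merge subheaps pairwise, then fold together
  (cond ((null? hs) '())
        ((null? (cdr hs)) (car hs))
        (else (merge (merge (car hs) (cadr hs))
                     (merge-pairs (cddr hs))))))

(define (remove-min h)
  (force (vector-ref h 2)))   ; memoized after the first call
```

Every copy of a heap shares the same promise, so even if a thousand futures each call remove-min on the same h, the merging work is done only once.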

This is, of course, not my idea. It (and much more) is explained in the delightful thesis of Chris Okasaki, and also in his reportedly-delightful (I've never read it, but it's very well-regarded) book, Purely Functional Data Structures.

By the way, if you need a bunch of datastructures (more than my simple pairing-heap), please check out the Galore PlaneT library by Carl Eastlund, Jens Axel Soegaard, and Stevie Strickland (either Version 2.2---which contains more datastructures, but also more bugs and isn't as clean---or Version 3.6---which is the most modern version). It's what I use when I need a serious finite map or set.

06 August 2007

I Just Voted "Yes" on R6RS

Yep, I did. I'm including my explanation below, because I think that R6RS has come under a lot of fire lately. (By the way, for those who say, "why doesn't the standard come with a small base, and then lots of libraries attached to it?", it does! <snark>Try reading it, why don't you.</snark>)

Even if you don't agree with my choice, don't forget to vote if you signed up (just one of many reminders you'll surely get until voting closes Sunday)!

My explanation:

I would gladly trade a report full of mistakes---not that I think this is such a report, but some do---for a library system which properly handles hygienic macros. Therefore, I am voting "Yes" on R6RS. (The exception specification is just icing on the cake, as far as I'm concerned.)

Some consolation for those who do think this report is full of mistakes:

  1. I expect we'll be seeing lots of "(r6rs base)"-compliant implementations which leave off libraries they don't like, and
  2. ...in most matters it is more important that the applicable rule of law be settled than that it be settled right.
    --- Burnet v. Coronado Oil & Gas Co., 285 U.S. 393, 406--408 (1932) (Justice Brandeis dissenting).