I just released a pairing heap PlaneT package. Because the datastructure is both persistent and has amortized space bounds, it requires a neat trick to implement, which I will discuss below.
A pairing heap is an adaptive datastructure for a collection of objects which supports the following operations [edit: You can also read about an alternative implementation here in the Scheme Cookbook. This implementation doesn't achieve the amortized bounds on
remove-min which mine does in the presence of persistence]:
|remove-min||O(n) worst case, O(log(n)) amortized.|
The pairing heaps in my PlaneT package are persistent, which means that the above operations do not change the heap given to them, but rather construct a new heap which shares some structure with the old heap. For example, the
(remove-min h) procedure returns a new heap which contains all the elements of
h except for the smallest element.
Note that the time-bound on remove-min in O(n) worst-case, and O(log(n)) amortized. This means that, while a single remove-min operation can take time which is O(n), most will take much less. In fact, the longer remove-min operations are so rare that, in a sequence of an infinite number of remove-min operations, the average time per operation is O(log(n)). (This bound is only conjectured at the moment, but supported by a lot of experimental evidence. Because the pairing-heap adapts its structure to the use pattern it observes---a really neat feature, by the way---it is extremely difficult to rigorously prove bounds for it.)
It turns out to be difficult to ensure that amortized bounds still obtain with a persistent datastructure. The reason is illustrated in the following code snippet which creates a list of 1000 threads, each of which does something with a heap:
(let ((h (make-expensive-remove-min-heap))) (list-ec (:range i 1000) (thread (lambda () (let ((new-h (remove-min h))) (do-something-interesting-with new-h))))))
remove-minhere) can be "accounted" to a sequence of prior cheap operations (basically,
make-expensive-remove-min-heapmust do a lot of inserts, each of which can "carry" a bit of the cost of the upcoming
remove-minoperation, so that you know that expensive
remove-mins are very rare, and the cost averages out to O(log(n))). That style of accounting (often called the "Banker's Method") works if you know that only one
remove-minwill occur in the future of a given datastructure, as would be case if we were using ephemeral datastructures---in this case, the first
remove-minwould destroy the datastructure, so we could only perform it once. But we're not using ephemeral datastructures: each of the 1000 threads created above will re-perform the expensive
remove-minon the same
h, and we'll be in a pile of trouble, time-bounds-wise.
The solution to this problem is to use lazy evaluation with memoization. When constructing a heap, queue up the
remove-min operation using
delay. When you need to remove the minimum element, use
force memoize the result, you can guarantee that
remove-min operations after the first will complete in O(1) time, which maintains the amortized bounds in the presence of persistence.
This is, of course, not my idea. It (and much more) are explained in the delightful thesis of Chris Okasaki, and also in his reportedly-delightful (I've never read it, but it's very well-regarded) book, Purely Functional Data Structures.
By the way, if you need a bunch of datastructures (more than my simple pairing-heap), please check out the Galore PlaneT library by Carl Eastlund, Jens Axel Soegaard, and Stevie Strickland (either Version 2.2---which contains more datastructures
, but also more bugs and isn't as clean---or Version 3.6---which is the most modern version). It's what I use when I need a serious finite map or set.