I just released a pairing heap PlaneT package. Because the datastructure is both persistent and has amortized time bounds, it requires a neat trick to implement, which I will discuss below.

A pairing heap is an adaptive datastructure for a collection of objects which supports the following operations [edit: you can also read about an alternative implementation in the Scheme Cookbook. That implementation doesn't achieve the amortized bounds on `remove-min` which mine does in the presence of persistence]:

| Operation | Time Bound |
| --- | --- |
| min | O(1) |
| insert | O(1) |
| merge | O(1) |
| remove-min | O(n) worst case, O(log(n)) amortized |

The pairing heaps in my PlaneT package are persistent, which means that the above operations do not change the heap given to them, but rather construct a new heap which shares some structure with the old heap. For example, the `(remove-min h)` procedure returns a new heap which contains all the elements of `h` except for the smallest element.
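To make the persistence concrete, here is a hypothetical interaction sketch. The operation names follow the table above, but they are illustrative and not necessarily the package's exact API:

```scheme
;; Build a heap of three numbers, then remove the minimum.
;; All names here are illustrative, not the package's exact API.
(define h (insert 3 (insert 1 (insert 2 (make-empty-heap)))))
(define h2 (remove-min h))

(min h)   ; still 1 -- the original heap h is untouched
(min h2)  ; 2 -- the new heap shares structure with h
```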

Note that the time bound on remove-min is O(n) worst-case, and O(log(n)) amortized. This means that, while a single remove-min operation *can* take time which is O(n), *most* will take much less. In fact, the expensive remove-min operations are so rare that, over a long enough sequence of remove-min operations, the average time per operation is O(log(n)). (This bound is only conjectured at the moment, but supported by a lot of experimental evidence. Because the pairing heap adapts its structure to the use pattern it observes---a *really* neat feature, by the way---it is extremely difficult to rigorously prove bounds for it.)

It turns out to be difficult to ensure that amortized bounds still obtain with a persistent datastructure. The reason is illustrated in the following code snippet, which creates a list of 1000 threads, each of which does something with a heap:

```scheme
(let ((h (make-expensive-remove-min-heap)))
  (list-ec (:range i 1000)
    (thread
     (lambda ()
       (let ((new-h (remove-min h)))
         (do-something-interesting-with new-h))))))
```

Amortized time bounds are proved by showing that the cost of an occasional expensive operation (like the `remove-min` here) can be "accounted" to a sequence of prior cheap operations (basically, `make-expensive-remove-min-heap` must do a lot of inserts, each of which can "carry" a bit of the cost of the upcoming `remove-min` operation, so that you know that expensive `remove-min`s are very rare, and the cost averages out to O(log(n))). That style of accounting (often called the "Banker's Method") works if you know that only one `remove-min` will occur in the future of a given datastructure, as would be the case if we were using ephemeral datastructures---in that case, the first `remove-min` would destroy the datastructure, so we could only perform it once. But we're not using ephemeral datastructures: each of the 1000 threads created above will re-perform the expensive `remove-min` on the *same* `h`, and we'll be in a pile of trouble, time-bounds-wise.
The solution to this problem is to use lazy evaluation with memoization. When constructing a heap, queue up the `remove-min` operation using `delay`. When you need to remove the minimum element, use `force`. Because `delay` and `force` memoize the result, you can guarantee that `remove-min` operations after the first will complete in O(1) time, which maintains the amortized bounds in the presence of persistence.
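The trick can be sketched as follows. The node representation and `merge-pairs` (the expensive restructuring pass of a pairing heap) are assumptions for illustration here, not the package's actual internals:

```scheme
;; A heap node pairs its minimum element with a *promise* for the
;; heap that remains after that minimum is removed.  The expensive
;; merge-pairs pass is suspended with `delay` at construction time.
;; (make-node and merge-pairs are hypothetical names.)
(define (make-node min-elt subheaps)
  (cons min-elt
        (delay (merge-pairs subheaps))))

;; `force` runs the suspended merge the *first* time only; because
;; promises memoize their result, every later remove-min on the
;; same node returns the cached heap in O(1).
(define (remove-min h)
  (force (cdr h)))
```

With this representation, the 1000 threads above all `force` the same promise: the first one pays the full cost of the merge, and the other 999 get the memoized result immediately.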

This is, of course, not my idea. It (and much more) is explained in the delightful thesis of Chris Okasaki, and also in his reportedly-delightful (I've never read it, but it's very well-regarded) book, Purely Functional Data Structures.

By the way, if you need a bunch of datastructures (more than my simple pairing-heap), please check out the Galore PlaneT library by Carl Eastlund, Jens Axel Soegaard, and Stevie Strickland (either Version 2.2---which contains more datastructures~~, but also more bugs and isn't as clean~~---or Version 3.6---which is the most modern version). It's what I use when I need a serious finite map or set.

## 3 comments:

If you find bugs in Galore 2.2, do send them to me, so I can fix them.

(I can't remember receiving any bug reports that went unfixed.)

Hi Jens. I don't know of any bugs in Galore 2.2, either (as I recall, you fixed the one I reported extremely quickly), so I struck out the text above. Out of curiosity, then, why did you move to Version 3 instead of sticking with 2?

Well, think of Galore 2 and Galore 3 as separate libraries.

The difference between Galore 2 and Galore 3 is that Galore 3 is more flexible at the cost of being slower [at least that was the case two years ago - I don't have any recent benchmarks].

Version 2 was written by me, and Galore 3 was written by Carl and Stevie. They were (are) interested in using techniques from object-oriented programming as a middle layer, whereas I was more fascinated by the purely functional approach. [Note: as a user of Galore 3, you see a functional interface.]
