28 April 2007

Why Use the FFI for Simple Numerical Operations in MzScheme?

Brad Lucier (of Numerical Partial Differential Equations in Scheme fame) posted a comment on my last post announcing the code for numerical manipulation of Lorentz group elements which reads:

I'm surprised that you decided to use the C FFI to manipulate 4x4 matrices; they are so small that it would seem better just to use mzscheme's numerical facilities which are getting better all the time.

This is an interesting question to answer, so I'm going to devote this post to answering it instead of leaving a short response in the comments section.

Before writing this code (which uses some C to do the manipulations), I wrote some code which used purely Scheme to perform the following sequence of operations, given a set of parameters for an SO(3,1) element:

  1. Compute the matrix associated with the parameters, M.
  2. Invert that matrix to produce another matrix M-inv.
  3. Compute the parameters associated with M-inv, p-inv.
This is a natural sequence of operations to perform on the group---it's just the group inverse function on the parameters. The timings for 10K repeats of this operation are:
Code Style Total CPU Time GC Time
Pure Scheme51 sec27 sec
Mixed Scheme/C4 sec2.8 sec
I think the large GC time for the Scheme/C code is probably due to the fact that each call goes through several contract checks, which probably dominate the run-time and allocation. Clearly, there's still an advantage to doing significant manipulations in C! I wish that it was practical to stay completely within PLT Scheme for doing this sort of work, but I'm afraid that it isn't. I suspect that Brad's implementation of choice (I assume), Gambit, would do much better on this sort of comparison.

To philosophize: the role I see for Scheme in my computational world is as an excellent driver to glue a bunch of simple C or Fortran computational elements together. (This isn't my idea---have a look at this for a similar approach with Python.) PLT Scheme is my implementation of choice for this because of its excellent FFI, it's extensive libraries (particularly its support of software contracts), and it's relatively easy C functions for memory allocation and Scheme object manipulation. The contracts, in particular, are a big help, because they let me avoid doing any argument checking on the C side (where I want to minimize coding). Memory allocation is nice, because with a combination of mzc --xform and scheme_malloc[_atomic], I never again have to mess with malloc and free in the C that I write. I've written this code with an eye toward including it in my thesis work, which will also be written in this style, so it seemed natural to do the manipulations in C.

This is, of course, my view. People like Brad, Marc, Manuel, and Jeff have a different vision, and have done tremendous work in optimizing Scheme performance on numeric codes. I've tried that approach, and always find myself back with PLT's scheme and C.

4 comments:

Jens Axel Søgaard said...

Hi Will,

It would be fun to see the Scheme code. Then we could see how well it runs on Gambit - or whether there is a way to speed it up.

Unknown said...
This comment has been removed by the author.
Unknown said...
This comment has been removed by the author.
Will Farr said...

Jens,

There's a lot of ways to speed up the Scheme code. Two off the top of my head: (1) use LU decomposition instead of Gaussian elimination for the matrix inversion and (2) eliminate a lot of places where I have cons-ed unnecessarily because I was too lazy to go to the trouble of re-using some previously allocated storage. I'll post it here in a few minutes (at the least, some people may find a Gaussian elimination matrix inversion routine in Scheme, however inefficient, to be useful).

However, there's a bigger point here, which is that LAPACK already does this, better than I could. (Better means both more efficiently, and also more stably in the presence of roundoff error, which is a non-trivial task.) And, it's really not that painful to perform numerical manipulations on arrays of double-precision numbers from C. As soon as the datastructures get more complicated, C is a pain in the ass, but for such simple stuff, Scheme really doesn't hold any advantage. So, I often find it more useful to drop down to C for this kind of stuff, and then handle all the complicated stuff from Scheme. For example, if you look at the code for the SO31.plt package, you'll see that the C does *only* math---even memory allocation is handled from Scheme.

Anyway, the code is coming, and thanks for the comment!

Will