Making Dynamic Invocation Fast: An Idea?

Evan Phoenix (of Rubinius fame) were discussing dynamic dispatch today on #rubinius, sharing our caching strategies and our dispatch woes. We talked at length about various strategies for speeding dispatch, cache invalidation mechanisms, compilation techniques, and so on. All gloriously fun stuff.

So at one point I related to him the extended version of my plans for speeding up dynamic dispatch. I relate it now to you to hopefully spur some discussion. There are no original ideas anymore, right? Or are there?

The initial experiment with Fixnum basically was the static version of my fuzzy vision. During parse time, a very trivial static table mapped method names to numbers. I only handled three method names, +, -, and <. Then those numbers were passed to Fixnum during method dispatch, where a very trivial static switch statement used them to "fast dispatch" to the appropriate methods.

The ultimate vision, however, is much grander.

The general idea is that during dynamic dispatch, we want to use the method name and the class as keys to locate a specific method instance. The typical way this is done is by keeping a hashtable on every class, and looking up methods by name. This works reasonably well, but each class must have its own table and we must pay some hashtable cost for every search. This is compounded when we consider class hierarchies, where the method we're looking for might be in a super class or a super class's super class, ad infinatum.

Retrieving the class is the easy part; every method we want to dispatch will have a receiver, and receivers have a reference to their class.

So then, how do we perform this magic mapping from M method names and N classes to a given method? Well hell, that's a matrix!

So then the great idea: build a giant matrix mapping method names (method IDs) and classes (class IDs) to the appropriate switchable value. Then when we dispatch to the class, we use that switchable value in a neatly dense switch statement.

Let me repeat that in another way to make it clear...the values in the matrix are simple "int" values, ideally as closely-numbered (dense) as possible for a given class's set of methods. In order to dispatch to a named method on a given object, we have the following steps:

Get the method ID from the method (likely calculated at parse or compile time)
Get the class ID from the class (calculated sequentially at runtime)
Index into the matrix using method ID and class ID to get the class-specific method switch value
Use the switch value to dispatch (via a switch statement, of course) to the appropriate method in the target class

Ignoring the size of this matrix for a moment, there are a number of benefits to this data structure:

We have no need for per-class or inline method caches
Repopulating the switch table for a given class's methods is just a matter of refilling its column in the matrix with appropriate values
Since the matrix represents all classes, we can avoid searching hierarchies once parents' methods are known; we just represent those methods in the matrix under the child's column and fast-dispatch to them as normal
We can save off and reconstitute the mapping to avoid having to regenerate it

...and there's also the obvious benefit: using the switch values stored in the matrix, we can do a fast dispatch on the target object with only the minimal cost of looking up a value in the matrix. An additional benefit for C code would be simply storing the address of the appropriate code in the matrix, avoiding any switch values completely.

Now naturally we're talking about a big matrix here, I'll admit that. A fellow by the name of "Defiler" piped up and said an incomplete app of his showed 1147 unique method names across 1258 classes. That works out to a matrix with 1.4M elements. In the Java world, where everything is 32 bits, that's around 6MB of memory consumed just for method mapping. Acceptable? Perhaps. But what about using a sparse matrix?

A sparse matrix attempts to be as efficient as possible when there are large numbers of "zeros" in the matrix elements. That is, interestingly enough, exactly what we have here. The vast majority of entries in our method-map would contain a zero, since most methods would not exist on most classes. Zero, as it turns out, would neatly map to our good friend "method_missing", and so dispatching to a method on a target class that doesn't implement it would continue to work as it does today. Even better, method_missing dispatch would be just as fast as a regular method dispatch, excluding the actual method_missing implementation code, of course.

So then, there it is...a potential plan for fast dynamic method invocation in JRuby (and perhaps in Rubinius, if Evan agrees the idea has merit). Any of you out there heard of such a thing? Is this an old idea? Are there problems with it? Share!

Written on December 28, 2006