Is Reflection Really as Fast as Direct Invocation?

This was originally posted to the jruby-devel mailing list, but I am desperate to be proven wrong here. We use reflection extensively to bind Ruby methods to Java impls in JRuby, and the rumors of how fast reflection is have always bothered me. What is the truth? Certainly there are optimizations that make reflection very fast, but as fast as INVOKEINTERFACE and friends? Show me the numbers! Prove me wrong!!

--

It has long been assumed that reflection is fast, and that much is true. The JVM has done some amazing things to make reflected calls really f'n fast these days, and for most scenarios they're as fast as you'd ever want them to be. I certainly don't know the details, but the rumors are that there's code generation going on, reflection calls are actually doing direct calls, the devil and souls are involved, and so on. Many stories, but not a lot of concrete evidence.

A while back, I started playing around with a "direct invocation method" in JRuby. Basically, it's an interface that provides an "invoke" method. The idea is that for every Ruby method we provide in Java code, you would create an implementation of this interface; then when the time comes to invoke those methods, we execute an INVOKEINTERFACE bytecode rather than calling through reflection code.
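To make the idea concrete, here is a minimal sketch of the two binding styles side by side. The names (DirectInvoker, UpcaseInvoker, StringMethods) are made up for illustration and are not the actual JRuby classes; the real interface presumably also takes arguments and runtime state.

```java
import java.lang.reflect.Method;

public class DirectInvocationSketch {
    // Hypothetical stand-in for a Ruby method implemented in Java.
    public static class StringMethods {
        public static Object upcase(Object self) {
            return self.toString().toUpperCase();
        }
    }

    // Hypothetical "direct invocation" interface: one implementation per bound method.
    public interface DirectInvoker {
        Object invoke(Object self);
    }

    // One small class per method. Calls made through the DirectInvoker
    // type compile to an INVOKEINTERFACE bytecode.
    public static class UpcaseInvoker implements DirectInvoker {
        public Object invoke(Object self) {
            return StringMethods.upcase(self);
        }
    }

    // Reflection-based binding: look the method up by name, then call Method.invoke.
    public static Object callReflected(Object self) {
        try {
            Method m = StringMethods.class.getMethod("upcase", Object.class);
            return m.invoke(null, self); // null receiver because upcase is static
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Direct binding: an ordinary interface call, no reflection involved.
    public static Object callDirect(DirectInvoker invoker, Object self) {
        return invoker.invoke(self);
    }

    public static void main(String[] args) {
        System.out.println(callReflected("abc"));
        System.out.println(callDirect(new UpcaseInvoker(), "abc"));
    }
}
```

Both paths end up in the same Java method; the only difference is how the call gets there.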

The down side is that this would create a class for every Ruby method, which amounts to probably several hundred classes. That's certainly not ideal, but perhaps manageable considering you'd have JRuby loaded once in a whole JVM for all uses of it. It could also be mitigated by only doing this for heavily-hit methods. Still, requiring lots of punky little classes is a big deal. [OT: Oh what I would give for delegates right about now...]

The up side, or so I hoped, would be that a straight INVOKEINTERFACE would be faster than a reflected call, regardless of any optimization going on, and we wouldn't have to do any wacked-out code generation.

Initial results seemed to agree with the upside, but in the long term nothing seemed to speed up all that much. There are actually a number of these "direct invocation methods" still in the codebase, specifically for a few heavily-hit String methods like hash, [], and so on.

So I figured I'd resolve this question once and for all in my mind. Is a reflected call as fast as this "direct invocation"?

A test case is attached. I ran each loop for ten million invocations, then ran it a second time so that hotspot could do its thing; both passes are timed. The results are below for both pure interpreter and hotspotted runs (times are in ms).

Hotspotted:
first time reflected: 293
second time reflected: 211
total invocations: 20000000
first time direct: 16
second time direct: 8
total invocations: 20000000

Interpreted:
first time reflected: 9247
second time reflected: 9237
total invocations: 20000000
first time direct: 899
second time direct: 893
total invocations: 20000000

I would really love for someone to prove me wrong, but according to this simple benchmark, direct invocation is faster--way, way faster--in all cases. It's obviously way faster when we're purely interpreting or before hotspot kicks in (roughly 10x), but it's still faster after hotspot, and by an even wider margin (211 ms versus 8 ms on the second pass). I made both invocations increment a static variable, which I'm hoping prevented hotspot from optimizing code into oblivion. However, even if hotspot IS optimizing something away, it's apparent that it does a better job on direct invocations. I know hotspot does some inlining of code when it's appropriate to do so...perhaps reflected code is impossible to inline?

Anyone care to comment? I wouldn't mind speeding up Java-native method invocations by a factor of ten, even if it did mean a bunch of extra classes. We could even selectively "directify" methods, like do everything in Kernel and Object and specific methods elsewhere.
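Selective "directifying" could look something like the sketch below: a small table of hand-written invokers for the hot methods, with reflection as the fallback for everything else. All of the names here (SelectiveDirectify, Impl, DIRECT) are hypothetical, and a real version would cache the reflected Method objects rather than look them up on every call.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class SelectiveDirectify {
    public interface DirectInvoker {
        Object invoke(Object self);
    }

    // Hypothetical Java implementations of two Ruby-visible methods.
    public static class Impl {
        public static Object hash(Object self) {
            return Integer.valueOf(self.hashCode());
        }
        public static Object inspect(Object self) {
            return "\"" + self + "\"";
        }
    }

    // Only the heavily-hit methods get a dedicated invoker class.
    private static final Map<String, DirectInvoker> DIRECT = new HashMap<String, DirectInvoker>();
    static {
        DIRECT.put("hash", new DirectInvoker() {
            public Object invoke(Object self) {
                return Impl.hash(self);
            }
        });
    }

    public static boolean isDirect(String name) {
        return DIRECT.containsKey(name);
    }

    public static Object call(String name, Object self) {
        DirectInvoker direct = DIRECT.get(name);
        if (direct != null) {
            return direct.invoke(self); // fast path: plain interface call
        }
        try {
            // Slow path: everything else still goes through reflection.
            Method m = Impl.class.getMethod(name, Object.class);
            return m.invoke(null, self);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call("hash", "abc") + " direct=" + isDirect("hash"));
        System.out.println(call("inspect", "abc") + " direct=" + isDirect("inspect"));
    }
}
```

The trade-off is the one already mentioned: each entry in the table costs one more small class, so it only pays off for methods that are actually hot.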

--

The test case was attached to my email...I include the test case contents here for your consumption.

// Fragment of a JUnit test class; needs java.lang.reflect.Method imported.
private static interface DirectCall {
    public void call();
}

public static class DirectCallImpl implements DirectCall {
    public static int callCount = 0;
    public void call() { callCount += 1; }
}

// Typed as the interface so that dci.call() goes through INVOKEINTERFACE.
public static DirectCall dci = new DirectCallImpl();

// Target of the reflected call; static, so it is invoked with a null receiver.
public static int callCount = 0;
public static void call() { callCount += 1; }

public void testReflected() {
    try {
        Method callMethod = getClass().getMethod("call", new Class[0]);

        long time = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            callMethod.invoke(null, null);
        }
        System.out.println("first time reflected: " + (System.currentTimeMillis() - time));
        time = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            callMethod.invoke(null, null);
        }
        System.out.println("second time reflected: " + (System.currentTimeMillis() - time));
        System.out.println("total invocations: " + callCount);
    } catch (Exception e) {
        e.printStackTrace();
        assertTrue(false);
    }
}

public void testDirect() {
    long time = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
        dci.call();
    }
    System.out.println("first time direct: " + (System.currentTimeMillis() - time));
    time = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
        dci.call();
    }
    System.out.println("second time direct: " + (System.currentTimeMillis() - time));
    System.out.println("total invocations: " + DirectCallImpl.callCount);
}


Update: A commenter noticed that the original code was allocating a new Object[0] for every call to the reflected method; that was a rather dumb mistake on my part. The commenter also noted that I was doing a direct call to the impl rather than a call to the interface, which was also true. I updated the above code and re-ran the numbers, and reflection does much better as a result...but still not as fast as the direct call:

Hotspotted:

first time reflected: 146
second time reflected: 109
total invocations: 20000000
first time direct: 15
second time direct: 8
total invocations: 20000000

Interpreted:

first time reflected: 6560
second time reflected: 6565
total invocations: 20000000
first time direct: 912
second time direct: 920
total invocations: 20000000
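
For anyone curious about the allocation mistake mentioned in the update, the sketch below shows the general shape of it: one loop builds a fresh empty argument array on every reflected call, the other reuses a shared one. This is a reconstruction under my own assumptions, not the original pre-fix code, and the names (ReflectionArgsSketch, NO_ARGS, run) are made up for illustration.

```java
import java.lang.reflect.Method;

public class ReflectionArgsSketch {
    public static int callCount = 0;
    public static void call() { callCount += 1; }

    // Shared empty argument array, created once.
    private static final Object[] NO_ARGS = new Object[0];

    // Runs both variants for the given number of iterations and
    // returns the total number of calls that reached call().
    public static int run(int iterations) {
        try {
            callCount = 0;
            Method m = ReflectionArgsSketch.class.getMethod("call");

            // Variant 1: a new Object[0] is created on every iteration.
            for (int i = 0; i < iterations; i++) {
                m.invoke(null, new Object[0]);
            }

            // Variant 2: the same array instance is reused for every call.
            for (int i = 0; i < iterations; i++) {
                m.invoke(null, NO_ARGS);
            }
            return callCount;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("total invocations: " + run(1000));
    }
}
```

Both loops do the same work as far as the called method is concerned; the difference is only in how much garbage the caller produces along the way.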
Written on July 9, 2006