'Twas Brillig

It's been a long time since I posted last. I figured it was time to get at least a basic update out to folks.

JRuby's been clicking along really well. Perhaps a little too well. Our bug tracker counts 476 open bugs and climbing, even while we're trying to periodically sweep through them. We're slipping behind, but there's a silver lining: every other bug is filed by someone new. So there's happiness in slavery.

Since my last post, a lot has happened. We pushed out JRuby 1.1.3 around the middle of July, which boasted a whole bunch of new awesome. Vladimir Sizikov posted a really nice rundown of the changes.

My favorite item has to be the improved interpreter performance, which now is clearly faster than MRI's interpreter. So whether you're running interpreted or compiled in JRuby, you should be getting pretty solid straight-line performance.

There were also dozens upon dozens of compatibility fixes, an upgrade to RubyGems 1.2, and lots of other miscellaneous changes to make JRuby a little friendlier and easier to use. It's nice to be able to focus on that kind of stuff now that compatibility is mostly a done deal (ok, high 90th percentile...but pretty darn good).

So then...what's on the slate for the next JRuby release?

Making the Leap?

We had originally been talking about JRuby 1.1.3 being the last planned release in the 1.1 line. It had reached a really solid level of stability and performance, people were putting it in production for all sorts of apps, and we were generally very happy with it. "Last planned release" doesn't mean we wouldn't continue to do maintenance releases as needed...it just means we weren't going to do day-to-day development against the 1.1 line.

The primary reason for this is a number of large-scale projects we'd been putting off. For example, the "Java Integration" (JI) subsystem of JRuby has long been a serious thorn in our sides. Based off probably a dozen different people's contributions over nearly eight years, it had become an impenetrable maze of code, half in Java, half in Ruby, but all clearly in the "suck" column. And I'm not saying anything negative about the contributors who wrote all that suckage (which includes me)...they all made valiant attempts to improve the situation. But a dozen 5% solutions had led us down an ever-more-terrifying rabbit hole. We faced the ultimate decision: fix or rewrite?

We were all set to rewrite. I'd already started an experiment I called "MiniJava", a mostly code-generated replacement for JI that boasted dynamic dispatch speed no slower than Ruby to Ruby and static dispatch speed (from Java into Ruby) only about two times slower than Java calling Java. There were many levels of awesome there, but one serious problem: no tests.

JRuby's JI layer has evolved over a long period of time, and has been worked on by a mix of TDD fans and non-fans. There have also been numerous attempts to rewrite portions of the system, usually ending far short of intentions and with no new tests for functionality added along the way. As a result, any attempt to replace the JI layer would be an exercise in pain: even *we* didn't know how it all works. It's probably not exaggerating to say that existing test cases covered less than half of JI functionality, and that the current JRuby team probably understood less than half of JI's implementation. Not a great place to start from.

Faced with the certainty of pain and knowing, workaholic that I am, that a large portion of the rewrite would probably fall on my shoulders, I decided to give it one more go. I decided to attempt an in-place refactoring of JRuby's Java Integration layer, tens of thousands of lines of spaghetti Java and Ruby code.

Redemption

I've had false starts before. The compiler had at least two partial attempts that failed to go anywhere. I've rewritten the JRuby interpreter several times. Hell, I can't count the number of times I tried to refactor JRuby's IO subsystem before finally succeeding. But JI...man, that is some seriously heinous code. So I had to start small. How about something I was already intimately familiar with: method dispatch.

puts "Measure bytelist appends (via Java integration)"
5.times { 
  puts Benchmark.measure { 
    sb = org.jruby.util.ByteList.new 
    foo = org.jruby.util.ByteList.plain("foo") 
    1000000.times { 
      sb.append(foo) 
    } 
  }
}

puts "Measure string appends (via normal Ruby)"
5.times { 
  puts Benchmark.measure { 
    str = "" 
    foo = "foo" 
    1000000.times { 
      str << foo 
    } 
  } 
}

Here's one of our JI benchmarks. The idea here is that since JRuby's String type is backed by a ByteList, appending to a ByteList through Java integration should be roughly equivalent in execution cost to appending to a String. Or at least, that would be the ideal situations. It was not, however, the case in JRuby 1.1.3.

JRuby 1.1.3

Measure bytelist appends (via Java integration)
  3.580000   0.000000   3.580000 (  3.579149)
  2.551000   0.000000   2.551000 (  2.551251)
  2.615000   0.000000   2.615000 (  2.614461)
  2.505000   0.000000   2.505000 (  2.505265)
  2.715000   0.000000   2.715000 (  2.715380)
Measure string appends (via normal Ruby)
  0.290000   0.000000   0.290000 (  0.289629)
  0.247000   0.000000   0.247000 (  0.246666)
  0.259000   0.000000   0.259000 (  0.259266)
  0.250000   0.000000   0.250000 (  0.250354)
  0.253000   0.000000   0.253000 (  0.253113)

Ouch. Ten times worse? Sure, I could see it being a couple times worse; after all, going from Ruby code into a Ruby type and back out to Ruby code should be faster than Ruby to Java to Ruby, since we're talking about leaving the controlled world of JRuby core and coming back. But not ten times worse.

Armed with this and a few other benchmarks, I started attacking the dispatch path for Java calls. Largely the optimizations made were the same ones we'd done over the past year for Ruby code: avoid boxing arguments in arrays when possible; clean up and simplify overloaded method selection; as much as possible eliminate constructing any objects not directly used in the eventual reflected call. And early returns started to look really good. Once the basics of the new call logic were in place, performance started to look a lot better.

JRuby trunk

Measure bytelist appends (via Java integration)
  1.490000   0.000000   1.490000 (  1.489711)
  0.658000   0.000000   0.658000 (  0.657639)
  0.646000   0.000000   0.646000 (  0.645717)
  0.638000   0.000000   0.638000 (  0.637849)
  0.603000   0.000000   0.603000 (  0.602439)
Measure string appends (via normal Ruby)
  0.312000   0.000000   0.312000 (  0.312046)
  0.232000   0.000000   0.232000 (  0.231591)
  0.243000   0.000000   0.243000 (  0.242401)
  0.235000   0.000000   0.235000 (  0.235499)
  0.232000   0.000000   0.232000 (  0.231688)

That's more like it...less than three times slower than one of our fastest Ruby-based calls. Other benchmarks showed similar improvement.

JRuby 1.1.3

Measure Integer.valueOf, overloaded call with a primitive
  2.384000   0.000000   2.384000 (  2.383484)
  2.167000   0.000000   2.167000 (  2.167232)
  2.191000   0.000000   2.191000 (  2.191187)
  2.212000   0.000000   2.212000 (  2.211901)
  2.202000   0.000000   2.202000 (  2.202460)

JRuby trunk

Measure Integer.valueOf, overloaded call with a primitive
  0.635000   0.000000   0.635000 (  0.635259)
  0.470000   0.000000   0.470000 (  0.470315)
  0.471000   0.000000   0.471000 (  0.471354)
  0.468000   0.000000   0.468000 (  0.467723)
  0.468000   0.000000   0.468000 (  0.467589)

Here's an example of where improving overload selection made a huge difference. Previously, every time Ruby code called an overloaded Java method, we built up an ArrayList of all argument types, used that to get an aggregate hashcode, used the hashcode to see whether there was a cached previous match, and otherwise went through a brute-force search to match incoming types to outgoing Java signatures.

Wait a second. We constructed an ArrayList just to get a hashcode based on its contents? That dog won't hunt, Monsignor.

So I replaced that logic with a five-line method that aggregates the type's hashcodes into an uber-hashcode, which is then used as the cache key. Combine that with specific-arity searches (avoiding argument boxing in arrays) and search logic that understands Ruby objects (avoiding pre-coercing each argument to potentially the wrong type), and hey, we're starting to see the light.

Keeping it Real

Once it became apparent that a refactoring was most definitely possible, even if it took some hard work and a lot of dedication, we decided to put out a 1.1.4 release, focusing primarily on Java integration, but as always including peripheral bug fixes and performance improvements. Having made that decision, the JI job suddenly became a lot more important. I vowed that 1.1.3 would be the last release to contain a JI layer we were silently afraid of.

Then there was the matter of existing users. Since the beginning of the year, more and more folks have been branching out from the core Rails world that had been JRuby's bread and butter into "Java scripting" sorts of applications. Probably the most prominent ones are the team at Happy Camper Studios, who not only built a real-world-practical Swing framework for JRuby (see MonkeyBars) and a library for packaging up JRuby-based Ruby apps as single-file executables (see Rawr), but who also were pushing the boundaries of what Ruby and JRuby are capable of (see RailGun, and probably see a lot more in the near future). And to top it off, they were releasing MonkeyBars-based apps commercially...making their living depending on JRuby's JI layer.

Would it be a good decision to leave them with the old code for six months while we do a rewrite? I'll let you think about that for a bit.

More Results

Another area that needed serious work was juggling Ruby and Java arrays. The logic for coercing a Ruby array into a Java array was implemented half in Ruby and half in Java, neither half being very efficient in themselves. Add to that the constant back and forth, and you have a recipe for disaster. With another multi-day effort, I managed to wrestle the code to one side of the fence, refactor it, and improve performance over twenty times.

Code

require 'java'
require 'benchmark'

TIMES = (ARGV[0] || 5).to_i

TIMES.times do
  Benchmark.bm(30) do |bm|
    bm.report("control") {a = [1,2,3,4]; 100_000.times {a}}
    bm.report("ary.to_java") {a = [1,2,3,4]; 100_000.times {a.to_java}}
    bm.report("ary.to_java :object") {a = [1,2,3,4]; 100_000.times {a.to_java :object}}
    bm.report("ary.to_java :string") {a = [1,2,3,4]; 100_000.times {a.to_java :string}}
  end
end

JRuby 1.1.3
                                    user     system      total        real
control                         0.013000   0.000000   0.013000 (  0.013130)
ary.to_java                     7.523000   0.000000   7.523000 (  7.522787)
ary.to_java :object             7.794000   0.000000   7.794000 (  7.794777)
ary.to_java :string             9.905000   0.000000   9.905000 (  9.905805)

JRuby trunk
                                    user     system      total        real
control                         0.009000   0.000000   0.009000 (  0.009548)
ary.to_java                     0.240000   0.000000   0.240000 (  0.239946)
ary.to_java :object             0.248000   0.000000   0.248000 (  0.247385)
ary.to_java :string             0.418000   0.000000   0.418000 (  0.418230)

You can find similar improvements on trunk for object construction, interface implementation, and several other areas. Work continues, but performance is starting to look way better.

Don't You Think About Anything But Performance?

Of course a side effect of simplifying the code for performance reasons is that fixing bugs and adding features becomes a lot easier. Check out a tiny bit of new logic that works now on JRuby trunk:

# "closure conversion" was only supported for instance methods before
thread = java.lang.Thread.new { puts 'Wahoo!' }
thread.start
thread.join
# output: 'Wahoo!'

Of course this is a somewhat contrived example, but there are some obviously useful examples too:

javax.swing.SwingUtilities.invoke_later { puts "Yay, Swing!" }
# script terminates, but the Swing event thread keeps running until it's fired our block

The ability to pass a block to "any method" that accepted an interface as its last parameter was only functional for instance methods in all previous releases of JRuby. After the refactoring, expanding it to constructors and static methods took about 5 minutes. With tests. We'll come back to that in a moment.

The simplification of the code also means we'll probably be able to fix a bunch of Java Integration bugs we've punted on for months. For example, it's been a long-standing bug that you can't implement a Java interface in Ruby by defining the underscored versions of that interface's method names. But now, with newly rewritten JI interface-implementation code, it should be a snap to add that feature. We're planning to do a JI bug audit for 1.1.4 and knock down as many long-standing issues as we can.

Proving It Works

So back to the tests for a moment.

Some months ago we managed to get a baseline suite of RSpec specs into the JRuby development process thanks to Nick Sieger. Initially, the tests were very sparse, only a few specific things Nick had a chance to put together. But as interested community members started sending in patches, we started to grow a nice little suite.

As I've been working on the refactoring, I've been trying to "test along" as I learn how bits of JRuby's JI layer function. And so I've added a number of new specs for type coercion, interface implementation, method dispatch and overload selection, and others. Ola Bini came out of hiding to contribute a pretty comprehensive set of Array specs, which were a great help during the slaying of the Array-coercion dragon. So we're finally getting that suite we'd always needed, and once the refactoring is done we'll be in a far better position to start taking JI to the next level.

What's Next?

At the moment the refactoring is maybe 25% along. I'm the only one working on it, and it's a crapload of code, so it's going to take a little time. But most key performance bottlenecks have been remedied at this point.

There are a few areas that remain to be tackled:

Refactoring the extensive logic governing how Ruby classes can extend abstract or concrete Java classes. Kresten Krab Thorup contributed this well over a year ago, and it's kinda been an island unto itself. It will take a considerable effort to rework.

Moving the remaining core JI logic from Ruby into Java. I know, turtles and all that...but I've seen enough projects try to do large-scale refactorings in Ruby to know that the tools and techniques simply aren't there yet. By moving this logic into Java, where it belongs, we'll probably be able to delete 90% of it. That means more room for performance and features, and a much higher likelihood that future enhancements can safely live in Ruby without incurring a severe performance penalty.

Eliminating the old "lower level" and "higher level" Java integration layers. Originally, JRuby's JI was implemented as a set of reflection-like Ruby classes wrapping Java's reflection classes (which comprised the "lower level") and a substantial amount of Ruby code that juggled these reflected bits to represent Java types and make Java calls (the so-called "higher level"). While conceptually this makes some sense, in practice it meant that any call from Ruby to Java actually ended up as dozens, maybe hundreds of Ruby invocations before the target Java method could be invoked. Performance improvements over the past four years were able to improve matters, but largely the only substantial gains have come from, and will continue to come from, eliminating the two separate levels entirely.

I intend for all of this to be in place for JRuby 1.1.4, which we want to release this month.

Next Time on Headius: Tune in for my next post, hopefully soon, about two Rubinius APIs we've added for 1.1.4: Multiple VMs (MVM) and Foreign Function Interface (FFI).

Written on August 7, 2008