Thursday, April 2, 2009

How JRuby Makes Ruby Fast

At least once a year there's a maelstrom of posts about a new Ruby implementation with stellar numbers. These numbers are usually based on very early experimental code, and they are rarely accompanied by information on compatibility. And of course we love to see crazy performance numbers, so many of us eat this stuff up.

Posting numbers too early is a real disservice to any project, since they almost certainly don't represent the eventual real-world performance people will see. It encourages folks to look to the future, but it also marginalizes implementations that already provide both compatibility and performance, and ignores how much work it has taken to get there. Given how much we like to see numbers, and how thirsty the Ruby community is for "a fastest Ruby", I don't know whether this will ever change.

I thought perhaps a discussion about the process of optimizing JRuby might help folks understand what's involved in building a fast, compatible Ruby implementation, so that these periodic shootouts don't get blown out of proportion. Ruby can be fast, certainly even faster than JRuby is today. But getting there while maintaining compatibility is very difficult.

Performance Optimization, JRuby-style

The truth is it's actually very easy to make small snippits of Ruby code run really fast, especially if you optimize for the benchmark. But is it useful to do so? And can we extrapolate eventual production performance from these early numbers?

We begin our exploration by running JRuby in interpreted mode, which is the slowest way you can run JRuby. We'll be using the "tak" benchmark, since it's simple and easy to demonstrate relative performance at each optimization level.
# Takeuchi function performance, tak(24, 16, 8)
def tak x, y, z
if y >= x
return z
else
return tak( tak(x-1, y, z),
tak(y-1, z, x),
tak(z-1, x, y))
end
end

require "benchmark"

N = (ARGV.shift || 1).to_i

Benchmark.bm do |make|
N.times do
make.report do
i = 0
while i<10
tak(24, 16, 8)
i+=1
end
end
end
end

And here's our first set of results. I have provided Ruby 1.8.6 and Ruby 1.9.1 numbers for comparison.
Ruby 1.8.6p114:
➔ ruby bench/bench_tak.rb 5
user system total real
17.150000 0.120000 17.270000 ( 17.585128)
17.170000 0.140000 17.310000 ( 17.946869)
17.180000 0.160000 17.340000 ( 18.234570)
17.180000 0.150000 17.330000 ( 17.779536)
18.790000 0.190000 18.980000 ( 19.560232)

Ruby 1.9.1p0:
➔ ruby191 bench/bench_tak.rb 5
user system total real
3.570000 0.030000 3.600000 ( 3.614855)
3.570000 0.030000 3.600000 ( 3.615341)
3.560000 0.020000 3.580000 ( 3.608843)
3.570000 0.020000 3.590000 ( 3.591833)
3.570000 0.020000 3.590000 ( 3.640205)

JRuby 1.3.0-dev, interpreted, client VM
➔ jruby -X-C bench/bench_tak.rb 5
user system total real
24.981000 0.000000 24.981000 ( 24.903000)
24.632000 0.000000 24.632000 ( 24.633000)
25.459000 0.000000 25.459000 ( 25.459000)
29.122000 0.000000 29.122000 ( 29.122000)
29.935000 0.000000 29.935000 ( 29.935000)

Ruby 1.9 posts some nice numbers here, and JRuby shows how slow it can be when doing no optimizations at all. The first change we look at, and which we recommend to any users seeking best-possible performance out of JRuby, is to use the JVM's "server" mode, which optimizes considerably better.
JRuby 1.3.0-dev, interpreted, server VM
➔ jruby --server -X-C bench/bench_tak.rb 5
user system total real
8.262000 0.000000 8.262000 ( 8.192000)
7.789000 0.000000 7.789000 ( 7.789000)
8.012000 0.000000 8.012000 ( 8.012000)
7.998000 0.000000 7.998000 ( 7.998000)
8.000000 0.000000 8.000000 ( 8.000000)

The "server" VM differs from the default "client" VM in that it will optimistically inline code across calls and optimize the resulting code as a single unit. This obviously allows it to eliminate costly x86 CALL operations, but even more than that it allows optimizing algorithms which span multiple calls. By default, OpenJDK will attempt to inline up to 9 levels of calls, so long as they're monomorphic (only one valid target), not too big, and no early assumptions are changed by later code (like if a monomorphic call goes polymorphic later on). In this case, where we're not yet compiling Ruby code to JVM bytecode, this inlining is mostly helping JRuby's interpreter, core classes, and method-call logic. But already we're 3x faster than interpreted JRuby on the client VM.

The next optmization will be to turn on the compiler. I've modified JRuby for the next couple runs to *only* compile and not do any additional optimizations. We'll discuss those optimizations as I add them back.
JRuby 1.3.0-dev, compiled (unoptimized), server VM:
➔ jruby --server -J-Djruby.astInspector.enabled=false bench/bench_tak.rb 5
user system total real
5.436000 0.000000 5.436000 ( 5.376000)
3.655000 0.000000 3.655000 ( 3.655000)
3.662000 0.000000 3.662000 ( 3.662000)
3.683000 0.000000 3.683000 ( 3.683000)
3.668000 0.000000 3.668000 ( 3.668000)

By compiling, without doing any additional optimizations, we're able to improve performance 2x again. Because we're now JITing Ruby code as JVM bytecode, and the JVM eventually JITs JVM bytecode to native code, our Ruby code actually starts to benefit from the JVM's built-in optimizations. We're making better use of the system CPU and not making nearly as many calls as we would from the interpreter (since the interpreter is basically a long chain of calls for each low-level Ruby operation.

Next, we'll turn on the simplest and oldest JRuby compiler optimization, "heap scope elimination".
JRuby 1.3.0-dev, compiled (heap scope optz), server VM:
➔ jruby --server bench/bench_tak.rb 5
user system total real
4.014000 0.000000 4.014000 ( 3.942000)
2.776000 0.000000 2.776000 ( 2.776000)
2.760000 0.000000 2.760000 ( 2.760000)
2.769000 0.000000 2.769000 ( 2.769000)
2.768000 0.000000 2.768000 ( 2.769000)

The "heap scope elimination" optimization eliminates the use of an in-memory store for local variables. Instead, when there's no need for local variables to be accessible outside the context of a given method, they are compiled as Java local variables. This allows the JVM to put them into CPU registers, making them considerably faster than reading or writing them from/to main memory (via a cache, but still slower than registers). This also makes JRuby ease up on the JVM's memory heap, since it no longer has to allocate memory for those scopes on every single call. This now puts us comfortably faster than Ruby 1.9, and it represents the set of optimizations you see in JRuby 1.2.0.

Is this the best we can do? No, we can certainly do more, and some such experimental optimizations are actually already underway. Let's continue our exploration by turning on another optimization similar to the previous one: "backtrace-only frames".
JRuby 1.3.0-dev, compiled (heap scope + bracktrace frame optz), server VM:
➔ jruby --server -J-Djruby.compile.frameless=true bench/bench_tak.rb 5
user system total real
3.609000 0.000000 3.609000 ( 3.526000)
2.600000 0.000000 2.600000 ( 2.600000)
2.602000 0.000000 2.602000 ( 2.602000)
2.598000 0.000000 2.598000 ( 2.598000)
2.602000 0.000000 2.602000 ( 2.602000)

Every Ruby call needs to store information above and beyond local variables. There's the current "self", the current method visibility (used for defining new methods), which class is currently the "current" one, backref and lastline values ($~ and $_), backtrace information (caller's file and line), and some other miscellany for handling long jumps (like return or break in a block). In most cases, this information is not used, and so storing it and pushing/popping it for every call wastes precious time. In fact, other than backtrace information (which needs to be present to provide Ruby-like backtrace output), we can turn most of the frame data off. This is where we start to break Ruby a bit, though there are ways around it. But you can see we get another small boost.

What if we eliminate frames entirely and just use the JVM's built-in backtrace logic? It turns out that having any pushing/popping of frames, even with only backtrace data, still costs us quite a bit of performance. So let's try "heap frame elimination":
JRuby 1.3.0-dev, compiled (heap scope + heap frame optz), server VM:
➔ jruby --server -J-Djruby.compile.frameless=true bench/bench_tak.rb 5
user system total real
2.955000 0.000000 2.955000 ( 2.890000)
1.904000 0.000000 1.904000 ( 1.904000)
1.843000 0.000000 1.843000 ( 1.843000)
1.823000 0.000000 1.823000 ( 1.823000)
1.813000 0.000000 1.813000 ( 1.813000)

By eliminating frames entirely, we're a good 33% faster than the fastest "fully framed" run you'd get with stock JRuby 1.2.0. You'll notice the command line here is the same; that's because we're venturing into more and more experimental code, and in this case I've actually forced "frameless" to be "no heap frame" instead of "backtrace-only heap frame". And what do we lose with this change? We no longer would be able to produce a backtrace containing only Ruby calls, so you'd see some JRuby internals in the trace, similar to how Rubinius shows Rubinius internals. But we're getting respectably fast now.

Next up we'll turn on some optimizations for math operators.
JRuby 1.3.0-dev, compiled (heap scope, heap frame, fastops optz), server VM:
➔ jruby --server -J-Djruby.compile.frameless=true -J-Djruby.compile.fastops=true bench/bench_tak.rb 5
user system total real
2.291000 0.000000 2.291000 ( 2.225000)
1.335000 0.000000 1.335000 ( 1.335000)
1.337000 0.000000 1.337000 ( 1.337000)
1.344000 0.000000 1.344000 ( 1.344000)
1.346000 0.000000 1.346000 ( 1.346000)

Most of the time, when calling + or - on an object, we do the full Ruby dynamic dispatch cycle. Dispatch involves retrieving the target object's metaclass, querying for a method (like "+" or "-"), and invoking that method with the appropriate arguments. This works fine for getting us respectable performance, but we want to take things even further. So JRuby has experimental "fast math" operations to turn most Fixnum math operators into static calls rather than dynamic ones, allowing most math operations to inline directly into the caller. And what do we lose? This version of "fast ops" makes it impossible to override Fixnum#+ and friends, since whenever we call + on a Fixnum it's going straight to the code. But it gets us another nearly 30% improvement.

Up to now we've still also been updating a lot of per-thread information. For every line, we're tweaking a per-thread field to say what line number we're on. We're also pinging a set of per-thread fields to handle the unsafe "kill" and "raise" operations on each thread...basically we're checking to see if another thread has asked the current one to die or raise an exception. Let's turn all that off:

JRuby 1.3.0-dev, compiled (heap scope, heap frame, fastops, threadless, positionless optz), server VM:
➔ jruby --server -J-Djruby.compile.frameless=true -J-Djruby.compile.fastops=true -J-Djruby.compile.positionless=true -J-Djruby.compile.threadless=true bench/bench_tak.rb 5
user system total real
2.256000 0.000000 2.256000 ( 2.186000)
1.304000 0.000000 1.304000 ( 1.304000)
1.310000 0.000000 1.310000 ( 1.310000)
1.307000 0.000000 1.307000 ( 1.307000)
1.301000 0.000000 1.301000 ( 1.301000)

We get a small but measurable performance boost from this change as well.

The experimental optimizations up to this point (other than threadless) comprise the set of options for JRuby's --fast option, shipped in 1.2.0. The --fast option additionally tries to statically inspect code to determine whether these optimizations are safe. For example, if you're running with --fast but still access backrefs, we're going to create a frame for you anyway.

We're not done yet. I mentioned earlier the JVM gets some of its best optimizations from its ability to profile and inline code at runtime. Unfortunately in current JRuby, there's no way to inline dynamic calls. There's too much plumbing involved. The upcoming "invokedynamic" work in Java 7 will give us an easier path forward, making dynamic calls as natural to the JVM as static calls, but of course we want to support Java 5 and Java 6 for a long time. So naturally, I have been maintaining an experimental patch that eliminates most of that plumbing and makes dynamic calls inline on Java 5 and Java 6.
JRuby 1.3.0-dev, compiled ("--fast", dyncall optz), server VM:
➔ jruby --server --fast bench/bench_tak.rb 5
user system total real
2.206000 0.000000 2.206000 ( 2.066000)
1.259000 0.000000 1.259000 ( 1.259000)
1.258000 0.000000 1.258000 ( 1.258000)
1.269000 0.000000 1.269000 ( 1.269000)
1.270000 0.000000 1.270000 ( 1.270000)

We improve again by a small amount, always edging the performance bar higher and higher. In this case, we don't lose compatibility, we lose stability. The inlining modification breaks method_missing and friends, since I have not yet modified the call pipeline to support both inlining and method_missing. And there's still a lot of extra overhead here that can be eliminated. But in general we're still mostly Ruby, and even with this change you can run a lot of code.

This represents the current state of JRuby. I've taken you all the way from slow, compatible execution, through fast, compatible execution, and all the way to faster, less-compatible execution. There's certainly a lot more we can do, and we're not yet as fast as some of the incomplete experimental Ruby VMs. But we run Ruby applications, and that's no small feat. We will continue making measured steps, always ensuring compatibility first so each release of JRuby is more stable and more complete than the last. If we don't immediately leap to the top of the performance heap, there's always good reasons for it.

Performance Optimization, Duby-style

As a final illustration, I want to show the tak performance for a language that looks like Ruby, and tastes like Ruby, but boasts substantially better performance: Duby.
def tak(x => :fixnum, y => :fixnum, z => :fixnum)
unless y < x
z
else
tak( tak(x-1, y, z),
tak(y-1, z, x),
tak(z-1, x, y))
end
end

puts "Running tak(24,16,8) 1000 times"

i = 0
while i<1000
tak(24, 16, 8)
i+=1
end

This is the Takeuchi function written in Duby. It looks basically like Ruby, except for the :fixnum type hints in the signature. Here's a timing of the above script (which calls tak the same as before but 1000 times instead of 5 times), running on the server JVM:
➔ time jruby -J-server bin/duby examples/tak.duby
Running tak(24,16,8) 1000 times

real 0m13.657s
user 0m14.529s
sys 0m0.450s

So what you're seeing here is that Duby can run "tak(24,16,8)", the same function we tested in JRuby above, in an average of 0.013 seconds--nearly two orders of magnitude faster than the fastest JRuby optimizations above and at least an order of magnitude faster than the fastest incomplete, experimental implementations of Ruby. What does this mean? Absolutely nothing, because Duby is not Ruby. But it shows how fast a Ruby-like language can get, and it shows there's a lot of runway left for JRuby to optimize.

Be a (Supportive) Critic!

So the next time someone posts an article with crazy-awesome performance numbers for a Ruby implementation, by all means applaud the developers and encourage their efforts, since they certainly deserve credit for finding new ways to optimize Ruby. But then ask yourself and the article's author how much of Ruby the implementation actually supports, because it makes a big difference.

Update, April 4: Several people told me I didn't go quite far enough in showing that by breaking Ruby you could get performance. And after enough cajoling, I was convinced to post one last modification: recursion optimization.
JRuby 1.3.0-dev, compiled ("--fast", dyncall optz, recursion optz), server VM:
➔ jruby --server --fast bench/bench_tak.rb 5
user system total real
0.524000 0.000000 0.524000 ( 0.524000)
0.338000 0.000000 0.338000 ( 0.338000)
0.325000 0.000000 0.325000 ( 0.325000)
0.299000 0.000000 0.299000 ( 0.299000)
0.310000 0.000000 0.310000 ( 0.310000)

Woah! What the heck is going on here? In this case, JRuby's compiler has been hacked to turn recursive "functional calls", i.e. calls to an implicit "self" receiver, into direct calls. The logic behind this is that if you're calling the current method from the current method, you're going to always dispatch back to the same piece of code...so why do all the dynamic call gymnastics? This fits a last piece into the JVM inlining-optimization puzzle, allowing mostly-recursive benchmarks like Takeuchi to inline more of those recursive calls. What do we lose? Well, I'm not sure yet. I haven't done enough testing of this optimization to know whether it breaks Ruby in some subtle way. It may work for 90% of cases, but fail for an undetectable 10%. Or it may be something we can determine statically, or something for which we can add an inexpensive guard. Until I know, it won't go into a release of JRuby, at least not as a default optimization. But it's out there, and I believe we'll find a way.

It is also, incidentally, only a few times slower than a pure Java version of the same benchmark, provided Java is using all boxed numerics too.

Saturday, March 28, 2009

On Benchmarking

Sigh. It must be that time of year again. Another partially-completed Ruby implementation has started to get overhyped because of early performance numbers.

MacRuby has been mentioned on this blog before. It's a reimplementation of Ruby 1.9 targeting the Objective-C runtime--and now, targeting LLVM for immediately compiling Ruby code to native code. Initial performance results running some of my benchmark show an interesting mixed bag. For some, MacRuby's new "experimental" branch performs very well, in some cases a few times faster than JRuby. For others, performance is slow enough there must be something wrong. And there's a large number of my benchmarks that don't even run, due to broken features they'll be fixing over the next several months.

And yet, at least one Rubyist has already seen fit to declare MacRuby "the fastest Ruby implementation around". Really? When it's crashing for about half the scripts I ran and extremely slow for many others?

He bases this assertion on running the benchmarks MacRuby includes in its own repository. Because MacRuby usually performs much better on those benchmarks than Ruby 1.9, he has decided they're now "the fastest Ruby". Do we have to do this hype dance every year?

Look, I know I'm biased. I want JRuby to be the best Ruby implementation possible. I want it to be fast, and if possible, the fastest. I also want it to run existing Ruby applications and integrate well with Java libraries and applications and continue to be one of the best choices for running Ruby. So I can understand that it sounds like I'm throwing stones by pouring water on such a breathless proclamation as "fastest Ruby implementation around". But seriously guys...haven't we learned anything?

MacRuby's experimental branch is just that: experimental. Lots of stuff is fast, but lots of stuff is broken or slow. I'm sure the MacRuby guys are going to get everything resolved and working, and I'll admit these early results drive me to work on JRuby performance even harder. But I also know from experience that many of the missing features are exactly those that make Ruby performance a really difficult problem. That's why we've always focused on compatibility first (almost to a fault); it's really easy to paint yourself into a corner.

But this post isn't about MacRuby. They're doing awesome work, and I have no doubt at least some of the performance numbers will stick. This post is about the evils of benchmarking, especially prematurely.

Around this time last year, MagLev (Ruby based on the Gemstone VM) posted some crazy benchmarks and shocked the Ruby world at RailsConf. They had numbers even more stunning than MacRuby, running some simple numerical benchmarks orders of magnitude faster than either Ruby 1.8 or 1.9. Several Ruby bloggers immediately posted not just their enthusiasm, but their belief that MagLev had won the performance battle without ever firing a shot.

And I believe it was a great disservice to the MagLev team.

MagLev was, last spring, a very primitive and early implementation. It could run some useful Ruby code, but the majority of the core classes had not yet been implemented and very little work had been done on compatibility. Now we're approaching a year later, and MagLev is still in development, still closed source, still at a private alpha stage of life. Again, I'll admit I'm biased, so I need to state that I believe MagLev is also a really cool technology, at least as cool as MacRuby or JRuby. In many ways and for many domains both of them are going to be more compelling than JRuby, and I have no illusions that JRuby will never get leapfrogged in performance. But we need to remember a really important fact: these implementations are not done.

I could post blog entries with every experimental branch of JRuby I've ever tested. I could show you "fib" numbers 3-5 times faster than current JRuby and 10 times faster than Ruby 1.9. But honestly, what would be the point? I know it's experimental, I know we need to get there in a careful, measured way, and I know that my best experiments may never be reflected in real-world, real-application performance. And yet it seems like people just love to latch on to these early contenders, hyping them to death almost before they're out of the starting gate.

Listen, people: Ruby is hard to implement. Oh, it may look easy at a glance, and you can probably get 70, 80, or even 90% of the way pretty quickly. But there's some crazy stuff in that last 10% or 5% that totally blindsides you if you're not looking for it. An early Ruby implementation has not run that last mile of Ruby implementation, and it takes almost as much work to get there as it does to run the first 90%.

So let's try to be adults about this and give new implementations time to actually finish before we whip the community into a frenzy. Every time we go overboard in our declarations, we look like amateurs. And as certain as I am that MacRuby is going to be a major contender for the "fastest Ruby" crown, I think we'd be wise to hold judgment until it and other young Ruby implementations are actually finished.

BiteScript 0.0.1 - A Ruby DSL for JVM Bytecode

I have finally released the first version of BiteScript, my little DSL for generating JVM bytecode. Install it as a gem with "gem install bitescript".
require 'bitescript'

include BiteScript

fb = FileBuilder.build(__FILE__) do
public_class "SimpleLoop" do
public_static_method "main", void, string[] do
aload 0
push_int 0
aaload
label :top
dup
aprintln
goto :top
returnvoid
end
end
end

fb.generate do |filename, class_builder|
File.open(filename, 'w') do |file|
file.write(class_builder.generate)
end
end

BiteScript grew out of my work on Duby. I did not want to call directly into a Java bytecode API like ASM, so I wrapped it with a nice Ruby-like layer. I also wanted the option of having blocks of bytecode look like raw assembly, but also callable as an API.

Currently only two projects I know of make use of BiteScript: Duby and the upcoming Ruby-to-Java "compiler2" in JRuby, which will also be released as a gem.

For a longer example, you can look at tool/compiler2.rb in JRuby, lib/duby/jvm/jvm_compiler.rb in Duby, or an example implementation of Fibonacci in BiteScript.

I'm open to suggestions for how to improve the API, and I'd also like to add the missing Java 5 features. The better BiteScript works, the better Duby and "compiler2" will work.

For folks interested in using BiteScript, the JVM Specification is an easy-to-read complete reference for targeting the JVM, and here is my favorite JVM opcode quickref.

Thursday, March 12, 2009

More Compiling Ruby to Java Types

I did another pass on compiler2, and managed to wire in signature support. So let's look at a couple examples:
class MyRubyClass
def helloWorld
puts "Hello from Ruby"
end
def goodbyeWorld(a)
puts a
end

signature :helloWorld, [] => Java::void
signature :goodbyeWorld, [java.lang.String] => Java::void
end

In this case we have our friend MyRubyClass once again, with helloWorld and goodbyeWorld methods. You'll recall from my previous post that these two methods originally compiled as returning IRubyObject, and goodbyeWorld compiled as receiving a single IRubyObject parameter.

But with signature support, things are so much cooler! The two "signature" lines at the bottom of the class (syntax and structure are totally up for debate) associated signatures with the two methods. helloWorld receives no parameters and has a void return type. goodbyeWorld receives a single String parameter and has a void return type.

The compiler takes this new information, and produces a more normal-looking set of Java signatures:
Compiled from "MyObject.java.rb"
public class MyObject extends org.jruby.RubyObject{
static {};
public MyObject();
public void helloWorld();
public void goodbyeWorld(java.lang.String);
}

Huzzah! There's almost nothing here to give away that we're actually dealing with Ruby code under the covers. And the code that consumes this is just as simple:
public class MyObjectTest {
public static void main(String[] args) {
MyObject obj = new MyObject();
obj.helloWorld();
obj.goodbyeWorld("hello");
}
}

And that's literally all there is to it. Here's a more advanced example:
class MyRubyClass
%w[boolean byte short char int long float double].each do |type|
java_type = Java.send type
eval "def #{type}Method(a); a; end"
signature "#{type}Method", [java_type] => java_type
end
end

This time we're actually *generating* the methods, looping over a list of Java primitives and eval'ing a method for each. So this is *runtime* generation of methods, like any good Rubyist loves to do. And of course, this is absolutely no problem for compiler2:
Compiled from "MyObject2.java.rb"
public class MyObject2 extends org.jruby.RubyObject{
static {};
public MyObject2();
public double doubleMethod(double);
public int intMethod(int);
public char charMethod(char);
public short shortMethod(short);
public boolean booleanMethod(boolean);
public float floatMethod(float);
public long longMethod(long);
public byte byteMethod(byte);
}

All the methods are there, just as you'd expect them! Fantastic!!! (Though the ordering is a little peculiar; I think that's because we don't have an ordered method table in our class impl. Does it matter?)

Even better, the above methods are doing the same type coercion on the way in and out that we do for any other Java-based method calling. So your integral numerics are presented to Ruby as Fixnums, floating-point numerics are Floats, and booleans come through as Ruby true or false.

There's certainly more work to be done:
  • There's no support for overloads at the moment, but I'll likely provide a method aliasing facility so you can define multiple Ruby methods and then say which one maps to which overload. And of course, you'll be able to define multiple overloads that go to the same method body if you wish.
  • I also have not wired in varargs, but it will be an easy match to Ruby's restargs. And optional arguments could automatically generate different-arity Java signatures.
  • Annotations will also be trivial to add; it's just a matter of attaching appropriate metadata and having compiler2 emit them. So you'll be able to use JavaEE 5, JUnit4, and any other frameworks that depend on having annotations present.
Of course this is all checked into JRuby trunk, so feel free to give it a try. Stop by JRuby mailing lists or IRC if you have questions. And it's all still written in Ruby; signature support bloated the compiler up to a whopping 178 lines of code, most of that for dealing with the JVM opcodes for primitive types.

This is just the beginning!

Tuesday, March 10, 2009

Compiling Ruby to Java Types

"Compiler #2" as it has been affectionately called is a compiler to turn normal Ruby classes into Java classes, so they can be constructed and called by normal Java code. When I asked for 1.3 priorities, this came out way at the top. Tom thought perhaps I asked for trouble putting it on the list, and he's probably right (like asking "prioritize these: sandwich, pencil, shiny gold ring with 5kt diamond, banana"), but I know this has been a pain point for people.

I have just landed an early prototype of the compiler on trunk. I made a few decisions about it today:
  • It will use my bytecode DSL "BiteScript", just like Duby does
  • It will use the *runtime* definition of a class to generate the Java version
The second point is an important one. Instead of having an offline compiler that inspects a file and generates code from it, the compiler will actually used the runtime class to create a Java version. This means you'll be able to use all the usual metaprogramming facilities, and at whatever point the compiler picks up your class it will see all those methods.

Here's an example:

# myruby.rb
require 'rbconfig'

class MyRubyClass
def helloWorld
puts "Hello from Ruby"
end
if Config::CONFIG['host_os'] =~ /mswin32/
def goodbyeWorld(a)
puts a
end
else
def nevermore(*a)
puts a
end
end
end

Here we have a class that defines two methods. The first, always defined, is helloWorld. The second is conditionally either goodbyeWorld or nevermore, based on whether we're on Windows. Yes, it's a contrived example...bear with me.

The compiler2 prototype can be invoked as follows (assuming bitescript is checked out into ../bitescript):

jruby -I ../bitescript/lib/ tool/compiler2.rb MyObject MyRubyClass myruby

A breakdown of these arguments is as follows:
  • -I ../bitescript/lib includes bitescript
  • tool/compiler2.rb is the compiler itself
  • MyObject is the name we'd like the Java class to have
  • MyRubyClass is the name of the Ruby class we want it to front
  • myruby is the library we want it to require to load that class
Running this on OS X and dumping the resulting Java class gives us:

Compiled from "MyObject.java.rb"
public class MyObject extends org.jruby.RubyObject{
static {};
public MyObject();
public org.jruby.runtime.builtin.IRubyObject helloWorld();
public org.jruby.runtime.builtin.IRubyObject nevermore(org.jruby.runtime.builtin.IRubyObject[]);
}

The first thing to notice is that the compiler has generated a method for nevermore, since I'm not on Windows. I believe this will be unique among dynamic languages on the JVM: we will make the *runtime* set of methods available through the Java type, not just the static set present at compile time.

Because there are no type signatures specified for MyRubyClass, all types have defaulted to IRubyObject. Type signature logic will come along shortly. And notice also this extends RubyObject; a limitation of the current setup is that you won't be able to use compiler2 to create subclasses. That will come later.

Once you've run this, you've got a MyObject that can be instantiated and used directly. Behind the scenes, it uses a global JRuby instance, so JRuby's still there and you still need it in classpath, but you won't have to instantiate a runtime, pass it around, and so on. It should make integrating JRuby into Java frameworks that want a real class much easier.

So, thoughts? Questions? Have a look at the code under tool/compiler2.rb in JRuby's repository. The entire compiler is so far only 78 lines of Ruby code.