More Compiler Strategy: Call Adapters and Stack-based Methods
Compilers are hard. But not so hard as people would have you believe.
I've committed an update that installs a CallAdapter for every compiled call site. CallAdapter is basically a small object that stores the following:
- method name
- method index
- call type (normal, functional, variable)
The end result is that while compiled class init is a bit larger (needs to load adapters for all call sites), compiled method size has dropped substantially; in compiling bench_method_dispatch.rb, the two main tests went from 4000 and 3500 bytes of code down to 1500 and 1000 bytes (roughly). And simpler code means HotSpot has a better chance to optimize.
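To make the three stored fields concrete, here is a minimal Ruby model of a per-call-site adapter. It is only an illustration: the real CallAdapter lives in JRuby's Java code, and everything here (the Struct, the `call` method, the `Greeter` class, the placeholder index `0`) is hypothetical. The sketch also assumes that the call type mainly decides whether visibility is checked, which is how Ruby treats calls with and without an explicit receiver.

```ruby
# Hypothetical model of a call-site adapter; not JRuby's actual class.
# It holds the three fields listed above and performs the dispatch.
CallAdapter = Struct.new(:method_name, :method_index, :call_type) do
  def call(receiver, *args)
    if call_type == :normal
      # Explicit-receiver call: private methods are not reachable.
      receiver.public_send(method_name, *args)
    else
      # :functional and :variable calls have no explicit receiver,
      # so private methods are allowed.
      receiver.send(method_name, *args)
    end
  end
end

class Greeter
  def hello(name)
    "hello, #{name}"
  end

  private

  def secret
    "shh"
  end
end

# One adapter per call site, built once (in the post: at class init).
# The index 0 is a placeholder; this model does not use it.
HELLO_SITE  = CallAdapter.new(:hello, 0, :normal)
SECRET_SITE = CallAdapter.new(:secret, 0, :functional)

puts HELLO_SITE.call(Greeter.new, "world")
puts SECRET_SITE.call(Greeter.new)
```

The point of the design is that the method body only needs to load the adapter and invoke it, instead of inlining the whole dispatch sequence at every call site.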
Here are the latest numbers for the bench_method_dispatch_only test, which just measures the time to call a Ruby-implemented method a bunch of times:
Test interpreted: 100k loops calling self's foo 100 times
2.383000 0.000000 2.383000 ( 2.383000)
2.691000 0.000000 2.691000 ( 2.691000)
1.775000 0.000000 1.775000 ( 1.775000)
1.812000 0.000000 1.812000 ( 1.812000)
1.789000 0.000000 1.789000 ( 1.789000)
1.776000 0.000000 1.776000 ( 1.777000)
1.809000 0.000000 1.809000 ( 1.809000)
1.779000 0.000000 1.779000 ( 1.781000)
1.784000 0.000000 1.784000 ( 1.784000)
1.830000 0.000000 1.830000 ( 1.830000)
And Ruby 1.8.6 for reference:
Test interpreted: 100k loops calling self's foo 100 times
2.160000 0.000000 2.160000 ( 2.188087)
2.220000 0.010000 2.230000 ( 2.237414)
2.230000 0.010000 2.240000 ( 2.248185)
2.180000 0.010000 2.190000 ( 2.218540)
2.240000 0.010000 2.250000 ( 2.259535)
2.220000 0.010000 2.230000 ( 2.241170)
2.150000 0.010000 2.160000 ( 2.178414)
2.240000 0.010000 2.250000 ( 2.259772)
2.260000 0.000000 2.260000 ( 2.285141)
2.230000 0.010000 2.240000 ( 2.252396)
Note that these are JIT numbers rather than fully precompiled numbers, so this is 100% real-world safe. Fully precompiled is just a bit faster, since there's no interpreted step or DefaultMethod wrapper to go through.
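For readers who want to reproduce the shape of these numbers, here is a rough sketch of a dispatch benchmark of this kind. It is not the actual bench_method_dispatch_only.rb script; the method names `foo` and `dispatch_loop`, the nested `while` loops, and the choice of three runs are all assumptions made for illustration.

```ruby
require 'benchmark'

# A trivial Ruby-implemented method, so the timing is dominated by dispatch.
def foo
  self
end

# Calls foo `calls` times inside each of `loops` iterations and
# returns the total number of calls made.
def dispatch_loop(loops, calls)
  total = 0
  i = 0
  while i < loops
    j = 0
    while j < calls
      foo
      total += 1
      j += 1
    end
    i += 1
  end
  total
end

puts "Test interpreted: 100k loops calling self's foo 100 times"
3.times do
  # Benchmark.measure returns a Benchmark::Tms; printing it gives the
  # four columns seen above: user CPU, system CPU, total CPU, (real time).
  puts Benchmark.measure { dispatch_loop(100_000, 100) }
end
```

Repeating the measurement matters on JRuby: the first couple of runs include interpretation and JIT warm-up, which is why the opening lines of each JRuby listing are slower than the rest.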
I have also made a lot of progress on adapting the compiler to create stack-based methods when possible. Basically, this involves inspecting the code for anything that would require access to local variables outside the body of the call: things like eval, closures, etc. At the moment it works well and passes all tests, but I know that methods similar to gsub, which modify $~ or $_, are not working right. It's disabled at the moment, pending more work, but here are the method dispatch numbers with stack-based method compilation enabled:
Test interpreted: 100k loops calling self's foo 100 times
1.735000 0.000000 1.735000 ( 1.738000)
1.902000 0.000000 1.902000 ( 1.902000)
1.078000 0.000000 1.078000 ( 1.078000)
1.076000 0.000000 1.076000 ( 1.076000)
1.077000 0.000000 1.077000 ( 1.077000)
1.086000 0.000000 1.086000 ( 1.086000)
1.077000 0.000000 1.077000 ( 1.077000)
1.084000 0.000000 1.084000 ( 1.084000)
1.090000 0.000000 1.090000 ( 1.090000)
1.083000 0.000000 1.083000 ( 1.083000)
It seems very promising work. I hope I'll be able to turn it on soon.
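To illustrate the kinds of methods the inspection has to tell apart, here are a few small Ruby methods. The method names are made up for this example, and the comments describe my reading of the post rather than the compiler's actual rules.

```ruby
# Only reads and writes its own locals: a plausible candidate for a
# stack-based method.
def sum_to(n)
  total = 0
  i = 1
  while i <= n
    total += i
    i += 1
  end
  total
end

# Closure: the block updates `total` from inside Array#each, so the
# variable has to be reachable from outside the method body's own code.
def sum_with_block(values)
  total = 0
  values.each { |v| total += v }
  total
end

# eval: the string looks up locals by name at run time.
def sum_with_eval(a, b)
  eval("a + b")
end

# gsub-like behaviour: String#gsub records the last match in $~,
# which is local to the calling method's frame, so the caller needs
# somewhere to hold it even though no local variable is named here.
def last_number(text)
  text.gsub(/\d+/, "#")
  $~ ? $~[0] : nil
end

puts sum_to(4)
puts sum_with_block([1, 2, 3])
puts sum_with_eval(2, 3)
puts last_number("ab123")
```

The last case is the subtle one: nothing in `last_number` looks like a closure or an eval, yet the call to gsub still writes state into the caller, which may be why such methods are the ones reported as not working yet.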
Oh, and for those who always need a fib fix, here's fib with both optimizations turned on:
~ $ jruby -J-server bench_fib_recursive.rb
1.258000 0.000000 1.258000 ( 1.258000)
0.990000 0.000000 0.990000 ( 0.989000)
0.925000 0.000000 0.925000 ( 0.926000)
0.927000 0.000000 0.927000 ( 0.928000)
0.924000 0.000000 0.924000 ( 0.925000)
0.923000 0.000000 0.923000 ( 0.923000)
0.927000 0.000000 0.927000 ( 0.926000)
0.928000 0.000000 0.928000 ( 0.929000)
And MRI:
~ $ ruby bench_fib_recursive.rb
1.760000 0.010000 1.770000 ( 1.775660)
1.760000 0.010000 1.770000 ( 1.776360)
1.760000 0.000000 1.760000 ( 1.778413)
1.760000 0.010000 1.770000 ( 1.776767)
1.760000 0.010000 1.770000 ( 1.777361)
1.760000 0.000000 1.760000 ( 1.782798)
1.770000 0.010000 1.780000 ( 1.794562)
1.760000 0.010000 1.770000 ( 1.777396)
These numbers went down a bit because the call adapter is currently just generic code, and generic code that calls lots of different methods causes HotSpot to stumble a bit. The next step for the compiler is to generate custom call adapters for each call site that handle arity correctly (avoiding IRubyObject[] all the time) and call directly to the most-likely target methods.
Written on July 15, 2007