The Power of Java's NIO
Akira Tanaka just did a lightning talk on IO.copy_stream, a new method added to Ruby 1.9 last night that does a fast stream transfer all in C code, avoiding the intermediate strings. It's a good idea, one I hope takes hold for other IO operations that could use some native lovin.
It's so good, in fact, I took five minutes out of my day to produce a crude implementation of it in JRuby (and note, I have not looked at Akira's implementation, so I'm sure it handles cases my version does not):
This feature will only be available as part of JRuby's 1.9 support, which is still in development.
It's so good, in fact, I took five minutes out of my day to produce a crude implementation of it in JRuby (and note, I have not looked at Akira's implementation, so I'm sure it handles cases my version does not):
@JRubyMethod(name = "copy_stream", meta = true, compat = RUBY1_9)It's primitive, but it does the trick. NIO streams unfortunately only provide this nice "transferTo" method for FileChannel, though there's other ways you can do fast copies from stream to stream using buffers. But this was a nice simple way to wire it up for now. So, let's see numbers.
public static IRubyObject copy_stream(
IRubyObject recv, IRubyObject stream1, IRubyObject stream2)
throws IOException {
RubyIO io1 = (RubyIO)stream1;
RubyIO io2 = (RubyIO)stream2;
ChannelDescriptor d1 = io1.openFile.getMainStream().getDescriptor();
if (!d1.isSeekable()) {
throw recv.getRuntime().newTypeError("only supports file-to-file copy");
}
ChannelDescriptor d2 = io2.openFile.getMainStream().getDescriptor();
if (!d2.isSeekable()) {
throw recv.getRuntime().newTypeError("only supports file-to-file copy");
}
FileChannel f1 = (FileChannel)d1.getChannel();
FileChannel f2 = (FileChannel)d2.getChannel();
long size = f1.size();
f1.transferTo(f2.position(), size, f2);
return recv.getRuntime().newFixnum(size);
}
require 'benchmark'First JRuby numbers (excluding rehearsal):
Benchmark.bmbm do |bm|
bm.report("Control") do
10000.times do
File.open(__FILE__) do |file|
File.open(__FILE__ + ".tmp", 'w') do |file2|
end
end
end
end
bm.report("10k file copies") do
10000.times do
File.open(__FILE__) do |file|
File.open(__FILE__ + ".tmp", 'w') do |file2|
IO.copy_stream(file, file2)
end
end
end
end
end
user system total realNot bad for 10k copies of a file...it works out to about 3s above and beyond the cost of running the blocks and opening/closing the files. How about Ruby 1.9?
Control 2.191000 0.000000 2.191000 ( 2.191000)
10k file copies 5.190000 0.000000 5.190000 ( 5.190000)
user system total realRuby 1.9's numbers come out around 4.4s above and beyond the control logic. Obviously we need to look at improving JRuby's numbers for the blocks and file opening, but it's good to know that Java's IO actually *can* be nice and fast when you need it to be.
Control 0.360000 0.550000 0.910000 ( 0.903211)
10k file copies 0.850000 2.450000 3.300000 ( 5.363433)
This feature will only be available as part of JRuby's 1.9 support, which is still in development.
Written on March 30, 2008