Sunday, April 11, 2010

Nokogiri Java Port: Help Us Finish It!

One of the most commonly used native extensions for Ruby is the Nokogiri XML API. Nokogiri wraps libxml2 and has a fair amount of C code that links directly against the Ruby C extension API...an API we don't support in JRuby (and won't, without a lot of community help).

A bit over a year ago, the Nokogiri folks did us a big favor by creating an FFI version of Nokogiri that works surprisingly well; it's probably the most widely used FFI-based Ruby library around. But the endgame for Nokogiri on JRuby has always been to get a pure-Java version. Not everyone is allowed to link native libraries on their Java platform of choice, and those who are allowed often have trouble getting the right libxml2 versions installed. The Java version needs to happen.

That day is very close.

I spent a bit of time this weekend getting the Nokogiri "java" port running on my system, and the folks working on it have brought it almost to 100% passing. It's time to push it over the edge.

Building and Testing

Here's my process for getting it building. Let me know if this needs to be edited.

Update: Added rake-compiler and hoe to gems you need to install and modified the git command-line for versions that don't automatically create a local tracking branch.

1. Clone the Nokogiri repository and switch to the "java" branch
~/projects ➔ git clone git://github.com/tenderlove/nokogiri.git
Initialized empty Git repository in /Users/headius/projects/nokogiri/.git/
remote: Counting objects: 14767, done.
remote: Compressing objects: 100% (3882/3882), done.
remote: Total 14767 (delta 10482), reused 13969 (delta 9945)
Receiving objects: 100% (14767/14767), 3.73 MiB | 742 KiB/s, done.
Resolving deltas: 100% (10482/10482), done.

~/projects ➔ cd nokogiri/

~/projects/nokogiri ➔ git checkout -b java origin/java
Branch java set up to track remote branch java from origin.
Switched to a new branch 'java'

2. Install racc, rexical, rake-compiler, and hoe into Ruby (C Ruby, that is, since they also have extensions)
~/projects/nokogiri ➔ sudo gem install racc rexical rake-compiler hoe
Building native extensions. This could take a while...
Successfully installed racc-1.4.6
Successfully installed rexical-1.0.4
Successfully installed rake-compiler-0.7.0
Successfully installed hoe-2.6.0
4 gems installed

3. Build the lexer and parser using C Ruby
~/projects/nokogiri ➔ rake gem:dev:spec
(in /Users/headius/projects/nokogiri)
warning: couldn't activate the debugging plugin, skipping
rake-compiler must be configured first to enable cross-compilation
/usr/bin/racc -l -o lib/nokogiri/css/generated_parser.rb lib/nokogiri/css/parser.y
rex --independent -o lib/nokogiri/css/generated_tokenizer.rb lib/nokogiri/css/tokenizer.rex

4. Build the Java bits (using rake in JRuby)
~/projects/nokogiri ➔ jruby -S rake java:build
(in /Users/headius/projects/nokogiri)
warning: couldn't activate the debugging plugin, skipping
javac -g -cp /Users/headius/projects/jruby/lib/jruby.jar:../../lib/nekohtml.jar:../../lib/nekodtd.jar:../../lib/xercesImpl.jar:../../lib/isorelax.jar:../../lib/jing.jar nokogiri/*.java nokogiri/internals/*.java
jar cf ../../lib/nokogiri/nokogiri.jar nokogiri/*.class nokogiri/internals/*.class

5. Run the tests (again with rake on JRuby)
~/projects/nokogiri ➔ jruby -S rake test
(in /Users/headius/projects/nokogiri)
...full output...

On my system, I get about 8 failures and 19 errors, out of 785 tests and 1657 assertions. We're very close!
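
If you find yourself repeating steps 3 through 5 while fixing bugs, a small script can replay them in order. This is only a sketch: the commands are copied from the steps above, but the constant name, the `run_steps` helper, and the dry-run default are my own assumptions, and the one-time setup (clone, gem install) is left out.

```ruby
# Hypothetical helper: replay the repeatable build steps from this post.
# Run it from inside the nokogiri checkout, on the "java" branch.
NOKOGIRI_STEPS = [
  'rake gem:dev:spec',         # step 3: lexer and parser, under C Ruby
  'jruby -S rake java:build',  # step 4: compile the Java sources
  'jruby -S rake test'         # step 5: run the test suite on JRuby
].freeze

# Prints each command; unless dry_run is true, also executes it and
# stops at the first command that exits with a failure status.
def run_steps(steps, dry_run = true)
  steps.each do |cmd|
    puts "==> #{cmd}"
    next if dry_run
    system(cmd) or abort("step failed: #{cmd}")
  end
end

# Dry run by default; pass false as the second argument to really build.
run_steps(NOKOGIRI_STEPS)
```

With the default arguments this only prints the three commands, so it is safe to try before wiring it into anything.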

A few other useful tasks:
  • jruby -S rake java:clean_all wipes out the built Java artifacts
  • jruby -S rake java:gem builds the Java gem, if you want to try installing it

Helping Out

If you'd like to help fix these bugs, there are a few ways to approach it.
  • Join the nokogiri-talk Google Group so you can communicate with others working on the port. The key folks right now are Yoko Harada and Sergio Arbeo (who did the original bulk of the work for GSoC 2009). I'm also poking at it a bit in my spare time.
  • Post to the group to let folks know you want to help. This will help avoid duplicated effort.
  • Pick failing tests that point to missing or incorrect Ruby logic, like "not implemented" errors, nil results ("method blah not found for nil"), or arity errors ("3 arguments for 2" kinds of things). These are often the simplest ones to fix.
  • Don't give up! We're almost there!

It would be great if we could have a 100% working Nokogiri Java port for JRuby 1.5 final this month. I hope to see you on the nokogiri-talk list! Feel free to comment here if you have questions about getting bootstrapped.
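
To find the "simplest ones" quickly, you could sort failure messages into the categories from the list above. The sketch below is hypothetical: the patterns are guesses based on the kinds of messages described in this post, not on actual Nokogiri test output, and all of the names are made up for illustration.

```ruby
# Hypothetical triage helper for test failure messages.
# The patterns are assumptions; adjust them to the real output you see.
CATEGORIES = {
  :not_implemented => /not implemented/i,
  :nil_receiver    => /\bfor nil\b/i,
  :arity           => /\d+ (arguments )?for \d+/i
}

# Returns the first category whose pattern matches, or :other.
def categorize(message)
  CATEGORIES.each do |name, pattern|
    return name if message =~ pattern
  end
  :other
end

# Groups a list of messages by category.
def triage(messages)
  messages.group_by { |message| categorize(message) }
end

SAMPLE_MESSAGES = [
  'NotImplementedError: not implemented',
  "NoMethodError: undefined method `blah' for nil:NilClass",
  'ArgumentError: wrong number of arguments (3 for 2)',
  'expected <foo> but was <bar>'
]

triage(SAMPLE_MESSAGES).each do |category, messages|
  puts "#{category}: #{messages.length}"
end
```

Anything that lands in `:other` probably needs a closer look at the Java side rather than a quick Ruby fix.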

Saturday, April 3, 2010

Getting Started with Duby

Hello again!

As you may know, I've been working part-time on a new language called Duby. Duby looks like Ruby, since it co-opts the JRuby parser, and includes some of the features of the Ruby language like optional arguments and closures. But Duby is not Ruby; it's statically typed, compiles to "native" code (JVM bytecode, for example) before running, and does not have any built-in library of its own (preferring to just use what's available on a given runtime). Here's a quick sample of Duby code:
class Foo
  def initialize(hello:String)
    puts 'constructor'
    @hello = hello
  end

  def hello(name:String)
    puts "#{@hello}, #{name}"
  end
end

Foo.new('Hiya').hello('Duby')
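
For comparison, here is the same class as plain Ruby. This is a sketch that assumes the only syntactic difference in this particular sample is the `:String` type annotations on the arguments; it says nothing about how the two differ at runtime.

```ruby
# The Duby sample above with the argument type annotations removed,
# so it runs on an ordinary Ruby interpreter.
class Foo
  def initialize(hello)
    puts 'constructor'
    @hello = hello
  end

  def hello(name)
    puts "#{@hello}, #{name}"
  end
end

Foo.new('Hiya').hello('Duby')
```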

This post is not going to be an overview of the Duby language; I'll get that together soon, once I take stock of where the language stands as far as features go. Instead, this "getting started" post will show how you can grab the Duby repository and start playing with it right now.

First you need to pull down three resources: Duby itself, BiteScript (the Ruby DSL I use to generate JVM bytecode), and a JRuby 1.5 snapshot:
~/projects/tmp ➔ git clone git://github.com/headius/duby.git
Initialized empty Git repository in /Users/headius/projects/tmp/duby/.git/
remote: Counting objects: 2810, done.
remote: Compressing objects: 100% (1291/1291), done.
remote: Total 2810 (delta 1690), reused 2509 (delta 1447)
Receiving objects: 100% (2810/2810), 10.64 MiB | 722 KiB/s, done.
Resolving deltas: 100% (1690/1690), done.

~/projects/tmp ➔ git clone git://github.com/headius/bitescript.git
Initialized empty Git repository in /Users/headius/projects/tmp/bitescript/.git/
remote: Counting objects: 470, done.
remote: Compressing objects: 100% (404/404), done.
remote: Total 470 (delta 166), reused 313 (delta 57)
Receiving objects: 100% (470/470), 93.56 KiB, done.
Resolving deltas: 100% (166/166), done.

~/projects/tmp ➔ curl http://ci.jruby.org/snapshots/jruby-bin-1.5.0.dev.tar.gz | tar xz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 11.3M 100 11.3M 0 0 353k 0 0:00:32 0:00:32 --:--:-- 262k

~/projects/tmp ➔ ls
bitescript duby jruby-1.5.0.dev

~/projects/tmp ➔ mv jruby-1.5.0.dev/ jruby

Once you have these three pieces in place, you can run Duby. It's easiest to put the JRuby snapshot in PATH, but you can just run it directly too:
~/projects/tmp ➔ cd duby

~/projects/tmp/duby ➔ ../jruby/bin/jruby bin/duby -e "puts 'hello'"
hello

~/projects/tmp/duby ➔ ../jruby/bin/jruby bin/dubyc -e "puts 'hello'"

~/projects/tmp/duby ➔ java DashE
hello

Finally, you may want to create a "complete" Duby jar that includes Duby, BiteScript, JRuby, and Java classes for command-line or Ant task usage. Using JRuby 1.5's Ant integration, the Duby Rakefile can produce that for you:
~/projects/tmp/duby ➔ ../jruby/bin/jruby -S rake jar:complete
(in /Users/headius/projects/tmp/duby)
mkdir -p build
Compiling Ruby sources
Generating Java class DubyCommand to DubyCommand.java
javac -d build -cp ../jruby/lib/jruby.jar:. DubyCommand.java
Compiling Duby sources
mkdir -p dist
Building jar: /Users/headius/projects/tmp/duby/dist/duby.jar
mkdir -p dist
Building jar: /Users/headius/projects/tmp/duby/dist/duby-complete.jar

~/projects/tmp/duby ➔ java -jar dist/duby-complete.jar run -e 'puts "Duby is Awesome!"'
Duby is Awesome!


Hopefully we'll soon have duby.jar, duby-complete.jar, and a new Duby gem released, but this is a quick way to get involved.

I'll get back to you with a post on the Duby language itself Real Soon Now!

Update: I have also uploaded a snapshot duby-complete.jar (which includes both the Main-Class for jar execution and the simple Ant task org.jruby.duby.ant.Compiler) on the Duby Github downloads page. Have fun!

Using Ivy with JRuby 1.5's Ant Integration

JRuby 1.5 will be released soon, and one of the coolest new features is the integration of Ant support into Rake, the Ruby build tool. Tom Enebo wrote an article on the Rake/Ant integration for the Engine Yard blog, which has lots of examples of how to start migrating to Rake without leaving Ant behind. I'm not going to cover all that here.

I've been using the Rake/Ant stuff for a few weeks now, first for my "weakling" RubyGem which adds a queue-supporting WeakRef to JRuby, and now for cleaning up Duby's build process. Along the way, I've realized I really never want to write Ant scripts again; they're so much nicer in Rake, and I have all of Ruby and Ant available to me.

One thing Ant still needs help with is dependency resolution. Many people make the leap to Maven, and let it handle all the nuts and bolts. But that only works if you really buy into the Maven way of life...a problem if you're like me and you live in a lot of hybrid worlds where the Maven way doesn't necessarily fit. So many folks are turning to Apache Ivy to get dependency management in their builds without using Maven.

Today I thought I'd translate the simple "no-install" Ivy example build (warning, XML) to Rake, to see how easy it would be. The results are pretty slick.

First we need to construct the equivalent to the "download-ivy" and "install-ivy" ant tasks. I chose to put that in a Rake namespace, like this:
namespace :ivy do
  ivy_install_version = '2.0.0-beta1'
  ivy_jar_dir = './ivy'
  ivy_jar_file = "#{ivy_jar_dir}/ivy.jar"

  task :download do
    mkdir_p ivy_jar_dir
    ant.get :src => "http://repo1.maven.org/maven2/org/apache/ivy/ivy/#{ivy_install_version}/ivy-#{ivy_install_version}.jar",
            :dest => ivy_jar_file,
            :usetimestamp => true
  end

  task :install => :download do
    ant.path :id => 'ivy.lib.path' do
      fileset :dir => ivy_jar_dir, :includes => '*.jar'
    end

    ant.taskdef :resource => "org/apache/ivy/ant/antlib.xml",
                #:uri => "antlib:org.apache.ivy.ant",
                :classpathref => "ivy.lib.path"
  end
end

Notice that instead of using Ant properties, I've just used Ruby variables for the Ivy install version, dir, and file. I've also commented out the "uri" attribute of ant.taskdef because I'm not sure if we have an equivalent for that in Rake yet (note to self: figure out if we have an equivalent for that).
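
Because these are plain Ruby values, they can also be factored into ordinary methods. As a minimal sketch (the `IVY_REPO` constant and `ivy_jar_url` method are hypothetical names, not part of the Rakefile above), the download URL could be built like this:

```ruby
# Hypothetical refactoring: build the Ivy download URL from a version string.
# The repository layout is copied from the ant.get call in the download task.
IVY_REPO = 'http://repo1.maven.org/maven2/org/apache/ivy/ivy'

def ivy_jar_url(version)
  "#{IVY_REPO}/#{version}/ivy-#{version}.jar"
end

puts ivy_jar_url('2.0.0-beta1')
```

Bumping the Ivy version then means changing one string instead of editing the URL by hand.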

With these two tasks, we can now fetch ivy and install it for the remainder of the build. Here's running the download task from the command line:
~/projects/duby ➔ rake ivy:download
(in /Users/headius/projects/duby)
mkdir -p ./ivy
Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.0.0-beta1/ivy-2.0.0-beta1.jar
To: /Users/headius/projects/duby/ivy/ivy.jar

Now we want a simple task that uses ivy:install to fetch resources and make them available for the build. Here's the example from Apache, using the cachepath task, written in Rake:
task :go => "ivy:install" do
  ant.cachepath :organisation => "commons-lang",
                :module => "commons-lang",
                :revision => "2.1",
                :pathid => "lib.path.id",
                :inline => "true"
end

Pretty clean and simple, and it fits nicely into the flow of the Rakefile. I can also switch this to using the "retrieve" task, which just pulls the jars down and puts them where I want them:
task :go => "ivy:install" do
  ant.retrieve :organisation => 'commons-lang',
               :module => 'commons-lang',
               :revision => '2.1',
               :pattern => 'javalib/[conf]/[artifact].[ext]',
               :inline => true
end

This fetches the Apache Commons Lang package along with all dependencies into javalib, separated by what build configuration they are associated with (runtime, test, etc). Here it is in action:
~/projects/duby ➔ rake go
(in /Users/headius/projects/duby)
mkdir -p ./ivy
Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.0.0-beta1/ivy-2.0.0-beta1.jar
To: /Users/headius/projects/duby/ivy/ivy.jar
Not modified - so not downloaded
Trying to override old definition of task buildnumber
:: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
:: loading settings :: url = jar:file:/Users/headius/.ant/lib/ivy.jar!/org/apache/ivy/core/settings/ivysettings.xml
:: resolving dependencies :: commons-lang#commons-lang-caller;working
confs: [default, master, compile, provided, runtime, system, sources, javadoc, optional]
found commons-lang#commons-lang;2.1 in public
:: resolution report :: resolve 64ms :: artifacts dl 3ms
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 1 | 0 |
| master | 1 | 0 | 0 | 0 || 1 | 0 |
| compile | 1 | 0 | 0 | 0 || 0 | 0 |
| provided | 1 | 0 | 0 | 0 || 0 | 0 |
| runtime | 1 | 0 | 0 | 0 || 0 | 0 |
| system | 1 | 0 | 0 | 0 || 0 | 0 |
| sources | 1 | 0 | 0 | 0 || 1 | 0 |
| javadoc | 1 | 0 | 0 | 0 || 1 | 0 |
| optional | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: retrieving :: commons-lang#commons-lang-caller
confs: [default, master, compile, provided, runtime, system, sources, javadoc, optional]
4 artifacts copied, 0 already retrieved (1180kB/30ms)
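
The bracketed tokens in the retrieve pattern are placeholders that get filled in for each artifact. The snippet below only mimics that substitution to show where files end up. It is not Ivy's implementation, `expand_pattern` is a hypothetical helper, and the sample values are assumptions rather than output from a real resolve.

```ruby
# Illustration only: fill in [token] placeholders from a hash of values.
# Raises KeyError if the pattern uses a token that has no value.
def expand_pattern(pattern, values)
  pattern.gsub(/\[(\w+)\]/) { values.fetch($1) }
end

RETRIEVE_PATTERN = 'javalib/[conf]/[artifact].[ext]'

puts expand_pattern(RETRIEVE_PATTERN,
                    'conf' => 'default',
                    'artifact' => 'commons-lang',
                    'ext' => 'jar')
```

Under those assumed values the jar for the default configuration would land at javalib/default/commons-lang.jar.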

But if I have multiple artifacts, this could be pretty cumbersome. Since this is Ruby, I can just put this in a method and call it repeatedly:
def ivy_retrieve(org, mod, rev)
  ant.retrieve :organisation => org,
               :module => mod,
               :revision => rev,
               :pattern => 'javalib/[conf]/[artifact].[ext]',
               :inline => true
end

# Flat list of organisation, module, revision triples
artifacts = %w[
  commons-lang commons-lang 2.1
  org.jruby jruby 1.4.0
]

task :go => "ivy:install" do
  # each_slice(3) yields one [org, mod, rev] array per artifact
  artifacts.each_slice(3) do |artifact|
    ivy_retrieve(*artifact)
  end
end
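
If the slicing is not obvious, here is a standalone sketch of how the flat list turns into one call per artifact. It runs without Ant or Ivy: `fake_retrieve` is a hypothetical stand-in for the Ant-backed method, and it only formats its arguments.

```ruby
# Stand-in for the Ant-backed retrieve method, so this runs on plain Ruby.
def fake_retrieve(org, mod, rev)
  "#{org}:#{mod}:#{rev}"
end

# Flat list of organisation, module, revision triples.
ARTIFACTS = %w[
  commons-lang commons-lang 2.1
  org.jruby    jruby        1.4.0
]

# each_slice(3) hands the block one three-element array at a time,
# which the block parameters split into org, mod, and rev.
def retrieve_all(flat_list)
  flat_list.each_slice(3).map do |org, mod, rev|
    fake_retrieve(org, mod, rev)
  end
end

puts retrieve_all(ARTIFACTS)
```

Adding another dependency is then just one more line in the list.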

Look for JRuby 1.5 release candidates soon, and let us know what you think of the new Ant integration!

Sunday, March 28, 2010

Ruby Summer of Code 2010

This year, no major Ruby organization got accepted to Google's Summer of Code (a half dozen Python projects did, but I won't rant here). What do we as Rubyists do? Take it sitting down? NO! We make our own Summer of Code!

Thanks to Engine Yard, Ruby Central, and the Rails team, Ruby Summer of Code has raised $100k in just three days, allowing us to run 20 student projects! Hooray!

Now of course we really would love to have some JRuby projects involved. There's so much exciting stuff going on with JRuby, and I believe it's the most promising platform for really growing the Ruby community. So we've set up a page for JRuby Ruby Summer of Code 2010 ideas. Here are a few to get you started:
  • JRuby on Android work, including command-line tooling, performance work, and all the little bits and pieces needed to make Ruby a first-class Android language.
  • Porting key C extensions to JRuby, so there's an alternative for people migrating.
  • A super-fast lightweight server similar to the GlassFish gem.
  • A full Hibernate and/or JPA backend for DataMapper or DataObjects, so that all databases Hibernate supports "just work" with JRuby.
  • Work on JRuby's nascent support for Ruby C extensions by building the API out.
  • Help get JRuby's early optimizing compiler wired up, to take JRuby's perf to the next level.
  • Duby-related projects, like IDE support, better tooling, codebase cleanup, features, documentation.

And there are dozens of other projects out there just waiting for you! Add yourself as a student on the RubySOC page, add some ideas to the JRuby ideas page, and let's get hacking!