Wednesday, February 22, 2006

Migrating from Ant to Rake?

I will admit it. I'm fed up with Ant.

My current charge has me as architect for a number of larger applications and the platform on which they run--perhaps over 1Mloc total from DB to front-end. These applications must be deployable across several environments (dev, integ, qa, ua, prod, prod backup). Each application pushes its own configurations out to the app and web servers, restarting them as appropriate. In addition, our reports are automated via a similar process. All that's required to take a bare web cluster, app cluster, and report cluster from nothing to fully functional is a single build command. The entire process, from beans to nuts, is automated through Ant.

Some particularly nice nuggets:

- Apache configurations, based on a generic template, are generated with environment-specific settings at build time. If you are building to the QA environment, QA servers, urls, and filesystem paths are inserted into the template. There is also a template for installing an app-wide or site-wide outage page.
- WebLogic configurations are generated the same way, with a base template filled out with environment-specific details.
- In both WebLogic and Apache cases, the generated configs are pushed out to the servers as part of the build. In this way, the actual app configuration is versioned along with the apps themselves, and rolling back to a previous release automatically downgrades server configurations.
- Via tools like net.exe and sc.exe in Windows, the build logs on to remote shares and manipulates remote services. This is especially important for WebLogic, where many system classpath and domain-wide configuration settings require a restart of the administration server.
- In order to support "throwing it over the wall" to our client, the build command has been made extremely simple:
    ant -Denvironment=qa clean build deploy.all
  ...will completely build and deploy any one of the applications (or the platform itself) to the QA environment.
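
To make the template idea from the list above concrete, here is a minimal sketch in Ruby of filling a generic Apache template with environment-specific values. It is not the actual Ant build: the environment table, host names, paths, and the `render_vhost` helper are all made up for illustration, and it assumes Ruby 2.5 or later for `ERB#result_with_hash`.

```ruby
require "erb"

# Hypothetical per-environment settings; hosts and paths are placeholders.
ENVIRONMENTS = {
  "qa"   => { server_name: "qa.example.com",  doc_root: "/srv/qa/htdocs" },
  "prod" => { server_name: "www.example.com", doc_root: "/srv/prod/htdocs" }
}.freeze

# A generic template; the <%= ... %> markers are filled in at build time.
VHOST_TEMPLATE = <<~CONF
  <VirtualHost *:80>
    ServerName <%= server_name %>
    DocumentRoot "<%= doc_root %>"
  </VirtualHost>
CONF

# Render the template for one environment, failing loudly on an unknown name.
def render_vhost(environment)
  settings = ENVIRONMENTS.fetch(environment) do
    raise ArgumentError, "unknown environment: #{environment}"
  end
  ERB.new(VHOST_TEMPLATE).result_with_hash(settings)
end

puts render_vhost("qa")
```

Because the template and the settings table would live in version control next to the application, checking out an older release and re-rendering would reproduce that release's server configuration, which is the rollback property described above.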

And so on. All told, it's a beautifully automated, extensively documented build process that reduces even the most complicated build tasks to a single target. It's the epitome of what an Ant script should be able to do.

It's also over 5000 lines of grotty XML.

Or at least I should say, it's over 2000 lines for one of the applications, while the platform's script is just over 1000 lines, and another smaller application is in the 1100 range. All told, there's at least 5000 lines of build script to maintain, though there's obviously some duplication and a lot of mirroring across those scripts. And unfortunately, my case is far from unique.

Over the past year, I've made multiple efforts to find a way to rewrite or simply refactor these scripts, ranging from genericizing tasks that appear again and again across builds to breaking up larger scripts into smaller ones for specific subsystems. In every case, the end result is no simpler and no easier to maintain than the original; many genericizing attempts actually resulted in more code rather than less, since clients of that genericized code must pass more state and more configuration along. It seems that for these projects and these applications, what we have is as good as it gets.

At least, as good as it gets with Ant.

Ant itself suffers from a number of flaws that I don't need to discuss here. I will, however, call out a few specifics:

- The declarative vs procedural debate rages endlessly; however, being a programmer, I think procedurally, and most build tasks I wish to automate are procedures rather than simple relationships between disconnected tasks. An enormous amount of overhead is spent in Ant scripts bridging this gap between declarative and procedural worlds, and some of those hacks are seriously ugly.
- Ant does not provide good support for creating "template" build targets, where various elements and tasks within that target are configurable at runtime. If, for example, I have the same rough target for installing an Apache outage-page configuration, I should be able to create a generic version of that target that takes in app-specific tasks and parameters and modifies its behavior accordingly. Ant's minimal support for "params" and "properties" in ant and antcall targets works fine for passing along configuration (aside from the requisite XML overhead from entering <param name="someName" value="someValue"> for every single parameter), but it does little to actually change the behavior of the target itself. You can't change the tasks called and you can't provide alternative targets to execute.
- Ant is extremely poor at sharing across build files. In one of the refactoring efforts, we broke out build targets by subsystem, with EJB stuff in one file and Web stuff in another. Unfortunately, those targets depended on many of the same configurations, tasks, and targets. Making n build scripts work together as a cohesive whole was exponentially more difficult than making a single build script work well.
- Ant, being XML-based, is grossly verbose. At least 75% of those 5000 lines is due to XML bloat.

I'm sure readers will have any number of alternative solutions to the issues I list; in some cases, you may be able to solve many of Ant's deficiencies. However, I would wager that no amount of hacking or refactoring will be able to address all the issues with Ant, and I think there's a body of work that agrees with me.

So what is the alternative?

Maven provides some enhancement to the build process, specifically in the area of managing dependencies, subsystems, and cross-project builds. It also provides a more procedural language, Jelly, which can call and be called from Ant scripts (though Jelly is still XML-based and suffers from the same verbosity). Maven, like other solutions, might provide relief for a few of Ant's failures, but it introduces many more of its own. These applications are also not highly componentized; the size and complexity is almost entirely from application-specific business and presentation logic. While the applications themselves could (and perhaps should) be better componentized, Maven is not a useful or realistic option in the near future, and I'm dubious as to whether it would reduce or increase overall complexity. The old standby Make is of course another option, but there are reasons people use Ant instead.

Here's what I need:
- A procedural build process that understands declarative dependencies
- An elegant and simple language that I can easily write and others can easily read
- Tight integration with Java and awareness of how Java builds must proceed
- Reuse of existing build utilities, including existing Ant tasks and javac support in the JVM

I believe that Ruby's build tool "Rake" is the answer I've been looking for.

Rake was created for many of the same reasons Ant was created, primarily because Make's many faults and deficiencies--however minor--could no longer be overlooked. Rake provides a very procedural way to run builds, but also has awareness of dependencies and task ordering. Most importantly, Rakefiles are simply Ruby code, and so anything you can do in a Ruby script you can do in a Rakefile. The ability to actually use an "if" statement or a loop can't be overstated here; anyone who's tried to do the same operations in Ant fully realizes how difficult it can be. So out of the box, Rake easily fulfills the first two requirements above. I believe it's time to help Rake realize the last two requirements, and JRuby will make that happen.
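
As a rough illustration of those first two requirements, here is a small sketch of what a Rake version of the clean/build/deploy chain might look like. It is not a port of the real build: the environment names, the `define_deploy_task` helper, the `LOG` array (used in place of real work so the sketch has no side effects), and the rule about when to restart are all assumptions made for the example.

```ruby
require "rake"

# Hypothetical environment names, for illustration only.
TARGET_ENVIRONMENTS = %w[dev qa prod]
LOG = [] # records what ran, standing in for the real build steps

task :clean do
  LOG << "clean"
end

# Declarative dependency: build always runs clean first.
task build: :clean do
  LOG << "build"
end

# A "template" task: one method defines a deploy task for any environment,
# and the caller can pass a block to add app-specific steps.
def define_deploy_task(env, &extra_steps)
  namespace :deploy do
    desc "Deploy to #{env}"
    task env => :build do
      LOG << "push configs to #{env}"
      extra_steps.call(env) if extra_steps
      # An ordinary Ruby conditional; the restart rule is made up.
      LOG << "restart admin server" if env == "prod"
    end
  end
end

# An ordinary Ruby loop stamps out one task per environment.
TARGET_ENVIRONMENTS.each do |env|
  define_deploy_task(env) { |e| LOG << "install outage page for #{e}" }
end

Rake::Task["deploy:qa"].invoke
puts LOG
```

In a real Rakefile the last two lines would be dropped and the task would be run as `rake deploy:qa`. The block passed to `define_deploy_task` is the part Ant's params and properties can't express: the caller changes what the target does, not just the values it sees.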

JRuby, as you may all know, is an implementation of Ruby that runs on the JVM. Originally written to match Ruby 1.6, it has recently come again under heavy development to finally achieve 1.8-compatibility. In addition, we have started to run the "big ticket" Ruby applications like Rails in an effort to flush out remaining interpreter incompatibilities. Rake is one of those applications.

Currently, there is still work to be done to get Rake working with JRuby. However, while Rake is really an outstanding work of simplicity and a great example of Ruby's power, it does not in my estimation do anything crazy with Ruby...or at least, nothing crazy that JRuby can't support in the near term. Among many other JRuby-related projects, I intend to and believe I can successfully get Rake working.

So if we set aside the current compatibility concerns, we can start to see the potential of Rake+JRuby for building Java applications. First and foremost, Rake running within the JVM would have access to all the same libraries and tools that Ant uses. Calling out to javac, hitting databases with JDBC, running XDoclet or EJBGEN or Annotation-based tools--all will be simple to do from within a Rake+JRuby Rakefile. Second, and perhaps more compelling, there's no reason why a Rakefile couldn't use existing Ant tasks and tools directly. A Rakefile could either transparently wrap existing Ant builds or could directly call Ant tasks (providing an appropriate execution context, of course). In a perfect world, every target in my existing Ant-based build process would be 100% supported in my future Rakefile, with a minimum of wrapping fuss.

So what remains to be done for Rake to replace Ant in my world? Surely the JRuby issues are the first things to resolve; this is of course why JRuby is number one on my spare-time project list. Again putting that aside, there are three areas that would need some additional work for Rake:

1. Rake must allow seamless, flawless integration with, and awareness of, Java's idiosyncrasies, from classpath/classloader management to compilation quirks. As Ant is able to do, Rake must at a minimum handle the basic Java build operations seamlessly.
2. Existing tools, Ant tasks, and frameworks frequently used in builds must either be wrapped as appropriate or there must be a simple, elegant way to make those tools, tasks, and frameworks accessible from within Rakefiles. I do not believe there should be any concerted effort to reimplement or wrap existing code, if there is a way to make that code accessible without excessive overhead.
3. Finally, Rake must represent a reasonable migration path for existing Ant-based builds from a configuration management perspective. Specifically, as much as possible, current Ant use cases should have identical or very similar analogs in the Rake world. Rake and Rakefiles currently look and feel (from the outside) very similar to Make and Makefiles for this exact reason. Similar care must be taken on the Ant side.

I believe this is all possible, and very likely possible in the near future. JRuby has been improving by leaps and bounds over the past year, and the market is ripe for an alternative to Ant. Even more than Rails, the ability to build Java applications using Rake is very exciting to me...if only as a way to escape my 5000-line build script hell.

Now if only I had another 8 hours every day to spend exclusively working on this stuff.

Tuesday, February 21, 2006

Making Progress on Rails

I've been making good progress on my end of the Rails-on-JRuby work. I have been focusing on getting the "generate" script working, and as a result fixing multiple bugs and minor issues as they come up. Here's an update.

The most recent issue, now apparently mostly fixed, involved evaluating a Ruby script from within a block. The JRuby parser was not wired to understand parsing from within scopes other than the top-level Object, and so it declared and defined certain variables incorrectly. I modified the parser to allow specifying that it will run from within a block, and the issues have been remedied.
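
For anyone unsure what "evaluating from within a block" means in practice, here is a tiny plain-Ruby illustration. It is not the Rails code that triggered the bug, and the method and variable names are made up; it only shows the behavior an interpreter has to get right.

```ruby
# A block introduces its own scope: `base` and `offset` exist only inside it.
def evaluate_inside_block(source)
  result = nil
  [10].each do |base|
    offset = 5
    # `binding` captures the block's scope, so the evaluated source has to be
    # parsed knowing about `base` and `offset`, not just top-level names.
    result = eval(source, binding)
  end
  result
end

puts evaluate_inside_block("base + offset")
```

A parser that assumes every evaluated string starts at the top level would treat `base` and `offset` as undefined here, which is the kind of mistake described above.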

With this fix, here's a rough description of the progress of the generate script:
  • Execution proceeds into the initializer (vendor/rails/railties/lib/initializer.rb). In the scenario I'm executing (generate script with no parameters) the process method is eventually invoked.
  • The load path is set (Initializer#set_load_path) without any issues
  • Connection adapters are set (Initializer#set_connection_adapters)
  • All frameworks are required in (Initializer#require_frameworks). This was where most of the failures and fixes came into play, but it now executes successfully.
  • The current environment is loaded (Initializer#load_environment). I'm using the default environment right now.
  • Initialization of the database begins to run, but appears to get stuck in some infinite loop during YAML parsing of the default database.yml file. I have not investigated this issue yet, and disabled database initialization to continue some testing.
  • The logger initializes without any issues (Initializer#initialize_logger)
  • The framework logging and views are initialized (Initializer#initialize_framework_logging and Initializer#initialize_framework_views)

At this point, the next step is initializing routing (presumably request routing; I'm not any sort of Rails expert yet). This fails with what appear to be scoping issues, and I have not gotten any further yet.

Tom Enebo, the other main JRuby developer, is also making progress on actually running Rails in the simplest of deployment scenarios. He's currently beefing up our Socket implementation so that WEBrick will run correctly. The current sticking point is our less-than-great support for Ruby's IO; namely, we do not support select correctly (or perhaps at all).

I think we've made great progress on both fronts, despite ongoing issues. It's very heartening that for the generate script all the libraries are required successfully and several subsystems initialize without any problems. There's still quite a bit of work to be done, but we're definitely getting there. I'll post more updates as they come in!