This was originally posted as a comment on @fogus's blog post "Why Clojure doesn’t need invokedynamic, but it might be nice". I figured it's worth a top-level post here.
Ok, there's some good points here and a few misguided/misinformed positions. I'll try to cover everything.
First, I need to point out a key detail of invokedynamic that may have escaped notice: any case where you must bounce through a generic piece of code to do dispatch -- regardless of how fast that bounce may be -- prevents a whole slew of optimizations from happening. This might affect Java dispatch, if there's any argument-twiddling logic shared between call sites. It would definitely affect multimethods, which are using a hand-implemented PIC. Any case where there's intervening code between the call site and the target would benefit from invokedynamic, since invokedynamic could be used to plumb that logic and let it inline straight through. This is, indeed, the primary benefit of using invokedynamic: arbitrarily complex dispatch logic folds away allowing the dispatch to optimize as if it were direct.
Your point about inference in Java dispatch is a fair one...if Clojure is able to infer all cases, then there's no need to use invokedynamic at all. But unless Clojure is able to infer all cases, then you've got this little performance time bomb just waiting to happen. Tweak some code path and obscure the inference, and kablam, you're back on a slow reflective impl. Invokedynamic would provide a measure of consistency; the only unforeseen perf impact would be when the dispatch turns out to *actually* be polymorphic, in which case even a direct call wouldn't do much better.
For multimethods, the benefit should be clear: the MM selection logic would be mostly implemented using method handles and "leaf" logic, allowing hotspot to inline it everywhere it is used. That means for small-morphic MM call sites, all targets could potentially inline too. That's impossible without invokedynamic unless you generate every MM path immediately around the eventual call.
Now, on to defs and Var lookup. Depending on the cost of Var lookup, using a SwitchPoint-based invalidation plus invokedynamic could be a big win. In Java 7u2, SwitchPoint-based invalidation is essentially free until invalidated, and as you point out that's a rare case. There would essentially be *no* cost in indirecting through a var until that var changes...and then it would settle back into no cost until it changes again. Frequently-changing vars could gracefully degrade to a PIC.
It's also dangerous to understate the impact code size has on JVM optimization. The usual recommendation on the JVM is to move code into many small methods, possibly using call-through logic as in multimethods to reuse the same logic in many places. As I've mentioned, that defeats many optimizations, so the next approach is often to hand-inline logic everywhere it's used, to let the JVM have a more optimizable view of the system. But now we're stepping on our own feet...by adding more bytecode, we're almost certainly impacting the JVM's optimization and inlining budgets.
OpenJDK (and probably the other VMs too) has various limits on how far it will go to optimize code. A large number of these limits are based on the bytecoded size of the target methods. Methods that get too big won't inline, and sometimes won't compile. Methods that inline a lot of code might not get inlined into other methods. Methods that inline one path and eat up too much budget might push out more important calls later on. The only way around this is to reduce bytecode size, which is where invokedynamic comes in.
As of OpenJDK 7u2, MethodHandle logic is not included when calculating inlining budgets. In other words, if you push all the Java dispatch logic or multimethod dispatch logic or var lookup into mostly MethodHandles, you're getting that logic *for free*. That has had a tremendous impact on JRuby performance; I had previous versions of our compiler that did indeed infer static target methods from the interpreter, but they were often *slower* than call site caching solely because the code was considerably larger. With invokedynamic, a call is a call is a call, and the intervening plumbing is not counted against you.
Now, what about negative impacts to Clojure itself...
#0 is a red herring. JRuby supports Java 5, 6, and 7 with only a few hundred lines of changes in the compiler. Basically, the compiler has abstract interfaces for doing things like constant lookup, literal loading, and dispatch that we simply reimplement to use invokedynamic (extending the old non-indy logic for non-indified paths). In order to compile our uses of invokedynamic, we use Rémi Forax's JSR-292 backport, which includes a "mock" jar with all the invokedynamic APIs stubbed out. In our release, we just leave that library out, reflectively load the invokedynamic-based compiler impls, and we're off to the races.
#1 would be fair if the Oracle Java 7u2 early-access drops did not already include the optimizations that gave JRuby those awesome numbers. The biggest of those optimizations was making SwitchPoint free, but also important are the inlining discounting and MutableCallSite improvements. The perf you see for JRuby there can apply to any indirected behavior in Clojure, with the same perf benefits as of 7u2.
For #2, to address the apparent vagueness in my blog post...the big perf gain was largely from using SwitchPoint to invalidate constants rather than pinging a global serial number. Again, indirection folds away if you can shove it into MethodHandles. And it's pretty easy to do it.
As for #4...well, all good things take a little effort :) I think the effort required is far lower than you suspect, though.