Performance: Inlining Strings
A while ago, we decided to intern all appropriate symbolic strings as they entered the AST. This appeared to improve performance by a measurable amount, presumably for two reasons:
- The AST would take up less space in the long term
- Since Strings cache their hashcodes, having each identical string in the AST be the same object would reduce the number of hash calculations.
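The second point rests on two Java behaviors: equal strings built at runtime are normally distinct objects (each caching its own hashcode lazily), while interning collapses them into one canonical object. A minimal demonstration (the class name `InternDemo` is just for illustration):

```java
public class InternDemo {
    public static void main(String[] args) {
        // Two equal strings constructed at runtime are distinct objects,
        // so each would lazily compute and cache its own hashcode.
        String a = new String("defineMethod");
        String b = new String("defineMethod");
        System.out.println(a == b);        // false: separate objects

        // After interning, both refer to the same canonical object, so the
        // cached hashcode for "defineMethod" is computed at most once.
        String ia = a.intern();
        String ib = b.intern();
        System.out.println(ia == ib);      // true: one shared object
    }
}
```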
Without interning AST strings:
real 1m19.894s
user 1m18.649s
sys 0m0.920s
With interning AST strings:
real 1m15.021s
user 1m13.785s
sys 0m0.948s
So it's very measurable...in this case 4-5 seconds out of 80 seconds, or about a 6% gain with interning.
However, this week I realized the fallacy of the second point above. In order to intern each incoming string, Java must hash it, which means every string entering the AST gets hashed once anyway just to obtain the interned version. In fact, interning could reduce performance, since we're now forced to hash strings we might never encounter during execution.
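A user-level intern table makes that hashing cost concrete. This is only a sketch of the general technique (the `InternTable` class is hypothetical, not the JVM's actual native implementation), but any hash-based table must hash every string on the way in, whether or not that string is ever touched again:

```java
import java.util.concurrent.ConcurrentHashMap;

// A naive user-level intern table. putIfAbsent() hashes its key, so every
// string entering the table pays the hashing cost up front -- the same cost
// this post suspects String.intern() imposes on every AST string.
public class InternTable {
    private final ConcurrentHashMap<String, String> table = new ConcurrentHashMap<>();

    public String intern(String s) {
        String existing = table.putIfAbsent(s, s); // hashes s here
        return existing != null ? existing : s;
    }
}
```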
I can't confirm that this is 100% true, but it's a reasonable theory. String.intern() is native code, but logic dictates that the fastest way to intern strings would be an internal hash/symbol table. So I proceeded this evening to try removing interning and see how it affected performance, hoping that a large gain during parsing would translate into at least a small gain during execution. The numbers above show that I actually lose a measurable amount overall by removing interning. However, I do see substantial parse performance gains:
With interning AST strings (Time.now - t method of measuring 100 parses):
11.101
Without interning AST strings:
9.404
About 1.7 seconds out of 11, or a whopping 15% gain in parse speed without interning. AARGH.
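That parse-time overhead can be isolated with a rough micro-benchmark along these lines (hypothetical, not the JRuby harness, and subject to the usual JIT-warmup caveats): time the construction of many fresh strings with and without an intern() call on each.

```java
public class InternCost {
    // Build n fresh strings, optionally interning each; return elapsed ms.
    static long time(int n, boolean intern) {
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            String s = "identifier" + (i % 100); // fresh object each iteration
            if (intern) s = s.intern();          // pays the hash/lookup cost up front
        }
        return System.currentTimeMillis() - t0;
    }

    public static void main(String[] args) {
        System.out.println("no intern: " + time(1_000_000, false) + " ms");
        System.out.println("   intern: " + time(1_000_000, true) + " ms");
    }
}
```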
At this point I'm leaning toward removing interning and swallowing the 6% performance hit temporarily until we can figure out where it's coming from. Logically, interning everything should only add overhead, or so my brain tells me. Where's the flaw in my logic?
Written on September 1, 2006