Performance: Block Variables Breakdown
In response to my previous blog, Chris Nokleberg noted that if we're using String#equals a lot, interning will have an additional benefit...namely because String#equals short-circuits if both String objects are ==. I had forgotten about that benefit, so I thought I'd poke around for places it might have an effect.
In digging a little, I was reminded that DynamicVariableSet, which holds block-local variables, does a linear search for retrievals and mutations. Compare that to method-local variables in Scope, for which all clients must pass an index to get or set a value. I'm not sure if there's a reason why DynVars must be retrieved by name and Scope vars can be directly indexed...it's probably historical from the C code or a limitation of the current parser. In any case, a linear search could theoretically represent a performance issue accessing or modifying block-local variables. So I thought I'd run some numbers to see how frequently we have to search past the 0 value in the DynVarSet. Here's the breakdown while gem installing rake:
Count of block var retrieves from indicies 0 through 6:
516436
226049
42687
1170
656
122
1
So a total of 787121 block variable lookups, with percentages below:
index 0: 65.6%
index 1: 28.7%
index 2: 5.4%
...with the remainder filling out the last 0.3% of accesses
What can we learn from this? Well DynVarSet currently allocates initial space for up to 8 variables, based on someone's long-past examination of code that showed a maximum of 6 was a good estimate. The numbers above show that, for RubyGems + RDoc at least, a maximum of 3 would cover 99.7% of all cases without requiring expansion of the array, and that with a subsequent expansion to 6 slots we would cover all but 0.0000127% of cases. So using 6 as a maximum and always allocating 8 slots is perhaps a bit generous. More profiling on more code is warranted here, but it might be safe to drop the initial size to 3 and leave in the doubling when expanding.
It also shows us that we pay the cost of two String#equals calls for 28.7% of lookups, and pay for three in 5.4% of lookups, with higher counts statistically insignificant. So although it might seem like we're unlikely to suffer much of a performance hit because most block var lookups are at index 0, our cost to lookup or modify block variables is doubled in over a quarter of cases and tripled in a twentieth of cases. If we normalize all lookups to the same cost (X * 99.7 versus X * 139.2), that translates to a 28% performance boost for block variable lookup cost. Note that I'm not measuring block variable modifications here either, which would conceivably see a similar boost.
And of course what it doesn't show (but which we know to be true) is that if we were able to eliminate both the linear search and the call to String#equals, we'd save a very large percentage of block var lookup time, since we'd reduce it to a simple array indexing. I think it's worth exploring how we could do that--especially since we already do it for Scope in all the same places.
Isn't optimizing legacy code fun?
In digging a little, I was reminded that DynamicVariableSet, which holds block-local variables, does a linear search for retrievals and mutations. Compare that to method-local variables in Scope, for which all clients must pass an index to get or set a value. I'm not sure if there's a reason why DynVars must be retrieved by name and Scope vars can be directly indexed...it's probably historical from the C code or a limitation of the current parser. In any case, a linear search could theoretically represent a performance issue accessing or modifying block-local variables. So I thought I'd run some numbers to see how frequently we have to search past the 0 value in the DynVarSet. Here's the breakdown while gem installing rake:
Count of block var retrieves from indicies 0 through 6:
516436
226049
42687
1170
656
122
1
So a total of 787121 block variable lookups, with percentages below:
index 0: 65.6%
index 1: 28.7%
index 2: 5.4%
...with the remainder filling out the last 0.3% of accesses
What can we learn from this? Well DynVarSet currently allocates initial space for up to 8 variables, based on someone's long-past examination of code that showed a maximum of 6 was a good estimate. The numbers above show that, for RubyGems + RDoc at least, a maximum of 3 would cover 99.7% of all cases without requiring expansion of the array, and that with a subsequent expansion to 6 slots we would cover all but 0.0000127% of cases. So using 6 as a maximum and always allocating 8 slots is perhaps a bit generous. More profiling on more code is warranted here, but it might be safe to drop the initial size to 3 and leave in the doubling when expanding.
It also shows us that we pay the cost of two String#equals calls for 28.7% of lookups, and pay for three in 5.4% of lookups, with higher counts statistically insignificant. So although it might seem like we're unlikely to suffer much of a performance hit because most block var lookups are at index 0, our cost to lookup or modify block variables is doubled in over a quarter of cases and tripled in a twentieth of cases. If we normalize all lookups to the same cost (X * 99.7 versus X * 139.2), that translates to a 28% performance boost for block variable lookup cost. Note that I'm not measuring block variable modifications here either, which would conceivably see a similar boost.
And of course what it doesn't show (but which we know to be true) is that if we were able to eliminate both the linear search and the call to String#equals, we'd save a very large percentage of block var lookup time, since we'd reduce it to a simple array indexing. I think it's worth exploring how we could do that--especially since we already do it for Scope in all the same places.
Isn't optimizing legacy code fun?
Written on September 1, 2006