Testing metrics and confidence
February 6, 2008 – 9:40 pm
In a recent Ask 37signals post over at Signal vs. Noise, DHH said the following:
We try to do a fairly good job at keeping our test suites current and exhaustive as well. Basecamp has a 1:1.2 ratio of test code (thanks to the persistence of Jamis!), Highrise has a ratio of 1:0.8 (bad me!). So you can change things in the applications and feel fairly comfortable that you at least haven’t killed the entire application if you make a mistake as the tests would catch that.
I’ve seen this sort of talk a couple of times since I started doing Rails work, and I’ve never really understood it. I mean, I understand it semantically - I just don’t understand how a simple ratio of lines of code to lines of test code can make anyone feel at all comfortable about how well-tested their application is. I think it has something to do with how easy it is to measure - you’re always just a rake stats away from some reassuring numbers.
The problem is that you can always improve this ratio without improving the quality of your test suite. Line count is hardly correlated with quality, after all.
Of course, you can make the same argument about people who get the same sense of security from code coverage numbers (like those from rcov) - but at least in that case you’re guaranteed that your code is actually exercised to a greater or lesser extent.
The danger here is that your tests technically hit all the code in your app, but they don’t assert anything meaningful. The only benefit of this over the straight ratio metric is that you’ll catch some exceptions that you’d otherwise miss - beyond that, it’s just as useless as rake stats.
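To make that concrete, here’s a minimal sketch in plain Ruby. The price_after_discount method and both checks are made up for illustration (they’re not from any real app), and I’ve skipped the test framework so the snippet runs on its own. Both checks execute every line of the method, so a coverage tool would report it as fully covered either way, but only one of them notices the bug.

```ruby
# Hypothetical method with a deliberate bug: it subtracts the raw
# percentage instead of that percentage of the price.
def price_after_discount(price, percent)
  price - percent # should be: price - (price * percent / 100.0)
end

# Exercises every line of the method, but only checks that something
# came back. This passes for almost any return value.
def coverage_only_check
  result = price_after_discount(200, 10)
  !result.nil?
end

# Exercises the same lines, but checks the behavior we actually want:
# 10% off 200 should be 180.
def meaningful_check
  price_after_discount(200, 10) == 180.0
end

puts "coverage-only check passed: #{coverage_only_check}"
puts "meaningful check passed:    #{meaningful_check}"
```

The first check passes and the second fails, even though the two are indistinguishable by line coverage.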
In reality, the reason DHH can be confident about his test suite isn’t that it has a high test ratio; it’s that he and the other 37s developers make it a practice to test correctly. Similarly, the reason I can feel good about the test suite on my latest project at Viget isn’t that it’s hovering around 99% coverage - it’s that I know the tests I’m writing are good. I’m covering edge cases and validating functionality, not just writing irrelevant assertions. Metrics are fine, as long as you don’t mistake them for something actually meaningful.
* Heckle and Flog users are exempt from this discussion. If your Rails app is passing those, then yeah - your confidence is justified.

