When is a Routine Too Big?

by bob on March 14, 2011

I’m thinking, and not for the first time, of slimming down a monster method that has grown beyond the size any “well-written” routine is “supposed” to be.  Depending on who you’re listening to, no routine should exceed a hundred or so lines of code, or two or three screens’ worth.  And almost universally, any routine I write follows this rule, which is, in general, a common-sense one.

On the other hand, this method does not violate the single-purpose rule.  It has one job to do: take a business name and correct misspellings, misbegotten (mis)abbreviations, and the like.  Or more generically, its purpose is to transform a string by applying a series of normalizing transformations according to a set of rules, in a certain order.

And it’s not as if this method doesn’t break down its task.  It calls many other methods, both general-purpose utility libraries and private methods in its own class that exist mainly to simplify human understanding of what is going on and improve maintainability.  Still, it is thousands of lines long and growing all the time.

The problem is in the ruleset.  Not every problem lends itself to a nice, orderly application of transformations, one at a time.  In an ideal world, this method would be a great candidate to have large swaths of its work be table-driven.  For example, correcting misspellings involves calls of the general form:

SubstituteWords(businessName,misSpelledWord,correctlySpelledWord)

One can easily envision a table of misspelled words with their corresponding correct spelling.  I could then add words to that table over time, sparing me the constant growth of this method and allowing me to add to the ruleset without recompiling.
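As a rough illustration of that ideal-world version, here is a minimal Python sketch of a table-driven corrector.  The language, the table contents, and the names `CORRECTIONS`, `substitute_words`, and `apply_corrections` are all assumptions made for the example; the real method and its utility calls are not shown in this post.

```python
import re

# Hypothetical correction table: misspelled word -> correct spelling.
# In the scenario above these rows would live outside the code (e.g. in a
# database table), so they could grow without recompiling.
CORRECTIONS = {
    "MANUFACTURNG": "MANUFACTURING",
    "SERVICS": "SERVICES",
}


def substitute_words(business_name, misspelled, correct):
    """Replace whole-word, case-insensitive occurrences of `misspelled`."""
    pattern = r"\b" + re.escape(misspelled) + r"\b"
    # A function replacement keeps `correct` from being parsed as a template.
    return re.sub(pattern, lambda match: correct, business_name,
                  flags=re.IGNORECASE)


def apply_corrections(business_name, table):
    """Apply every row of the table, in the table's iteration order."""
    for misspelled, correct in table.items():
        business_name = substitute_words(business_name, misspelled, correct)
    return business_name
```

With this shape, adding a correction is a data change rather than a code change, which is exactly the appeal.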

But wait, it’s not that simple.  Some corrections are unsafe to apply across the board and should skip the first or last word.  Others make sense only for the first or last word.

As an example of the need for some transformations to use caution, CSTG is a common abbreviation for CASTING, but it could also be a legitimate acronym or DBA for a company name.  So I don’t want to expand CSTG Manufacturing into Casting Manufacturing when it could well be an acronym for something like Coastal Systems Technology Group Manufacturing.  So I do not apply the transformation of CSTG into Casting if CSTG happens to be the first word.  If CSTG also happened to mean something in a foreign language, I would not apply the rule at all in countries where that language is spoken.

So now my spelling table has to sprout a code field that will indicate whether the word substitution can happen anywhere, anywhere except the first word, anywhere except the last word, only on the first word, or only on the last word, and I would have to somehow represent the country filtering requirement.  Doing the latter in a “relationally correct” way would probably mean yet another table that associates a given rule ID with the IDs of any forbidden countries.  Now I need to write and maintain an administrative application with a UI in order for even experienced operators to manipulate these rules safely and without error.  And I need a whole series of unit tests to make sure I correctly interpret how all these rules are captured in the database.
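To make the extra bookkeeping concrete, here is a hedged Python sketch of what one such rule might look like once it carries a position code and a list of forbidden countries.  The `Scope` and `Rule` types, the `apply_rule` function, and the country code "XX" are all hypothetical; nothing here reflects an actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """The five position codes described above."""
    ANYWHERE = "anywhere"
    NOT_FIRST = "not_first"
    NOT_LAST = "not_last"
    FIRST_ONLY = "first_only"
    LAST_ONLY = "last_only"


@dataclass(frozen=True)
class Rule:
    misspelled: str
    correct: str
    scope: Scope = Scope.ANYWHERE
    forbidden_countries: frozenset = frozenset()


def position_allowed(scope, index, count):
    """Decide whether a rule may fire on the word at `index` of `count`."""
    is_first = index == 0
    is_last = index == count - 1
    if scope is Scope.ANYWHERE:
        return True
    if scope is Scope.NOT_FIRST:
        return not is_first
    if scope is Scope.NOT_LAST:
        return not is_last
    if scope is Scope.FIRST_ONLY:
        return is_first
    return is_last  # Scope.LAST_ONLY


def apply_rule(business_name, rule, country):
    """Apply one rule, honoring its position code and country filter."""
    if country in rule.forbidden_countries:
        return business_name
    words = business_name.split()  # note: this also collapses whitespace
    for i, word in enumerate(words):
        if (word.upper() == rule.misspelled.upper()
                and position_allowed(rule.scope, i, len(words))):
            words[i] = rule.correct
    return " ".join(words)
```

For example, `Rule("CSTG", "Casting", Scope.NOT_FIRST, frozenset({"XX"}))` leaves "CSTG Manufacturing" alone, expands "Acme CSTG Works" to "Acme Casting Works", and does nothing at all when the country is "XX".  Every call now pays for those checks, which is the dithering I’m worried about.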

If I do all this I will have turned a straightforward list of unconditional calls into a method that has to dither around on every call deciding exactly how to apply the rule at hand.  Between that and the database overhead, I’m starting to get nervous about performance.

Aside from all that, I’ve now Balkanized things so that instead of encapsulating everything I’m doing in a single method, some of it is out in a database table that has to be queried in a certain way in order to clearly see what rules are being applied in what order.

Oh, and that’s another problem.  Some corrections need to happen in a certain order and in fact, groups of corrections need to be interspersed with other, different operations that are order-dependent.  So now I’m faced with needing to assign sequences and groupings and the table is becoming even less straightforward to maintain.  I can’t simply throw new words into the table; I have to code each one so it’s applied at the right time.
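A small Python sketch of the sequencing problem, under the assumption that each group of corrections and each non-spelling operation becomes a numbered step.  The step numbers, the helper names, and the sample tables are invented for illustration only.

```python
def collapse_whitespace(name):
    """A non-spelling operation: squeeze runs of whitespace."""
    return " ".join(name.split())


def strip_trailing_punctuation(name):
    """Another non-spelling operation that later groups depend on."""
    return name.rstrip(".,;")


def word_group(table):
    """Build a step that applies one group of whole-word substitutions."""
    def step(name):
        return " ".join(table.get(word.upper(), word) for word in name.split())
    return step


# Hypothetical sequence numbers; each new rule has to be slotted into the
# right group, not just appended to a table.
PIPELINE = [
    (10, collapse_whitespace),
    (20, word_group({"MFG": "Manufacturing"})),
    (30, strip_trailing_punctuation),
    (40, word_group({"CO": "Company"})),
]


def normalize(name, pipeline=PIPELINE):
    """Run the steps in ascending sequence-number order."""
    for _, step in sorted(pipeline, key=lambda entry: entry[0]):
        name = step(name)
    return name
```

Here the "CO" group only works because the trailing period has already been stripped in step 30.  Swap steps 30 and 40 and "Acme Co." comes out as "Acme Co" instead of "Acme Company", so the order is part of the rule and has to be stored and maintained along with it.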

You can see why I keep returning to the simplicity and dependable speed of making a sequence of calls where you can plainly see they are happening in a certain place in the code.  It’s a question of when and how to capture the business intelligence I’m trying to apply.  It’s a question of what is the most self-documenting, the most performant and, ultimately, the simplest.

The fact that expressing these rules is verbose doesn’t automatically make them obscure.  It’s very easy to see what’s going on in this method, and after a while you become familiar with how it’s organized and instinctively jump to particular sections to work on it.

I believe this is a perfect example of why you shouldn’t be mindlessly and slavishly married to rules of thumb: they usually work, but in certain situations like this one they are made to be broken.

What I will probably end up doing is moving some inline code into methods purely to enforce a little discipline on the overall structure, to better document what is going on, and to handle a subjective “wetware” issue.  Future developers who encounter this routine may be blinded by its sheer size into rashly embarking on a Manhattan Project to slim it down, on the theory that nothing about it can possibly be dealt with until it’s cured of its obesity.  Anyone who would calmly work with this code for a while would, I think, acknowledge it’s going about its work in the best overall way, but a lot of people are blinded by religious preconceptions and I have to guard against that as best I can.

Put another way: best practices can be “just okay” or even “worst” practices in certain situations.  What separates adults from children in software development, as in life, is always questioning assumptions and being willing to be unconventional and even crazy when it makes sense to do so.  Children want an ironclad set of simple rules that are always “correct”; adults recognize that at times the “wrong” thing is exactly “right”.

When you learn a “best practice” or “rule”, always think about the situations in which that practice or rule would break down.  It’s a very useful mental exercise in mindfulness.
