One of the character traits that separates okay developers from great developers (or for that matter, okay sysadmins from great ones) is the amount of ego they have mindlessly invested in Being Right.
I manage the processing of data files from thousands of companies every month, and while it’s mostly (and if I do say so myself, heroically) automated, there is always a healthy percentage of the Usual Suspects throwing junk files over the walls at us.
The other day a typical dance played out between various actors when a credit executive at one company wrote to tell us that they had recently begun offering extended credit terms to their customers, and that it had occurred to them that these terms were not reflected in the commercial credit reports our system issues from the data they share.
I checked, and sure enough, 100% of their accounts were mindlessly claiming to have net 30 day terms.
The executive immediately asked their IT staff to provide a data file with extended terms, and this month’s file arrived a few days later. It showed the accounts with extended terms, all right — and none of the accounts with “normal” net 30 terms. The result was a 90% overall balance drop, which tripped one of our sanity checks. As a temporary measure I combined the prior month’s customer file with the current month’s trade experiences, and requested a corrected customer file that included all the accounts.
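The sanity check in question is the kind of month-over-month comparison any file-intake pipeline can run. The sketch below is hypothetical (the function name, threshold, and totals are mine, not from our actual system): it flags a file whose total reported balance drops too sharply against the prior month.

```python
# Hypothetical sketch of a month-over-month balance sanity check.
# A drop larger than max_drop (here 50%) flags the file for review.

def check_balance_swing(prev_total: float, curr_total: float,
                        max_drop: float = 0.5) -> bool:
    """Return True if the file passes; False if the drop trips the check."""
    if prev_total <= 0:
        return True  # no prior month to compare against
    drop = (prev_total - curr_total) / prev_total
    return drop <= max_drop

# A 90% overall balance drop, as in the incident above, fails the check:
print(check_balance_swing(1_000_000, 100_000))   # False (file flagged)
# A normal month-over-month wobble passes:
print(check_balance_swing(1_000_000, 950_000))   # True
```

A check this cheap is exactly what catches a file where most of the records have silently gone missing, long before the bad data reaches a report.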
The email I sent was skimmed rather than read. My clear description of the problem (“most of the records are missing”) became, in people’s minds, something along the lines of “the extended terms were not included like you said they’d be.” Most tellingly, their developers and sysops, who could have confirmed there was a serious issue simply by comparing the file’s size to past months’, instead replied, “yes, we did add the extended terms.” The befuddled executive wrote me back and said all was well, try again.
At no point in this chain was there any real concern for accurate data, accurately presented and transmitted. This is arguably somewhat forgivable for C-level executives outside of IT, but it’s inexcusable for IT staff who should care deeply about credibility-critical, bread-and-butter concerns like these. And yet this kind of thing is, believe me, routinely overlooked.
The immortal Gerald Weinberg, in classic texts like The Psychology of Computer Programming, identified this issue decades ago, and spoke, for example, of programmers who are deeply offended and angered when bugs are reported, and whose first impulse is to shoot the messenger — especially if it’s a lowly user (or “luser”). They would rather minimize any difficulties or shift blame than have the epistemological humility to consider the possibility that they screwed up or underperformed. Hence they seldom learn from their mistakes.
We can fault traditional management for part of this, for not having proper attitudes toward mistakes that encourage people to learn from them rather than conceal them. We can fault the manager in my example above for being too preoccupied and/or self-important to be bothered to carefully read a five-sentence email and make sure they understood the problem before communicating it. We can philosophize about how everyone involved should be computer-literate enough to notice the tiny size of the file, open it in Excel, and see at a glance what the actual problem is for themselves. Or at least be willing to pick up the phone and talk the issue out until they are legitimately certain they understand it correctly.
Underlying all of this is a default position that they couldn’t possibly have erred. I, on the other hand, was trained from day one to worry that I had erred. Maybe good programmers don’t sleep well at night. Every now and then I bolt out of bed to check something that occurred to me I might have missed earlier that day, and I am, to anyone who knows me, about the furthest thing from an obsessive-compulsive personality.
This isn’t to say I never mess up. I’m human, so I do. The difference is that I have no habitual impediments to recognizing when I do.
We in IT can’t credibly speak to such issues until we have our own house in order.
What bugs me about this incident is that it isn’t even the usual rocket science; it’s simply sharing a file between companies, one that’s roughly the same size as usual and altered only in the specific way requested. Almost everyone in the communication chain at both companies was perfectly capable of grasping it without special training, if they could be bothered to think about it for ten seconds. I can just about guarantee that someone filtered the output to test (partially) that they were doing it correctly, and then forgot to remove that filter before returning the export to production. A small, easily fixed mistake, even if it was needlessly allowed to happen through a lack of proper quality control and testing.
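To make the suspected mistake concrete, here is a hypothetical reconstruction (the record layout and values are mine, invented for illustration): a filter added to spot-check the new extended-terms records gets left in the export, and every net 30 account silently drops out of the file.

```python
# Hypothetical reconstruction of the suspected bug: a test filter
# left in the production export path.

customers = [
    {"account": "A1", "terms": "net 30"},
    {"account": "A2", "terms": "net 60"},   # newly added extended terms
    {"account": "A3", "terms": "net 30"},
]

# The spot-check filter that should have been removed before the
# production run -- it keeps only the extended-terms accounts:
export = [c for c in customers if c["terms"] != "net 30"]

print(len(export))              # 1 -- most of the file is gone
print(export[0]["account"])     # A2
```

Filtered this way, the export confirms “yes, we did add the extended terms” while discarding everything else — exactly the failure a size comparison against last month’s file would have caught instantly.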
And that’s just the immediate issue. Who knows how long they’ve been mischaracterizing trade experiences because no one thought about the extended terms issue up front? It’s clear their management is reacting rather than acting. Who knows whether the IT staff is ever brought into the loop of the rest of the company enough to understand and care how what they do serves the bigger picture? Did the offending sysadmin or developer in this case even understand the meaning and purpose of the data they were working on? Or was it just random bytes to them?
If we can’t get human beings to do simple things like this correctly, even within IT culture, what hope is there for, say, a mission-critical tech retooling that requires dozens of bleeding-edge technologies to work together correctly? Or a mobile app initiative that requires you to look at how business is conducted in a whole new way?
To quote Weinberg again, “ultimately, all technical problems are people problems”. How right he was!