I recently read an article in the Harvard Business Review that included this interesting quote:
As Joe Pucciarelli, group VP and IT executive advisor at the market research company International Data Corporation (IDC), said in a recent Channel Company webinar, “Most organizations’ data sets are not in great condition. We talk about data and analytics as a strategy and priority, but the data isn’t ready to support it.…Most organizations, when they’re trying to solve a problem, the analyst who’s working on it typically spends 75%+ of the time…simply preparing the data.”
The same problem Pucciarelli describes is what inspired me to write my book, Data Work: A Jargon-Free Guide to Managing Successful Data Teams.
Originally, I was going to call this book How to Identify and Fix Data Quality Issues. I had worked for a company and discovered that its data was a complete mess. Every solution implemented and every technology adopted failed to produce accurate data on an ongoing basis.
Our stakeholders had noticed this too. Many stopped using the reporting we delivered. While they still passively contributed their inputs to our data solutions, they relied on other reporting tools they trusted for their own work.
Basically, we had produced a zombie solution. One that had little real life in it, but kept walking along nonetheless.
While I felt I had found cool and novel ways to identify the data quality issues, it never seemed to improve things. And as I researched the root cause of these problems, I learned a harsh truth – it wasn’t one person’s fault.
The two data teams, and the many other individual contributors, were not set up to scale the way we did. I, as the report developer, didn’t have the ability to ensure the accuracy of the data engineers. The data engineer didn’t have the ability to ensure the accuracy of the various subject matter experts whose UTM parameters we depended upon. And those subject matter experts had little training or incentive to maintain consistency with their inputs.
In other words, it was an operations issue. And if we could tackle those operations issues, there was almost no need to identify data quality issues. They'd simply dissipate.
There’s a popular phrase tossed around nowadays: “data storytelling.”
I have a love-hate relationship with this topic. For one, I think it’s a bit of a corny term. It’s often said that a good analyst “tells a story” with the data, even though much of our work doesn’t have a narrative to it. The more accurate phrasing would be that good analysts “answer a question” with the data. Then again, “data answering” doesn’t have the same ring to it.