I recently read an article in the Harvard Business Review and came across this interesting quote:
As Joe Pucciarelli, group VP and IT executive advisor at the market research company International Data Corporation (IDC), said in a recent Channel Company webinar, “Most organizations’ data sets are not in great condition. We talk about data and analytics as a strategy and priority, but the data isn’t ready to support it.…Most organizations, when they’re trying to solve a problem, the analyst who’s working on it typically spends 75%+ of the time…simply preparing the data.”
The problem Pucciarelli described is the same one that inspired me to write my book, Data Work: A Jargon-Free Guide to Managing Successful Data Teams.
Originally, I was going to call this book How to Identify and Fix Data Quality Issues. I had worked for a company and discovered that its data was a complete mess. Every solution implemented and every technology adopted failed to produce accurate data on an ongoing basis.
Our stakeholders had noticed this too. Many stopped using the reporting we delivered. While they passively contributed their inputs to the data solutions, they also made use of other data reporting tools that they trusted for their own work.
Basically, we had produced a zombie solution. One that had little real life in it, but kept walking along nonetheless.
While I felt I had found cool and novel ways to identify the data quality issues, identifying them never seemed to improve things. And as I researched the root cause of these problems, I learned a harsh truth: it wasn’t one person’s fault.
The two data teams, and the many other individual contributors, were not set up to scale the way we did. I, as the report developer, didn’t have the ability to ensure the accuracy of the data engineers’ work. The data engineers, in turn, didn’t have the ability to ensure the accuracy of the various subject matter experts whose UTM parameters we depended upon. And those subject matter experts had little training or incentive to keep their inputs consistent.
In other words, it was an operations issue. And if we could tackle those operations issues, there was almost no need to identify data quality issues. They'd simply dissipate.
How Good Data Work Depends on Operations
Data work is a team sport. It’s seldom the work of one stellar contributor, but a collection of them. It’s not just the data engineers and BI developers, but the subject matter experts and contributors found on other teams, who often directly contribute to the collection of the data.
That makes it harder to scale data solutions while also maintaining data quality. And that's important because data quality is the foundation of success within data work.
If you look below, you’ll see the Pyramid of Data Solution Success, which I describe in the book:
Typically, data solutions fail because the analytics team is incentivized to focus exclusively on the top. They want to make the stakeholders happy and they want to deliver things fast. They probably want to make things pretty too, in the form of a Tableau report or a fancy PowerPoint presentation.
In the short term, this makes them seem valuable. But in the long term, as your operation scales and there are more and more requests, it becomes difficult to maintain data quality.
As quality declines, it becomes apparent to your stakeholders, and this erodes your credibility. If you can’t deliver accurate data, what good are you?
What Causes Data Quality Issues?
Whenever I investigated the root causes of data quality issues, I found that the following were to blame:
Lack of training / communication
Frequent process changes / reorganization
Poor understanding of the data
High employee turnover
Disinterest of management
The common solutions people pursue to fix these problems are investing in new technologies and hiring “top performers.”
But technology alone is seldom to blame. I find that most companies already use tools that can do the job. A different tool with a slightly different feature doesn’t change the underlying issues described above.
Top performers are also, just as the name implies, rare. Most employees, by definition, are average, and so are the people applying to work for you. Unless you have a large budget, it’s hard to hire a team that can resolve these issues without support.
That only leaves one valid solution – operational improvement.
You can build processes that address not just data quality, but every other level of the pyramid as well.
That shifts the burden from individual contributors to management.
The only problem is that many managers in the analytics space are inexperienced with data themselves.
Many Data Team Managers Are New and Don’t Have the Experience to Know How to Operate
In my field, it’s pretty common for people with great leadership and management skills to be hired to manage a team of analysts, despite the fact that they have never worked in analytics.
There are many reasons this happens, but the most common is that there simply aren’t enough data workers with the experience or willingness to take on these leadership roles.
So many companies look to existing leaders to take ownership of these teams and deliver on expectations. These managers are usually smart, ambitious, adaptable, and have demonstrated leadership success. That’s especially true in marketing (where most of my industry experience comes from), where it’s not that unusual for managers to move to new areas of expertise.
I think managers with experience in software development or other IT professions have an easier time transitioning, since the rules are fairly similar to their profession's, but managers from other professions have a harder time.
Data work is often different from the work of the teams they previously managed. Often, they want to deliver predictive analytics, but their first job is freeing up their analysts’ time through automation. And when you need to automate, the question is often “what do we automate?” Automating all the unique Excel-based reports isn’t feasible. In that instance, how do you prioritize? And when automation involves more coordination between teams, how do you ensure quality and consistency in results?
Good operations make that automation process easier, so it’s important for those managers to learn how to run them.
How Operations Improves Solution Quality
Operations is a fancier word for process. Truth be told though, I don’t like the word “process.” Instead, I like the word “habit.” That’s ultimately what we want processes to become. Not something bureaucratic that we do to avoid pestering complaints from management, but something that we do naturally.
In the same way going on walks improves health, good habits in data work improve quality. I actually believe with confidence that data teams with so-so technical skills, but great habits, can outperform data teams with phenomenal tech skills, but terrible habits.
The habits that improve data solutions include quality checking, requirements gathering, and training.
Quality checking is a no-brainer. You simply verify that you built what you said you’d build and that it works.
Requirements gathering is the harder one. Most people think they do it well, but project deadline extensions and scope creep say otherwise. Data workers are often young and don’t (yet) have the interpersonal experience to work with stakeholders. That means they need coaching from their boss on how to do it better, which my book describes how to do.
The reason requirements gathering is so effective in improving data and solution quality is that it reduces ambiguity and reactive efforts. Stakeholders better understand their commitment and can make a judgement on whether it's worth the effort. It also means fewer surprises later, which often require quick fixes that compound data quality issues.
There’s also the added benefit of less overtime. I find that overtime usually results from a lack of organization and coordination on project expectations.
Training is also a big habit that can improve success. It’s obvious why training will improve your team’s work, but you’ll often need to train other contributors outside your team. Requirements gathering often reveals what those contributors need to deliver on an ongoing basis. Training can ensure they know how to do that effectively.
Other processes are dependent on the work you do. As you find out what repeated tasks you have to deliver, you’ll need to develop a process to ensure the quality of that work.
Check Out My Book "Data Work" to Learn More
This is a high-level overview of good data team operations. There’s far more nuance and even more policies a data team manager needs to implement to improve quality.
Another area that can have a positive impact on quality is reducing turnover, something that many companies neglect to consider.
The data work job market favors employees more than employers. It takes a proactive approach to reduce turnover, which will lead to an improvement in solution quality.
My book provides practical solutions to that problem and sets out a roadmap to better data operations. With its advice, you can improve your team’s work on every level of the Pyramid of Data Solution Success.