Want to Be a Data-Driven Organization? First, Leave the Data Alone
No company is an island, certainly not when it comes to data. Through a web of partners and customers, not to mention various internal systems, a vast amount of information is generated on an almost constant basis throughout this network.
The good news is that so much data means so much potential for value creation. The bad news is that very often this data, because it derives from diverse sources, is scattered through the organization and the ecosystem and stored in a variety of ways, with different sets of protocols, in different languages, and different formats. The larger the company, very often the greater this complexity and the more distributed the data. Trying to make data-driven decisions? Good luck when you can’t access enough, or uniform, or otherwise usable data.
Your first thought may be to take a full inventory, standardize, centralize and systematize. Unify and conform and get it to the cloud.
Go Where the Data Is
Sound thinking, but the reality is that time and technology move too fast. You’ll have to go to where the data is, not vice versa. (Or, as Wayne Gretzky would say, “Skate to where the puck is going, not where it has been.”) That’s because in the emerging world of data, a single database or algorithm is unlikely to solve all your problems. As the amount of data grows, it is increasingly unrealistic to think it can be contained in a single place. In the future, and even now, the data should move to the model needed to make decisions, or, when the data is too big to move, the model should move to the data.
This is a profound philosophical shift in how companies view data and its role in decision making. As leaders focus on becoming data-driven organizations, a key step is learning to be comfortable with the idea that the data will sit in different places, governed by different protocols, and in different languages and formats.
Focus on Getting to Data-Driven Decisions
The better way to think about your data in a digital transformation is to be clear on your ultimate goal: to be able to make data-driven decisions. What is needed to make that happen? Think of how to make the data accessible, wherever it lives now, whether it is in the cloud or on-premises. This is how a microservices architecture works, for example: it relies on distributed, autonomous services, each with its own server, that communicate with one another in a variety of ways depending on the request. Data can be distributed, autonomous and responsive, too.
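To make the idea concrete, here is a minimal sketch in Python of what "accessible wherever it lives" might look like. Everything in it is an illustrative assumption rather than a reference to any particular product: the `DataSource` interface, the `CsvSource` and `JsonSource` stand-ins (imagined here as an on-premises export and a cloud service), the `revenue` field, and the sample figures are all hypothetical. The point is only that each source keeps its own format and location, while the decision logic asks every source the same question and combines the small answers.

```python
import csv
import io
import json
from typing import Iterable, Iterator, Protocol


class DataSource(Protocol):
    """Hypothetical interface: anything that can yield records as dicts."""

    def records(self) -> Iterator[dict]: ...


class CsvSource:
    """Stand-in for an on-premises system that exports CSV text."""

    def __init__(self, text: str) -> None:
        self._text = text

    def records(self) -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(self._text))


class JsonSource:
    """Stand-in for a cloud service that returns a JSON array."""

    def __init__(self, text: str) -> None:
        self._text = text

    def records(self) -> Iterator[dict]:
        yield from json.loads(self._text)


def total_revenue(sources: Iterable[DataSource]) -> float:
    """Compute a partial sum per source, then combine the small results."""
    return sum(
        sum(float(record["revenue"]) for record in source.records())
        for source in sources
    )


# Made-up sample data in two different formats and "locations".
on_prem = CsvSource("region,revenue\nEMEA,120.5\nAPAC,80\n")
cloud = JsonSource('[{"region": "AMER", "revenue": 200}]')

print(total_revenue([on_prem, cloud]))
```

In a real deployment the per-source sum would run next to the data and only the totals would travel, which is the "move the model to the data" pattern in miniature.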
In our experience helping enterprise-scale clients launch data science and AI projects, we’ve seen many struggle to harness their data before they even get to making it available to AI or ML models. We kept having to build infrastructure each time so that the models could access the data. After several instances, it became clear that the problem with data is that it’s misunderstood: companies believe it needs to be centralized and conformed in order to be usable.
Fundamentally, companies underestimate the operational challenges of working with very large amounts of data in a production environment. Sometimes datasets are too large to drill into, so companies want to move them out. But this could take at least six months, if the data can be moved at all; sometimes the stress of such a migration can cause it to break. As companies increasingly adopt microservice architectures, the ability to leverage data is more important than ever. Companies that are unable to build AI / ML models into their businesses will simply not survive.
So let go of the focus on the tactic of putting the data in the cloud. Don’t let the goal of becoming a data-driven organization obscure potentially better hybrid cloud strategies.