How DataOps Will Differ From DevOps
The idea of DataOps is rightfully generating a lot of excitement. Anyone who has seen up close what DevOps has done for development knows that applying the same principles to data management will be transformational.
As part of a research project I am doing with Lothar Schubert at Hitachi Vantara, I’ve been digging deeply into the question of what DataOps will mean in the enterprise. The white paper “Is DataOps Your Windfall of Value?” is the first result of this research project.
My first takeaway is that the emerging practice of DataOps is going to be something much broader and deeper than DevOps. There will not be a one-to-one mapping from DevOps to DataOps. DataOps will have dimensions and requirements that go far beyond the world of DevOps because the problem space is so much larger and more complex.
The rest of this article seeks to explain those differences.
DataOps covers a much broader landscape
DevOps was all about breaking down the silos between developers and operations people, who were the primary beneficiaries of the unity it brought. DevOps created a unified end-to-end process that started with development and ended with deploying the software and monitoring its usage. In the most advanced implementations, you could push one button, and the entire site could be created from the source code repositories.
In the world of DataOps, instead of just developers and operations people, you’re talking about practically everyone in the entire company. You have the data experts who provide the data and build systems for others to use: IT systems and operations people, data engineers, data scientists, business analysts, and experts in various types of tools. Then you also have everyone using that data, including business executives.
In addition, there are far more layers. Think about all the places data comes from, all the layers of integration and storage it passes through, and how it is reshaped and delivered into systems for users, who may then reshape and deliver it again. The range of capabilities under the umbrella of DataOps is much more extensive than under DevOps.
There’s also far more data. The data that is tracked in the world of DevOps is about the software development process. In DataOps, almost all the data in the enterprise is included. There are also more systems involved, whether they sit at the edge, in the cloud, in IoT deployments, or on-premises. All of these factors will combine to make DataOps far more complicated than DevOps.
DataOps will have lots more tooling
DevOps created better systems for automated testing, instrumentation, and continuous integration and delivery. You could even argue that the tooling around microservices is a result of DevOps.
In DataOps, companies will have lots of existing technology, spread across many layers, that must be incorporated into a new way of working. They will have data engineering and ETL, data repositories that include object storage access, tiering, and hybrid cloud, and then all the traditional repositories like data warehouses and data marts.
There will also be systems for constructing data pipelines and maintaining them. The construction and maintenance of these pipelines must be streamlined. Other techniques like data virtualization and profiling will also play a role. Additionally, one of the consumers of all this data will likely be ML and AI. In DataOps, we’ll probably see a new integration of these tools (for more on this topic, see this blog).
DataOps will break more new ground
The large landscape of technology that I described in the previous section will never work in a DataOps context unless it becomes simpler to configure and control once it’s in place. That means a massive expansion of metadata to describe repositories. That metadata will be used to drive and expand automation. Eventually, you will achieve policy-based and low-code ways to manage these systems. At some point, companies will be able to manage these systems without needing highly skilled experts to program every piece of them. DataOps will break new ground by making all of these systems more self-service.
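To make the idea of metadata driving policy-based automation a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a description of any real product: the `Repository` fields, the `Policy` rules, and the action names such as `apply_masking` are all hypothetical. The point is that once repositories are described by metadata, a non-programmer can express a rule as data, and a small generic engine decides where it applies.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Metadata describing one data repository (all fields are hypothetical)."""
    name: str
    kind: str                     # e.g. "warehouse" or "object_store"
    contains_pii: bool
    days_since_last_access: int


@dataclass
class Policy:
    """A declarative rule: when the metadata matches, recommend an action."""
    description: str
    matches: dict                 # metadata field -> required value or predicate
    action: str                   # hypothetical action name


def applies(policy, repo):
    """Return True if every condition in the policy holds for the repository."""
    for field_name, expected in policy.matches.items():
        actual = getattr(repo, field_name)
        if callable(expected):
            if not expected(actual):
                return False
        elif actual != expected:
            return False
    return True


def plan_actions(repos, policies):
    """Derive (repository name, action) pairs purely from metadata and policies."""
    return [
        (repo.name, policy.action)
        for repo in repos
        for policy in policies
        if applies(policy, repo)
    ]


# Illustrative catalog and policies; none of these names come from a real system.
CATALOG = [
    Repository("sales_warehouse", "warehouse",
               contains_pii=True, days_since_last_access=2),
    Repository("sensor_archive", "object_store",
               contains_pii=False, days_since_last_access=400),
]

POLICIES = [
    Policy("Mask personal data before analysts see it",
           {"contains_pii": True}, "apply_masking"),
    Policy("Move cold object data to cheaper storage",
           {"kind": "object_store",
            "days_since_last_access": lambda days: days > 180},
           "tier_to_archive"),
]

for repo_name, action in plan_actions(CATALOG, POLICIES):
    print(f"{repo_name}: {action}")
```

In this sketch, adding a new rule means adding one more `Policy` entry rather than writing new code for each repository, which is the kind of self-service management described above.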
DataOps won’t be as uniform as DevOps
Software development looks more alike from one company to the next than data usage does. As a result, DataOps will be implemented in many different ways across varying industries and scenarios. Even partial implementations will have a big impact on how data is used in companies, and the results will be profound.
DataOps will have a much larger economic impact
The number of people using DataOps in an organization will be much larger than with DevOps. The number of people who will get access to better data will be massive. This will result in a much larger economic impact with DataOps than DevOps. And that’s saying something, as DevOps had a huge economic impact as it sped development and innovation and increased quality. But DataOps will be larger than even this.
DataOps will dramatically change top-down corporate culture
Data is a cat that cannot be put back in the bag. Once people have access to quality data, it doesn’t make sense for them not to use it to improve their work. The more people have access to more data, the better the decisions companies will make. If companies don’t allow people at the edge to use data to make those decisions, they’re placing unnecessary limitations on their workforce. This will lead to less top-down decision making and more empowered people.