Redefining Data Management: A Q&A with Informatica’s Amit Walia
One of the goals of Early Adopter Research is to explore and understand cutting-edge technology as it emerges on the market. As part of that mission, Early Adopter’s Dan Woods recently interviewed Amit Walia, President, Products and Strategic Ecosystems for Informatica, an enterprise cloud data management software company, about the organization’s product management and development. Woods focused on how his long-standing idea of the data supply chain fits into Informatica’s product strategy. Informatica, which was taken private a few years ago, has a well-established product suite that is both wide and deep. It’s now managing a complicated transition from what used to be an on-premises-dominated world with many separate products to a cloud world with a more unified product suite, and products that are becoming smarter by using AI and machine learning.
This is an edited version of their conversation.
Woods: What are your principles for evolving the product suite to be more coherent, valuable, and empowered by AI and ML as it moves toward the cloud?
Walia: Let me step back and talk about where we are as a company and then that’ll give you the context of where we are going with our thinking around how we develop products based on ML, AI and where the customer problems are.
Informatica stands at the cusp of what we call all things data, and data management in particular. That includes the markets of data integration, data quality, and MDM, and, in the new world of cloud, exciting new markets such as catalog, data security, and data governance. Data is becoming front and center, and the strategic backbone for every enterprise, whether they’re going through digital transformation or have to manage and comply with regulations around data like GDPR.
The biggest change is that data has become a boardroom discussion around how to monetize it to grow or how to make sure that companies keep their brand intact and manage it in the context of owning their customers’ data. And therein lies what we call the data 3.0 world, which is where companies are redefining how they engage with their customers, redefining the products and offerings that are coming out to market, or how they run their businesses with the help of data.
Woods: What is data 3.0, and what was data 1.0 and 2.0, and how is that helping you manage the evolution of the products?
Walia: That’s a great question. Data 1.0 was around 30 years ago, with the rise of application software and the first ERP accounting software. Basically, people were automating a business process with an application, and the data and data models were captive to that application, and that’s it.
In the data 2.0 world, data became a little bit broader: there were what I call enterprise processes, like supply chains, and data was no longer captive to one particular application.
The data 3.0 world is completely different. It’s a data-first world, where applications don’t matter and databases don’t matter. You are in the world of cloud, you are in the world of open source databases, you’re in the world of multiple fragmented types of infrastructure, and ultimately what matters is how good a job you do of understanding the data around your customer, your supply chain, your product, or your suppliers to make sure you can be more efficient, be better engaged with your customers or your business, or comply with regulations like GDPR.
Large enterprises are seeing that happen in front of them when they see an Amazon or an Uber or an Airbnb completely disrupting industries with the help of data.
Woods: In the data 1.0 world, you had data in a bunch of different applications and the problem was about integrating it and moving it into a data warehouse. And then in the data 2.0 world, you’re talking about supporting end-to-end business processes, so you’re migrating data and the data is sort of following the action of the business process, and there’s a data supply chain, so it’s not just about integrating in one repository for reporting like a data warehouse. It’s about having the data move along a data supply chain to track something. And then finally, what you’re saying now is that you want to be able to assemble, at any moment, all the data you have from all the sources in order to apply whatever type of analysis or reporting or AI or ML you have, so you can completely make use of that data, in real time if necessary, to answer important questions about how to run your business. What’s different about what you need to do to support the data 3.0 era, and how are you changing your product portfolio to enable that?
Walia: In this world, what is fundamentally different is you have to think of data almost as a system. There are five attributes to it, and they lead to the most important thing that one has to do differently, which drives the new product development strategies.
First of all, you have to think of data as a platform, end to end, but not like a monolithic ERP platform. Number two, you have to think about the scale of data. Data is doubling every year and you have to think about how to manage that scale. If you’re making decisions today based on a certain set of data, it will double next year. New sources will come, and as you said, you have to make decisions on a real-time basis. Third, and most important, is thinking of metadata. You will never be able to bring all the data into one place. Never. And you should not even aspire to. Cloud, mainframe, client-server systems, outside systems. Bring the metadata together and then make decisions from there.
Which brings us to the fourth element, apply machine learning and AI to it. We as humans have built massive statistical models in the past to predict decisions. But the thing is that there’s only so much you can do with the scale of data growing, as well as the complexity of relationships in that data that even statistical models cannot handle. And that’s where AI and ML come into play to help complement human thinking by giving those correlations and predicting things that probably we would not be able to catch.
And the fifth thing is governance and compliance of data has become paramount. Today, you have to own that data in a very, very responsible way. So in that world, metadata is the place to go. Metadata is the new OS. And at that layer, when you bring all the metadata together, you can then apply machine learning to it or apply AI to it, to help make intelligent decisions. That’s what is guiding our product development philosophy. We basically wanted to be the Google of enterprise data through metadata, where we bring that combined metadata for a company to get a single view of all of the data assets by metadata, and once machine learning goes on top of it, it can start giving you all kinds of predictions and correlations.
Woods: It seems to me that one implication of this architecture that you’re explaining is that the former product boundaries that we have been used to, such as one product for ETL, one product for MDM, another product for the catalog, another product for analytics, are merging into one large platform where we’re not going to really have any product boundaries the way we did in the past. Is that the way your suite is evolving?
Walia: Yes and no. I don’t think that there’s going to be one nirvana product that will do everything for customers. Because if you are doing analytics, you will require data integration and data quality assets. You might want to use MDM and then you will obviously need data integration and data quality under the covers. But what I’m seeing is that the biggest difference is that the heterogeneity of data, the fragmentation of data, and the fragmentation of underlying databases and applications have become so crazy that for organizations to truly get some control around and understanding of their data assets, we have to think differently. You can’t put all your data in one place to analyze it, right? So that’s where the metadata view becomes very important. And that metadata view then connects to every data technology. A good example of that is what Google did for the World Wide Web. Google indexed the World Wide Web, and that is what metadata does for the enterprise.
Woods: Based on what you’re saying, it sounds like we still need the data lake. The idea of the data lake was that we expand beyond the structure of a carefully defined, highly modeled data warehouse into a much wider repository that really encompasses all important data. But the problem was that that data lake implementation was wrong. The implementation of the data lake was too bound to the technology that had emerged for storing data at scale, Hadoop and the Hadoop file system, and then the mechanisms for storing and managing that weren’t really mature. And in addition, it was a losing assumption to think that you were going to move all the data into a single repository. So what you’re saying is we want to keep the data lake vision of a massive new catalog with metadata, to describe all of the data that is available, with ready access, but we don’t want to move all that data into a single repository?
Walia: That’s exactly right. And the way to get there is by having a single view of the metadata of all your data assets, and that’s why I say that metadata is the new operating system, because that lets you do all those things that you just said.
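To make the idea of a single metadata view concrete, here is a minimal Python sketch of a catalog that indexes descriptions of datasets while the data itself stays in its source systems. It is an illustration only, not Informatica's implementation: the `DatasetEntry` and `MetadataCatalog` classes, the system labels, and the sample datasets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata about one dataset; the data itself is never copied here."""
    name: str
    system: str                 # hypothetical label for where the data lives
    columns: tuple[str, ...]    # column names only, no row data
    tags: frozenset = frozenset()


class MetadataCatalog:
    """Hypothetical in-memory catalog: one searchable view across systems."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Index the description of the dataset, keyed by name.
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[DatasetEntry]:
        # Match the keyword against names, column names, and tags.
        kw = keyword.lower()
        hits = [
            e for e in self._entries.values()
            if kw in e.name.lower()
            or any(kw in c.lower() for c in e.columns)
            or any(kw in t.lower() for t in e.tags)
        ]
        return sorted(hits, key=lambda e: e.name)


def build_demo_catalog() -> MetadataCatalog:
    # Made-up datasets living in three different kinds of systems.
    catalog = MetadataCatalog()
    catalog.register(DatasetEntry(
        "crm_accounts", "cloud_warehouse",
        ("customer_id", "email"), frozenset({"pii"})))
    catalog.register(DatasetEntry(
        "orders", "mainframe",
        ("order_id", "customer_id", "amount")))
    catalog.register(DatasetEntry(
        "web_logs", "object_storage",
        ("session_id", "url")))
    return catalog


if __name__ == "__main__":
    for entry in build_demo_catalog().search("customer"):
        print(entry.name, "lives in", entry.system)
```

In this sketch, a search for "customer" finds datasets in both the cloud warehouse and the mainframe without moving any rows, which is the sense in which the metadata layer, rather than a single repository, provides the unified view.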
Woods: While you may have the ability to buy a product like an MDM or data quality product, it seems like you’re just going to be buying an interface to a larger integrated system and that from a product management point of view, you’re creating a platform that’s much more integrated than in the past?
Walia: That’s right. Integration of product comes in very different ways. The interface for data integration or data quality or MDM is all very different because the user is very different. But the integration happens behind the scenes.
Woods: Let’s talk about the actual experience of the product itself. Now you’re in a world in which a lot of the complexity that we have of understanding and mapping data, of making connections between different datasets, can be assisted by machine learning algorithms. The challenge is how do you create a world that starts out with this configuration, but then becomes assisted by the ML, and that completely changes not only the product experience but also who’s using the product. Are you all focused on more self-service products? In my consulting, I have done a variety of engagements where people have tried to use self-service, but the problem has been that even simple data applications have tremendous complexity that’s much more like coding, and it gets in the way of self-service. So how are you navigating this transition at the user experience and product capability level so that you can realistically serve this new set of users?
Walia: First of all, there is no one answer for all the problems because ultimately you have to build offerings that are very persona-driven. Different users have different needs and they want to do different things as well. Our data lake is a great example of a product where we tried to manage that. What we didn’t do is make a single user experience, which to me is always an average of all experiences, diluted for a business user as well as for an IT user. In a data lake, a data analyst or a data scientist wants to identify a business problem, have access to all the data, and start doing some analysis. The IT user, on the other hand, wants to know how quickly they can provide this data to the business user, but also wants to know who has access to what so that there is some level of governance and compliance to it. What we did is create two different user experiences, where the business user comes in and is searching for data and doing what I call basic ETL jobs, but in a very Excel-like interactive way, whereas a totally different user experience is given to IT. That’s the way we approached it. You have to tailor the products to the persona that they cater to.
Woods: The idea is that once again we’re talking about an integrated platform underneath that’s providing all these capabilities and you’re creating these productization experiences based on your understanding of the use cases that people need. This goes to the last question I wanted to ask you, and that is when you talk to CTOs or software engineers, they’re people who code and build systems and they’re often hostile to the kind of products that Informatica represents because they think they can use off-the-shelf open source for data integration. They don’t want to have to pay for installing and managing a platform to meet their simple needs. But that becomes more and more expensive, even for large or engineering-heavy companies. And whenever I see a large company at a conference talking about its ETL, almost always it has used Informatica because there’s a huge amount of value at stake. For those companies, it’s really about solving the problem completely with a product that’s going to last and evolve for them. They never think of Informatica as expensive. What is your view of the importance of productization, and have you changed the way you’ve been hiring and running the organization to increase your ability to do research and to envision products?
Walia: I think that is changing. I think people realize that with open source, they have to be hesitant because a) it’s not cheap, and b) in the long term it doesn’t scale. Now, we support all open source technologies under the covers, whether it’s Spark or Kafka. But I think people realize that supporting those things is a lot more expensive than you think. In terms of our product development, our goal is to support all open source technologies, to give customers the choice of where they want to run things. And in that context, basically, we hire people who understand all of these new technologies so that we fully support all of them. Our goal is to give customers choice because customers want the choice to do different things that perform differently for different jobs.
Woods: Have you had to change the balance of your staff to support that?
Walia: Absolutely. We’ve grown. Our overall product portfolio has grown, and the choices that we’re giving our customers have grown. It’s our belief that we want to be number one in data management, so we want to give the customers full choice and we want to be the Switzerland of data. And that has led to not only growing, but also hiring a different kind of talent that is well versed in these newer technologies.
Woods: Have you increased the amount of research that you do as part of the product development process?
Walia: Yes. We innovate. We are the leader in our market, so our job is to always be aware of where the market is going. In that context, obviously we do a lot of research. So when we were looking to support Spark under the covers, that didn’t happen when Spark became a mainstream engine. We were looking at it years before. We evaluate technologies to see what we want to bet on and what we don’t want to bet on.
Woods: Has the availability of data through using cloud products changed your product development process because now you can get more detailed metrics about how people are using the products?
Walia: Absolutely. We have the number one cloud product in the industry. We are the leader in the Gartner Magic Quadrant, as well as the leader in market share. The platform our cloud products run on handles two and a half trillion transactions a month. So the scale at which we run is truly mind-boggling. That gives us a sense of what kind of workloads people are running and what kind they want to run.
Woods: You can choose to not answer this question, but I’m interested in the answer. We’ve just seen the merger of Hortonworks and Cloudera, and they’re two Hadoop-focused companies that have stopped talking about their Hadoop roots and are now talking much more about being a data platform. I wrote an article long ago saying that Hadoop was going to eventually disappear and be an embedded system. Do you think that’s true? Five years from now, will nobody even mention the word Hadoop, because the object storage layer of the cloud will have taken over all of the duties that used to be performed by HDFS and various processing engines, and because the ability to create data supply chains and complex collections of workloads on the cloud will supersede what has been done in the MapR, MapReduce, and YARN world?
Walia: If I were in the prediction business, I would have retired a lot sooner, right? But unfortunately, I’m still working. I’ll philosophically say a few things. Definitely, the world of cloud is here to stay. I always look at any industry trend in the context of what a customer would want. And if you look at it from that point of view, the ability for a customer to get started, scale up, and scale down in the world of cloud is a lot easier. Obviously, it has its own challenges, including security and latency, but technology itself gets better in the world of cloud. Things go based on what customers like to do with it. I think cloud definitely is here to stay and will continue to grow.
Woods: You’ve added machine learning and AI to a variety of your products. What’s the right way to add AI to a product, and what kind of mistakes are you seeing where companies are pretending that AI can do too much or making things too complex for the user?
Walia: Every time a new technology comes along, the hype cycle is so crazy. Ultimately, with any technology, adoption happens if you genuinely solve your customers’ problems. And I think right now the mistake I see happening is that AI is being talked about as the best thing since sliced bread, something that can change the whole world, but it is not yet solving customers’ genuine, what I call day-to-day, pain. So I think focusing on what the customers’ pains are and solving for that is probably the best way to go versus hyping it up. AI is here to stay. But it will take some more time because of the high level of hype around it. But as long as it keeps focusing on solving genuine customer pain rather than fancy technical problems, it will get adopted a whole lot faster.