Democratizing Data: A Q&A with Qlik’s Mike Capone
In November, Dan Woods of Early Adopter Research had the chance to speak with Mike Capone, CEO of Qlik, about the company’s acquisition of Podium Data, AI, and data supply chains for the Early Adopter Podcast. Woods has written about Podium Data’s technology before at Forbes, and the combination of the two companies indicates a lot about Qlik’s strategy for the future.
This is an edited Q&A of the conversation they had for the podcast.
Woods: I’m really interested in how companies can create what I call product-based platforms, that is, choosing a variety of products and integrating them into the solution you need. That’s something we really study at Early Adopter. In the data space, I’m interested in the idea of creating a very responsive and supple data supply chain so that you can understand all the data that you have in your company, transform it, move it to where it needs to be, and then reliably deliver it to the workloads, analytics, or applications that need it. Otherwise, data becomes a chore. Now, Qlik has been around a long time, and I’ve been studying the company for quite a while. What has always been interesting to me about Qlik is that the platform has had the ambition to be more of a data supply chain than most other companies’ products. I always thought of something like Tableau as a way to visualize what you can see with a query. But when it came down to landing data, modeling data, and creating purpose-built data, Qlik has always had the ability to do that, and it has a variety of special-purpose ETL primitives in its own language for creating an end-to-end pipeline. All of that worked in a world in which you would adopt Qlik in your own contained environment, and once you did that, you could replace a lot of ETL systems.
But now we’ve entered a different world. The first era was the centralized data wave; the second era was essentially data discovery, the market that Qlik created; and now we’re in an era of data democratization, in which you want to open up and have a system where a lot more people can get access and configure it. Unless you can scale the number of people who can work on every part of the stack, you have no chance of using all that data. Given all this, where is Qlik now?
Capone: I think you have a fundamental understanding of where Qlik came from and the fact that even though people viewed us as a visualization tool, we were always much more than that, from our ETL capabilities to our ability to serve as a data repository, which goes well beyond what some of our competitors offer. I was CIO of a $12 billion company for six years, so I’ve been on the other side of this equation, building those platforms and cobbling together the pieces to come up with an end-to-end data platform that delivers the value you need, not just analytics. Because analytics out there today is almost anarchy: everybody has got tools, they’re ingesting data from different sources, and you can’t really get to the lineage, the veracity, the trust in the data.
So when I came over to this side of the house, which is helping customers solve these problems, one of the things I thought about was why there can’t be a full end-to-end platform. The problem is how you get from raw data through data preparation, data curation, data cataloging, and data lake management to visualization and analytics, and then layer on machine learning and artificial intelligence, not to replace the users but to augment their capabilities to deliver results. That’s what Qlik is setting out to do right now, and it is a big part of the rationale for the Podium acquisition, because Podium gives us the capability to catalog data, to curate it, and, most importantly, to eliminate the need for giant data warehouse ETL projects. You can now add data sources to your data ecosystem in days, which was not possible in the old world of endless data warehouse projects.
Woods: Podium has fixed the data lake in a couple of important ways. First of all, they abandoned the idea that you had to move all the data into the data lake. They have the concept of registering a data source and using profiling information about it. Second, like many other technologies, they offer data transformation. Now, the way I looked at their stuff, it was really about transforming well-behaved data into new forms. It wasn’t about data cleaning or data quality. The data should arrive in Podium well behaved, and then you can use its simple, visual ways of transforming it to get either reusable objects or purpose-built data for analytical workloads. They distinguish themselves by understanding that information about how this data is used is very important. In a Qlik world, the whole idea is to create an extract, or a definition of a subset of data, that can then be delivered to QlikView or Qlik Sense. How do you see all this working together so that you actually get this democratization?
Capone: In the future, there are still going to be categories of users. But you won’t have an analytics department; every department is the analytics department. Whether it’s customer success or sales or HR, they all have to be expert at analytics. Now, there are going to be people inside the company who facilitate access to data and information, using the Qlik platform or other tools to do that. But what we want to be able to do is create an end-to-end experience. I would argue that exposing that data to the organization is actually what gets it to be well behaved and curated. Democratization means people have access to the information in a very agile way, as opposed to letting the data lake team go off, pull it in, and make assumptions about ETL.
The problem is, once you do that exercise, what if you made the wrong assumptions? What if you don’t know what questions you’re going to ask? So by leaving raw data in place, cataloging it, and letting people just go at it, they can understand what’s wrong with that data and then go fix it. That’s super powerful, and if you can do that all in one platform instead of cobbling together six different technologies, I think you’re ahead of the game.
Woods: I think one of the most exciting things is the idea that you can use usage data from the marketplace to focus your efforts on cleanup, on making the data more well behaved. The data that turns out to be most popular, you can make cleaner and higher quality, or enhance, or find other data that can enhance that area, because obviously people have an appetite for it. What does this unified environment look like when you actually get it right?
Capone: You have some vendors that are cloud only, and you have some vendors that are on-premises only. So there’s a lot of complexity in trying to work in this very heterogeneous landscape of BI technology, and it’s changing very, very rapidly. We are as agnostic as agnostic can be: we run in your data center, we run in public and private cloud environments such as Azure and AWS, and we have our own cloud that we’ve been building out. That takes away a lot of the complexity. We’ll work with you the way you want to work with us, and depending on your workloads, you can run them in the cloud or, if you have privacy concerns, in your data center.
So we’ll take a lot of that complexity out of it. But more importantly, we’ll take the integration headaches out of it. Integration is a lot of work, and it takes a lot of bodies to actually do it. When things go wrong, you have to track down the source of the problem, and it’s a coopetition world: there’s consolidation going on, but there are also relationships that were strong that drift apart as people build competing technologies. We want to be a trusted partner. We want to solve as much of this problem as we can for a customer, knowing there will always be things outside our scope because they’re edge cases, and then we’ve got an open API platform that lets anybody plug advanced AI tools or other things into it. So that’s our approach.
Woods: Now, you have some clients that are really quite engineering-rich companies. I’ve noticed that in some of your wins, you’ve been able to sell to these engineering-rich organizations, and even though they had oceans of programmers around, they said, “Look, we’re not going to do that; we’re going to use a product.” What do you think is driving people to use a product in this space rather than the raw materials you can get for free?
Capone: Well, two things. First of all, we pride ourselves on being engineering friendly. We are developer friendly, and we contribute to the open source community. We believe in that, and we do think there’s a collective wisdom in the crowd in terms of people actually solving problems. I think as the problems get bigger and more complicated, it’s harder and harder to solve them with free open source code, and I do think there’s a role for vendors in this space; the two can easily coexist. We have a product called Qlik Core where we’ve abstracted our associative engine, which is the thing that is super unique, and there is no open source project out there that does what it does. So if you’re a smart developer, you say, you know what, my differentiation is going to be my app, or the IoT product with sensors that I’m putting out there, etcetera. Why would I try to replicate something that Qlik has already perfected in this associative engine? With Qlik Core, we’ve abstracted that engine out and taken away all the UX and other layers so you can use your own and do a mashup, and what we’ve said is: develop for free. Build as much as you want, and if it works, that’s great. Then the charge is by consumption. If you put it in production and it’s a big success, then we participate in that success, and your success is our success.
Woods: So there’s an engineering-rich, developer-type space that can use Qlik as a foundation. But you also have people who, even though they have a lot of engineering capabilities, are saying, “Look, we’re using not only the back end of Qlik, we’re using the front end as well.” So how would you describe the conditions that make a product like yours worth the expense?
Capone: When I was a CIO, or even a CTO, if I could buy something rather than build it, 99 times out of 100 I would buy it. As long as you got the right economics with it, there was never a doubt. Because again, I’ve got to look at what my competitive differentiation is. Nine of the top ten financial services companies use Qlik. They have a lot of smart people who might be able to do some of this stuff, but the reality is I have 2,000 people waking up every single day innovating around analytics and the data platform. There’s nobody else out there who’s going to do that. And by the way, we are singularly focused. Some of our competitors do analytics, but they also do databases and word processing and other things. This is all we do.
So our value proposition is that we’re going to innovate on this every day. We’re going to focus on things like machine learning and AI, we’re going to build a full data platform, and we’re going to think about the things you don’t want to think about, because your competitive differentiation is actually writing algorithms that predict wealth for people or the stock market, or that drive flash trading, rather than worrying about an analytics platform. If you focus on what matters for your company and what’s going to help you win, you’re going to win. I tell CIOs who run their own data centers, “You’re already dead.” There’s no competitive differentiation in running your own data center when you can do things in Amazon. It’s the same thing with analytics. If you can get it, and get it at the right economics, why would you ever think of building it?
Woods: You mentioned AI and machine learning, and my view is that most companies out there are going to use this technology through products. It’s going to be presented to them either through a platform like Qlik or inside an enterprise application, where some AI capability is detecting anomalies or doing other AI and ML work in the context of a business process, with all of the surrounding work already done. What does somebody have to do to become a sophisticated consumer of these products, to understand when they’re buying something that is a really good fit?
Capone: Your fundamental premise is correct, which is that data science, analytics, augmented machine learning, and AI have to be productized. They can’t exist off in a corner. Data scientists serve a very good purpose, but ultimately, once they figure out how to solve a problem, it has to be productized to get into the hands of people whose day-to-day jobs depend on it. There’s no other way. The industry has struggled to do that, and it creates bottlenecks in the business because now you’ve got rationing of data science resources to solve problems.
So what we’ve done is embed the cognitive capabilities right into our product. The idea is to augment the users’ capabilities. When you’re using AI and you don’t even know it, when it is actually embedded into your process, that’s the best kind of AI. And our belief is: lead with data. Everyone making decisions, at the operational, strategic, or C level, all the way down to the lowest level of the organization, such as the truck driver or the person on the phone with the customer, has to have data and insights in front of them to actually do their job. AI is a way to do that, but it has to be layered into the product in a very seamless fashion.
Woods: You’ve been a CIO, and as soon as you arrive as a CIO anywhere, you realize that you’re managing a portfolio that was created before you, and now you have to curate it and move it incrementally in one direction. If you then awaken to this data democracy vision, you have the same problem: how do I move incrementally in this direction so that I achieve a larger platform vision? What would your advice be to CIOs or CEOs who are starting this journey?
Capone: I have a lot of these conversations with CEOs, chief digital officers, and CIOs. It starts with a strategy. People love to talk about digital transformation, there’s a lot of hype around it, and there are a lot of consulting dollars being spent on it. The first question I always ask, before you start any work at all, before you even prioritize what you think you’re going to work on, is: what does your company look like once it’s digitally transformed? Where is data going to be leveraged in your operations and your strategy to actually transform, help you compete, differentiate, and win in the market? Once you answer that question, you can say, “Now I need to create a prioritized list of activities, starting with the foundational layer, the platform, and then I can add to that platform to address the various prioritized problems in my business.” Essentially, you want it to be driven by the business strategy.
Woods: My next question is about the radical data availability inside newer companies like Netflix and Google. Netflix has a culture that focuses on transparency of both positive and negative feedback and sees the willingness to let people go as a positive thing. All of the data, even the crown-jewel data about how successful a program is, is available to almost everybody in the company. And salary data is available to almost everybody in management, not just senior management. At Google, it’s the same thing: as a Google employee, you can see a radically huge scope of data. That goes far beyond what most companies would ever imagine. Do you think that the need for this sort of radical availability is going to be a bottleneck, or if not, how are companies going to get the same results while governing data a little more carefully?
Capone: That is the key question. Data governance is incredibly important, and the reality is that technological advances (hardware, software, cloud, compute power, storage, analytics, and AI) are completely outstripping society and, quite frankly, legislative bodies’ ability to deal with them. So they’re trying to catch up, but just when they start to catch up, the technology advances again. All we can do at Qlik is be responsible in our space, and we take that responsibility very seriously. That means building the capabilities for a company to manage data lineage and the veracity of data, to make sure there are audit trails associated with data, and to make sure there are appropriate controls on the security of information, so that you can make decisions about how you want to use data and also audit what has happened with that data and where it has gone. Then culturally, companies get to decide whether they want to be Google or Netflix. We will give you the tools to manage that and make sure that what you intended to happen is exactly what happened. If you want to be more conservative about it, we will absolutely give you those capabilities as well. What I do know is that there is no one right way of thinking about this. If you’re a German data privacy officer, you’re probably not going to think the same way as somebody at Google. That’s just how the world is right now, and we’re going to make sure that we can accommodate everybody.
Woods: What harmful myths do you see out there in the market that are stopping people from getting AI right?
Capone: One comes from a wave of overselling AI that went on for a while. There were companies making noise about how their AI was going to replace doctors in diagnosing cancer. When it was put into use, that turned out not to be true, and in fact that particular technology got thrown out of a couple of places. The reality is that it’s a myth that AI is going to completely replace people. Now, are there AI and robotic automation technologies that will eventually eliminate some activities? Yes, that is definitely true. But for knowledge workers, I think there will be augmentation; AI will enhance their capabilities. The myth that AI is going to solve every problem everywhere is a bad one. I think people are now understanding that and talking about it the right way.