Concurrency Matters: A Q&A with Spencer Kimball of Cockroach Labs

The database market has experienced a remarkable level of innovation over the past few decades. In a recent interview for the Early Adopter Research podcast, Dan Woods spoke with Spencer Kimball, co-founder and CEO of Cockroach Labs, about the history of the modern database market. The conversation covered the beginnings of databases, how they’ve evolved, and where they are right now. Kimball offered an interesting explanation of the history and development of SQL databases and other data storage systems. SQL databases did not spring from the head of Zeus; there was a long line of innovation that led to them. And in fact, they’re not nearly as uniform as one would think. By analyzing the design choices involved in SQL databases, we can gain an understanding of what a better data storage strategy for the modern world might look like. This is an edited version of their conversation.

Woods: What is Cockroach Labs? Why did you decide that it needed to exist, as a founder?

Kimball: Cockroach Labs is the company that’s been founded around a database project. It’s an open source database called CockroachDB. CockroachDB is very much inspired by a lot of the pioneering work that Google did in the last decade and well into this decade now around databases. And Google did that work, not necessarily out of a sense of altruism, but out of a need to solve problems related to databases based on the kinds of resources that were available to them, the kinds of applications and services that they wanted to build, and the global scale of their ambitions and customer base. When we, the three co-founders of Cockroach Labs, left Google and found ourselves in the quote-unquote “real world,” it was like descending from Shangri-La into a medieval village. Because at that point in time, in 2012, I think Google was very far ahead of the broader ecosystem in terms of internal infrastructure capabilities.

Could you explain some of the aspects of the challenges Google was addressing that put it so far ahead?

There are a couple of different strands to the narrative, and if we examine them in aggregate, the picture starts to come into focus. In particular, Google had a problem of scale unlike any other. Oracle, which really won the database wars in the 1990s, was never architected, and still isn’t architected, for the kind of scale that hit eBay and Yahoo. And then Google had to tackle audiences that exceeded 10 million, then exceeded 100 million. That kind of scale challenge was really the first thing that pushed Google onto a parallel evolutionary track in terms of how to actually solve data challenges. The other really interesting and big thing was that Google had data centers all over the country and then all over the world, and they had these in the early aughts. That was a very strange thing. Of course, nowadays everyone, even a startup, has access to that because of the public cloud. But with all of those resources and customers everywhere, the traditional idea of a monolithic database no longer suited the challenge or the opportunity that was in front of Google. And then finally, fundamentally, when you have literally a thousand services living in a data center, running on these different data storage technologies, replicating across data centers, anything that can go wrong will go wrong, and it will go wrong spectacularly. So the need to make systems truly bulletproof, without wiggle room and hand waving, ultimately led them to insist that everything they built had a level of stability and scale that let them iterate quickly.

Is it fair to say that in some ways Cockroach is productizing and making available some of the innovations of these super-scale pioneers?

Yes, absolutely. We cut our teeth on Google’s distributed infrastructure. We were there for the better part of a decade and had a front-row seat to the evolution of a number of distributed systems. We personally worked on a system called Colossus, and we watched Spanner go from being the next big idea five years down the road to finally making it into production, which is about when we left. Colossus is an exascale distributed file storage system; if people are familiar with Amazon’s S3, it’s quite similar to that. Everything is stored on it. In fact, Spanner stores its data on there, YouTube stores its data on there, Gmail stores its data on there, and all of the indexes of everything on the web are stored on Colossus. It’s basically the backing layer for the massive amounts of data that Google is constantly generating and needing to store. Spanner is a further level up the stack. Colossus is definitely data storage infrastructure, but what it holds is just undifferentiated blobs of data. Spanner, in fact, is a database. And the distinction between a blob store and a database is that the database typically has some semantic understanding of the data inside it. So the contents are not undifferentiated blobs; they’re often metadata, where you want to quickly access small pieces. You also want to be able to change the data, potentially with many concurrent actors changing and accessing the same data while it’s being changed. That turns out to be a fundamentally difficult problem, and it’s one that has been getting solved, in fits and starts, for the past fifty years. It’s involved an arms race between what businesses need and what database designers and implementers can deliver. Whatever is delivered is immediately utilized, and then the application use cases clamor for more. You never get very far ahead of what people actually want to use, and that demands real strictness and correctness in the underlying database. In 2018, of course, that pressure applies to any startup that is developing a mobile app and needs to build a back end. Companies have to address how to service their customers regardless of where they are. You have a lot of amazing tools at your disposal from all kinds of open source, like Kubernetes, like CockroachDB. You also have the public cloud, which allows you to acquire resources quite quickly, essentially all over the world. And because of the amazing distribution channels, the app stores for both iOS and Android but also the web in general, you can reach basically five billion connected people on the planet virtually overnight compared to the way these things used to work. All of that, in aggregate, means that you either uplevel your tools and your data architecture capabilities, or you don’t compete.

What’s really interesting is that it’s not clear to everybody what was gained and what was lost in this wave of database innovation, and that’s really one of the points of Cockroach: how can we move forward and preserve what was gained without losing anything in the process? Would that be a fair way of talking about what you’re trying to achieve?

Absolutely. And in fact, this isn’t the first time this cycle has appeared in history. Fundamentally, Cockroach was born in the era of separation between typical SQL architectures, which were typically monolithic, and NoSQL architectures, which were the first cloud-native database architectures. NoSQL, some would argue, was born at Google with the Bigtable project, and in the five years or so afterward there was an upswell of mostly open source products that followed in a similar vein. The goal with NoSQL was to have elastic scalability and to have replication to further high availability. Some of those systems included Riak, HBase, Cassandra, and MongoDB, many of which people have heard of. The things those systems gave up are for the most part well summarized by the fact that they put “no” in front of the word SQL. The traditional SQL databases, which had evolved for forty years up to that point, had very strict guarantees and a very elegant—some would say overly elegant—query language, which is what SQL stands for: Structured Query Language. Things like transactions were very evolved and very difficult to build even in a monolithic context. I think the realization among the early developers of NoSQL systems was: building a cloud-native database for the first time, one that truly is distributed, is such a monumental task that we can’t do both that and maintain all of the evolved sophistication of the monolithic SQL architecture. That was a bridge too far, so the approach was: we’re going to take some use cases that are probably simple enough that they don’t need all of the evolved sophistication of SQL, but that do need scale. And that’s what the Bigtable architecture was focused on. It took Google all of two years to realize that, as interesting and amazing as Bigtable was, it wouldn’t be applicable for virtually everything, though that wasn’t obvious when Bigtable first came out internally. In fact, the AdWords system at the time was running on SQL. There were complex schemas, and transactions were used basically everywhere. That complexity in building applications wasn’t supported by Bigtable. But Bigtable could get very, very big, and the AdWords system needed to get big, so the obvious ask was: can you guys move this AdWords system to Bigtable? And the answer from the AdWords team was: there’s no way, it’s impossible. We need these schemas, we need transactions, we can’t do without them. So in 2006 Google fairly quickly introduced a new datastore called Megastore, which started to add some of those capabilities and to bridge the gap. Eventually that culminated in Spanner and something called F1, which really tried to bring it all together in one package. When we left Google in 2012 and had to use what was available in open source, it felt like a bit of a letdown, like we had moved from an era of plenty to one of scarcity, and that’s where the idea of Cockroach was born. We wanted to take all of those learnings at Google and put them into a much easier to use, much easier to deploy open source system. And the name Cockroach came from the idea that, if we were going to build something like this, all of the nodes would have to greedily replicate themselves and be fairly autonomous, and if one goes away, the others have to pick up the slack.

Would you talk about the tiers of implementation, from what Oracle decided to do, to what some other players have decided to do, to what you and, it sounds like, Google have decided to do, in terms of creating a store that’s both scalable and distributed but also adheres to the ACID properties and high levels of isolation?

The definition of the term ACID really helps add context to the whole explanation. The term was developed in the 1970s, and it was pretty radical stuff at the time. Computer scientists effectively invented transactions in order to solve a problem they saw in the wild. I believe this was for a system that IBM built for one of the airlines, I think either Pan Am or American, called SABRE, which was a reservation system. As many people who use NoSQL systems are aware, when you work in a test environment with very little concurrency, you don’t need transactions. It’s really one actor either reading or writing data from the database. However, as soon as multiple users become concurrent, with one trying to write something, another trying to read it, and a third trying to write it as well, all kinds of weird things can happen unless you control that concurrency somehow. So with the SABRE system, when multiple travel agents began trying to make flight reservations at the same time, they trampled over each other. People realized, wow, there’s an actual problem here, we need to solve it. So the idea of ACID was born. It stands for Atomicity, Consistency, Isolation and Durability. I think most people solved the A, C, and D. The “I” is solved to various levels of perfection. In the original 1970s definition of ACID, that “I” embodied a property known as serializability, and there weren’t a whole bunch of other options you could choose from. Serializability effectively means that, for all intents and purposes, no matter how many concurrent users are trying to access the same data, all of them experience the database as though they were the sole actor. In other words, you could have 10,000 agents all trying to get the same airline seat, but every single one of them will feel like they’re the only one using the database. That means only one of them will get the seat and the others will see that it’s filled, no matter how they come in and access it. Of course, the ways transactions can interact with each other become considerably more complex than everyone trying to write one value. It usually happens because you read this, you read that, you write this, you write that; these things intertwine and can become quite complicated.
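
To make the seat example concrete, here is a minimal sketch of that reservation logic under serializable isolation, written against any PostgreSQL-compatible database (CockroachDB speaks the Postgres wire protocol). The seats table, its columns, and the connection string are hypothetical.

```python
# Minimal sketch: claiming an airline seat under SERIALIZABLE isolation.
# Assumes a PostgreSQL-compatible database; the seats table and DSN are
# hypothetical.
import psycopg2

def reserve_seat(conn, flight, seat, passenger):
    """Return True if we claimed the seat, False if it was already taken.

    Under SERIALIZABLE, two concurrent calls for the same seat cannot both
    succeed: one commits, and the other either observes the seat as taken
    or aborts with a serialization error (SQLSTATE 40001) and is retried.
    """
    with conn:  # commit on success, roll back on exception
        with conn.cursor() as cur:
            cur.execute(
                "SELECT passenger FROM seats WHERE flight = %s AND seat = %s",
                (flight, seat),
            )
            row = cur.fetchone()
            if row is None or row[0] is not None:
                return False  # no such seat, or already reserved
            cur.execute(
                "UPDATE seats SET passenger = %s "
                "WHERE flight = %s AND seat = %s",
                (passenger, flight, seat),
            )
            return True

conn = psycopg2.connect("dbname=airline")         # hypothetical DSN
conn.set_session(isolation_level="SERIALIZABLE")  # every txn gets the full "I"
print(reserve_seat(conn, "AA100", "12A", "alice"))
```

In a real application the caller would also catch the serialization-failure error and retry the transaction, which is the standard companion pattern for serializable isolation.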

What happened after that original 1970s definition is that as people started to implement systems that provided some local concurrency control, they realized it’s very expensive to provide serializability. You can make the database run a lot more quickly if you relax it. So, as many modern users of SQL databases are well aware, there are a number of different isolation modes you can choose when you create a transaction: read committed, repeatable read, snapshot, and of course serializable. All of these terms are very strangely conceived, because instead of representing what the concurrency feels like to the user, what they actually describe is how the database is cheating so that it doesn’t really have to provide serializability.
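
As a hedged illustration of that per-transaction knob (PostgreSQL syntax is shown; the accounts table and connection string are hypothetical):

```python
# Hedged illustration: isolation is a per-transaction setting in most SQL
# databases. PostgreSQL syntax shown; table and DSN are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=shop")
conn.autocommit = True  # manage BEGIN/COMMIT explicitly below
with conn.cursor() as cur:
    # Weak end of the spectrum: fast, but permits several anomalies.
    cur.execute("BEGIN ISOLATION LEVEL READ COMMITTED")
    cur.execute("SELECT balance FROM accounts WHERE id = 1")
    cur.execute("COMMIT")

    # Strict end: the original 1970s meaning of the "I" in ACID.
    cur.execute("BEGIN ISOLATION LEVEL SERIALIZABLE")
    cur.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    cur.execute("COMMIT")
```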

You’re essentially choosing which compromise you want in your isolation level?

Right. Based on the transaction isolation level, you can implicitly determine how things are going to feel when there’s concurrency and you’re trying to run your own transaction. But it’s really kind of backwards, and so it’s very difficult for application developers to figure out, okay, in this situation I can use this particular isolation level. And in order to really use them correctly, the application developers have to go one step further: they actually have to understand how to manually lock things. So they’re essentially having to re-implement, in every piece of application code that touches the database, smarts that belong in the database; the database itself should be able to make those same decisions.
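
What that manual locking looks like in practice is something like the following sketch, where the application author, not the database, has to remember which row to lock. The accounts table is hypothetical.

```python
# Sketch of manual locking under a weak isolation level: the application,
# not the database, must remember to lock the row it is about to modify.
# PostgreSQL-style SQL; the accounts table is hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=shop")
with conn, conn.cursor() as cur:
    # FOR UPDATE locks the row so no concurrent transaction can do its own
    # read-modify-write underneath us. Omit it and this code silently races.
    cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (1,))
    (balance,) = cur.fetchone()  # assumes the row exists, for brevity
    cur.execute("UPDATE accounts SET balance = %s WHERE id = %s",
                (balance - 10, 1))
```

The fragility is exactly that FOR UPDATE clause: leave it out and nothing fails loudly; the code simply becomes incorrect under concurrency.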

The idea here is that there are varying levels of isolation. You can pick how strict or how relaxed your transactionality is: the more relaxed it is, the easier it is to implement; the stricter it is, the closer you get to that ideal where it’s as if one person is using the database. But unexpected behavior can take place. Can you give an example of an anomaly?

The most subtle form of anomaly, the one that serializable protects against and that the next most stringent form of isolation, called snapshot isolation, does not, is something called write skew. It’s a really tricky one, but essentially what happens is that two different actors both read the values of, say, accounts A and B. Then one transaction updates the value for account B, and the other updates the value for account A. If you’re just using snapshot isolation, what happens in that case is that even though one transaction is changing one account value and the other transaction is changing the other, they’re not forced to serialize. They both start with the same knowledge of the two account balances, and then, based on that original knowledge, they alter the two independently. So you can end up double withdrawing or double depositing. And obviously, if you allowed that anomaly in a system that handles a financial ledger, you could end up essentially having people rob you, right? That’s the fundamental takeaway from the ACIDRain paper that came out in 2017. I use the example of write skew purposefully. It’s actually a fairly difficult anomaly to see in the wild, and that’s why most databases allow it, including Oracle, the world’s most popular database and certainly the most popular commercial one, and SAP HANA, both of which the Fortune 2000 use for virtually everything. Neither of these databases supports serializability. They support what’s called snapshot isolation, and snapshot isolation allows this write skew anomaly. That should be a fundamentally shocking thing. And everyone’s known about it for decades. Everyone’s used these things regardless, on the assumption that these concurrency anomalies were maybe problematic, in that you could get some weird behavior, but not fundamentally dangerous, not more dangerous than having a user complain about something. Well, as everyone knows, dedicated adversarial activity is on the rise, and these same anomalies are how a Bitcoin exchange was recently emptied out. The ACIDRain paper is the first academic treatment of the systemic risks of concurrency-based attacks.
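
A database-free way to see the anomaly is to simulate two snapshot-isolated transactions that each read both balances, check an invariant, and then write different rows. The accounts and amounts here are invented for illustration.

```python
# Toy illustration of write skew. Two transactions each read BOTH balances
# from the same snapshot, check the invariant "A + B >= 0", then write
# DIFFERENT rows. Snapshot isolation only detects overlapping write sets,
# so it sees no conflict, lets both commit, and the invariant is destroyed.
# Pure simulation; no real database involved.

db = {"A": 50, "B": 50}  # invariant: db["A"] + db["B"] >= 0

def withdraw(snapshot, account, amount):
    """Decide against a private snapshot; return the write if the check passes."""
    if snapshot["A"] + snapshot["B"] - amount >= 0:
        return (account, snapshot[account] - amount)
    return None

# Both transactions take their snapshot before either one commits.
snap1 = dict(db)
snap2 = dict(db)
w1 = withdraw(snap1, "A", 80)  # sees total 100, approves the withdrawal
w2 = withdraw(snap2, "B", 80)  # also sees total 100, also approves

# The write sets {A} and {B} don't overlap, so both commits are allowed.
for write in (w1, w2):
    if write:
        account, value = write
        db[account] = value

print(db)  # {'A': -30, 'B': -30}: total is -60, the invariant is broken
```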

The point of what you’ve been saying is that there’s an important property of higher quality programs that depends on the level of isolation and transactionality you’re implementing. And when you don’t have that inside the database, either it has to be corrected in the program, or the program has to watch out for it somehow, which is what you’re describing when programmers need to lock things manually, or you’re vulnerable to the problem happening?

Yes. When the programmers are responsible for doing the manual locking, they virtually never do it correctly. It’s almost like window dressing. Even database engineers, the people responsible for building these transactional mechanisms once and correctly inside the database, often mess things up. It’s very hard to get these things right. And the default is often something called read committed, which is basically down at the very bottom of the bin in terms of stringent protection.

No matter how good we get, we cannot get away from engineering discipline as another important part of solving this problem. It’s about the code, it’s about the discipline, it’s about the code reviews, it’s about finding ways to search for and protect yourself from these problems. So you’re not claiming the database will automatically protect you from every type of error?

Absolutely not. In fact, in the ACIDRain paper, the methodology was to look at a bunch of open source e-commerce platforms: about a dozen open source applications that together run 50% of e-commerce online, so the impact is massive. They found that in many cases the applications, obviously authored by developers out there, didn’t use transactions at all, in which case it obviously doesn’t matter what isolation level you use. So you’re exactly right: the idea of code reviews and the right engineering disciplines is fundamentally necessary. And if you don’t know that you’re supposed to be using transactions (you don’t have to look much further than a lot of the NoSQL usage out there in the wild), you get into trouble pretty quickly.

The ACIDRain research comes from a set of academics who said, wait a second, there is a large problem in the world because of inept or ineffective use of transactions, and they tried to identify how many systems had transaction-based vulnerabilities. Have people come to you because they started asking these questions and realized, oh my god, we can’t fix it with the technology that we have?

Not so much. Here’s the thing. This was very true even of Google, which I think many people correctly believe has good engineers. My experience at Google on the AdWords project was that transactions were often not used correctly there. That, I think, was one of Google’s fundamental learnings about NoSQL, and even about SQL usage. When you have distributed systems at scale, anything that can go wrong will go wrong, quite quickly and in many cases spectacularly. I think the bigger learning here is that if you leave this to application developers, as something that isn’t really part of what they’re trying to accomplish, they won’t address it. The reality is they don’t worry about it, so if you need to solve it, you need to solve it at the level of the database. You need to make it apply uniformly and consistently across all of the applications, and the database is the one place that can do that, when it simply supports it by default. One thing that’s very interesting about the ACIDRain paper, and I think this was the key insight, quite a brilliant one, is that they found a mechanism for noticing these anomalies, which are otherwise very difficult to notice. They examined the traces of the application’s use of the SQL database and were able to break those down and determine when an anomalous condition could occur. That’s fairly brilliant. But the shocking finding is that just by examining those 12 popular e-commerce programs, as you mentioned, they found them riddled with transaction problems. It seems pretty clear that the entire ecosystem of millions of database-backed services and applications deployed around the world is in fact riddled with these kinds of concurrency-related attack surfaces. So you can’t leave these kinds of things open to easy abuse. You have to say: we’re going to design this database so that serializable is extremely efficient and can be the default, in fact should be the default, for how applications use transactions.

You said before that one of the reasons Oracle made the decision it did is because they were of the opinion that serializable was not scalable, right?

Yes, that’s correct. At this point they have an almost dogmatic attitude: you cannot make serializable transactions fast enough. You can’t do it, you can’t do it. That’s what they say at conferences, and so their database doesn’t support them. The reality is that you can, and we’ve shown it through our work with the TPC-C benchmark, where we’re actually using serializable isolation for every single transaction run in the system, and of course we’re consistently replicating the data, so there’s a lot going on there, and we still achieve linear scale on the TPC-C benchmark.

So now, if you were facing a CEO, a CIO, a CTO, a VP of Engineering, perhaps even an architect or the lead developer, and they were listening to this conversation and said, “Look, we get it. We understand that we should be worried. But like you said, our job is not to build database technology, our job is to build applications.” You also get that there are all sorts of great things that can happen with these new technologies, with graph databases, with NoSQL. Many problems that were hard to solve become a lot easier to solve. Many applications that were hard to build become easier to build. That said, that doesn’t mean they’re safe. That doesn’t mean they’re okay. I assume you’re not arguing that one should never use anything but serializable SQL. On the other hand, how do we get our work done? How do we create a data storage strategy that allows us to take advantage of everything but also be safe? What would your recommendation be?

I wouldn’t be spending as much time talking about this correctness piece as I have if that weren’t the audience. I think it’s very important to realize that correctness is something we took very seriously first, because if you don’t solve it up front, you’re never really going to solve it properly. The ability to implement serializability at scale, with high performance and completely consistent consensus replication, is vital. These are things that NoSQL systems, for example, chose not to have. They had eventually consistent replication. They had no transactions. And now that they’re adding transactions, the transactions aren’t serializable. Those are all big question marks, and believe me, they might seem small, but when you actually get something deployed around the world they become very large very fast, so we wanted those things buttoned up. The point of Cockroach wasn’t just to build a more correct SQL database; it was to build a SQL database that was ultra-resilient and suitable for building global data architectures. That was the actual point, and that is the point. But if you’re going to build that ultra-resilient database for global data architectures, for global businesses, and you don’t solve the correctness piece, I don’t think you’re very credible in 2018. And it’s something that, as I mentioned, the big players out there shockingly have not solved correctly. The way that Oracle and even IBM mainframes have been solving this (there are some esoteric ways too, but this is by and large the common way) is to have a system that does asynchronous replication to another data center, to another mainframe, or to another Oracle instance. The problem with asynchronous replication is that when you lose the data center and fail over, you can have what’s called a non-zero recovery point objective. The recovery point objective is how much data you might have lost in a failure event. If you’re lucky, that might be some tens of milliseconds’ worth. More likely it’s some number of single-digit seconds, and if you’re very unlucky it could be minutes or hours. And all the work that happened in that interval is lost until the original site comes back. Fundamentally, what we’re doing instead is this: when a write comes in, you send that write to both of the other two replication sites, and whichever one is able to commit first returns to you, “Yes, I’m also agreeing; this is going to become committed.” However long that latency is, is how long it takes to do a commit. So you have to get a majority. If you’ve got three replication sites, you need two of them to agree. If you’ve got five replication sites, which is not an uncommon way to run, then you need three out of the five to agree. What this gives you, of course, is a zero recovery point objective. If you actually lose one of those sites, one of the other ones has the correct data; it’s guaranteed to. And your recovery time objective can also be quite fast, because instead of having an active and a secondary, or an active and a passive, you have three actives, and they’re all taking traffic. Every time one of the actives gets a request to write something, it just asks one of the other ones to concur. But all of them are doing that all of the time. So they’re all active.
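
The arithmetic behind that zero recovery point objective is just majority consensus. Here is a toy sketch of the commit rule, an illustration only, not CockroachDB’s actual Raft-based implementation:

```python
# Toy model of consensus-commit arithmetic: a write counts as committed once
# a majority of replication sites durably accept it, so losing any single
# site after commit loses no data (zero recovery point objective).

def majority(n_sites: int) -> int:
    """Smallest number of sites that constitutes a majority."""
    return n_sites // 2 + 1

def is_committed(acks: list) -> bool:
    """acks[i] is True if replication site i durably accepted the write."""
    return sum(acks) >= majority(len(acks))

print(majority(3))                         # 2: two of three must agree
print(majority(5))                         # 3: three of five must agree
print(is_committed([True, True, False]))   # True: commit is acknowledged
print(is_committed([True, False, False]))  # False: keep waiting or retry
```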

One thing you would say to this panel of people looking to upgrade a storage strategy is: when you’re choosing your SQL database, try to have the database do as much as possible to solve the modern problem of reliable, scalable databases, which includes everything you just went over. But there are a bunch of other capabilities that are now part of the modern problem, such as geo-pinning data and being able to enforce various policies underneath the database access layer, so that you don’t do it explicitly in the application layer. Give me a couple of examples of how your database goes beyond what you’ve been talking about in terms of scalability and reliability.

We call this the global business side of things. It wasn’t much of a concern ten years ago, and the databases reflected that: they were monolithic, and they lived in one location. These days markets are extremely international. In just the last ten years we’ve gone from a billion connected people to five billion, and in the ten years before that we went from 100 million to a billion. The size of the connected markets, the potential audience for companies, has grown by orders of magnitude. It was always the case that inside a database, information was accessible and tagged with what’s called the primary key. It’s how you look up a piece of information, a piece of metadata; it might be someone’s email address or customer ID. More recent database designs included a timestamp as well, because it was very clear that was becoming more and more useful as there was more data and you wanted to look at things historically. What Cockroach fundamentally does is attach a third piece of metadata to every piece of data in the database. So you’ve got the primary key, you’ve got the timestamp, and now you also have the locality. Fundamentally, data typically belongs to a company or a consumer; it’s attached to them. And that person or company exists somewhere out in the world. Where they exist is starting to become incredibly important, for two reasons. One is that it’s not just the United States anymore that’s your fundamental concern, or just Europe because you’re a European company; again, markets are more global, so people are everywhere. Latency has also become a pretty big problem. The speed of light is pretty fast, but the world is pretty big, and network latency slows things down well below the speed of light. If you’re an Australian user, traditionally you really get the short straw, because most services available through a mobile app you download from iTunes are going to have a back end sitting in San Jose or in Virginia. So the Australian user experience is a very slow one compared to a United States or a European user’s. And even a European user’s experience reaching that West Coast data center isn’t particularly good. Now there’s a third thing that has just kind of exploded onto the world in the last decade, and that’s data sovereignty: the regulations around data privacy and data localization. GDPR is the most visible of these, but it’s not by any means the most draconian. China and Russia probably have the most draconian policies, which specify that all data of residents of those countries must be domiciled only within those countries. So those are very strict regulations. Under the GDPR, for example, if you are going to store a European user’s data in the United States, you have to get their consent, but it’s still possible. And the reality is it’s not just those three examples: Canada has a law, the South American countries are moving in the direction of the GDPR, and Vietnam has a law that requires one copy of the data be stored in Vietnam. There’s no consensus on how these things should be done, and they’re popping up like mushrooms after a rainstorm.

The idea is that the database at some level has to be able to take that third piece of metadata, the geo-location, and do something intelligent with the data based on it. And that can be a property not of the application but of the database. You can say anything tagged as Russian or Chinese has to stay in a copy of the database that’s in Russia or in China?

Right. What this global story opens up is a much wider perspective. In that wider perspective you have, through the public cloud, data centers in the EU, data centers in the United States, data centers in China; let’s just use those three examples. For a Chinese user’s data, as you say, you tag that data as being in China, and you can set up a policy with Cockroach that would not geo-replicate the Chinese user’s data with one copy in China, one in the EU, and one in the United States. Instead, you would place all three copies of a Chinese user’s data in China, and only in China. That’s actually required by their laws. And it’s also what you want to do in terms of providing a great experience to a Chinese user: you want their writes to be able to get consensus among three data centers located in China. Similarly for an EU user, right? Now if you have a global service but you want to provide it for users in the EU, you’re actually going to be able to compete with a regional service, because all of their data is going to be read and written in close proximity to the users. It’s going to be within their legal jurisdiction, and you don’t have to warn them that you’re moving the data elsewhere. So this makes companies fundamentally more competitive. It’s a better UX, essentially.
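
Here is a sketch of what such a policy can look like, paraphrasing the table-partitioning and zone-configuration DDL that CockroachDB documents. The table, region names, and connection string are hypothetical, and the exact syntax should be checked against current documentation.

```python
# Sketch: pinning rows to a geography with list partitioning plus a zone
# constraint, paraphrasing CockroachDB's documented DDL. The table, region
# names, and DSN are hypothetical; check current docs for exact syntax.
import psycopg2

conn = psycopg2.connect("postgresql://root@localhost:26257/app")
conn.autocommit = True
with conn.cursor() as cur:
    # The locality column leads the primary key so rows can be partitioned on it.
    cur.execute("""
        CREATE TABLE users (
            country STRING NOT NULL,
            id      UUID   NOT NULL DEFAULT gen_random_uuid(),
            name    STRING,
            PRIMARY KEY (country, id)
        ) PARTITION BY LIST (country) (
            PARTITION china VALUES IN ('CN'),
            PARTITION eu    VALUES IN ('DE', 'FR', 'NL'),
            PARTITION rest  VALUES IN (DEFAULT)
        )""")
    # Keep every replica of the Chinese partition on nodes located in China.
    cur.execute("""
        ALTER PARTITION china OF TABLE users
        CONFIGURE ZONE USING constraints = '[+region=china]'""")
```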

You’ve now argued that using Cockroach brings a variety of advantages: a high-quality database with new capabilities that handle the modern business world. How would you go about explaining how to create a data storage strategy that lets companies stay out of trouble but also take advantage of these other capabilities?

That’s actually a really good point to bring up, because, as is inevitable when you’re trying to do a podcast in limited time, there are always caveats that can be mentioned. One of them is that it wasn’t just the complexity of implementing the guarantees of SQL that made NoSQL eschew them; there is also a flavor of NoSQL that firmly embraces the elimination of SQL itself, mostly because SQL can be a pain in the neck. And the example you’re giving, where customers don’t really have a well-defined schema and there’s no way really to pin one down because it happens on the fly, is a really good example of where SQL breaks down. However, there are numerous ways to bridge that gap. In the case of Cockroach, we actually provide the capability of creating documents inside SQL tables. So in effect, you can have embedded documents stored inside your SQL table, inside your SQL database. In this particular case, you might have a bunch of metadata that you do know about for this insurance company’s customers: the name of the company, the contact information, the location, all things you know are going to be true across all your customers. But then they have their own information. That’s usually something like a JSON object, a potentially big but very open-ended document format, which includes everything they might want to capture about their company. And that would also just be a column in that table, about that customer. The nice thing about doing it this way is that all of the changes to that document are still protected by transactions, because it’s just another column in the table. You can do queries using SQL, which is a very elegant query language that allows a lot of sophisticated declarative usage, so you don’t have to be a programmer to ask the database questions about what it contains. And using that SQL, you can do queries that dig down into the document in arbitrary ways. Essentially, what I’m talking about is having your cake and eating it too. The SQL systems, and Cockroach isn’t the only one, are recognizing the value of some of that innovation and assimilating it into their models. So that’s a good example of where a SQL database with additional capabilities lets you do everything you’d do with a document store, while you still have all the goodness of SQL and its transactional isolation guarantees.
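
As a hedged sketch of that pattern: an open-ended JSON document lives alongside ordinary typed columns in one table, so document updates stay transactional. Table and column names here are invented; the JSONB type and the ->> operator behave the same way on PostgreSQL and CockroachDB.

```python
# Sketch: an open-ended JSON document alongside typed columns in one SQL
# table, so document updates stay transactional. Names are hypothetical.
import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=crm")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS customers (
            id      SERIAL PRIMARY KEY,
            name    TEXT NOT NULL,   -- schema you know up front...
            contact TEXT,            -- ...for every customer
            extra   JSONB            -- open-ended, per-customer document
        )""")
    cur.execute(
        "INSERT INTO customers (name, contact, extra) VALUES (%s, %s, %s)",
        ("Acme Mutual", "ops@acme.example",
         Json({"underwriters": 12, "region": "midwest", "lines": ["auto"]})),
    )
    # SQL can dig into the document: ->> extracts a JSON field as text.
    cur.execute("SELECT name FROM customers WHERE extra->>'region' = %s",
                ("midwest",))
    print(cur.fetchall())
```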

I would assume there are probably cases where a NoSQL database or a graph database is a fine fit for the purpose, for example to do certain types of queries, or to represent certain types of information, that are much more difficult otherwise. So the idea is that companies should define their tolerance for danger and risk, and then allocate the data to whatever store provides the best fit.

Yes. Fundamentally, there’s never going to be one database to rule them all, and that is certainly not our goal. If we tried, I think we’d end up not really providing a solution for anyone.