Research Note: Actian’s Strategy for Distributed Big Data

Actian CEO Steve Shine drives company growth to meet big data challenges.

Under the leadership of CEO Steve Shine, Actian has become an aggressive acquirer of technology for data management, processing, analysis, and application development.

Actian Acquires Versant, Pervasive Software, and ParAccel

In 2008 Actian bought the core code that became the Vectorwise high-performance database. In the past five months, Actian has purchased Versant, Pervasive Software, and ParAccel. While it is not unusual for a company to acquire others, in Actian’s case it is interesting to examine the logic of the deals.

Actian is betting that a more heterogeneous and distributed supply chain paradigm will replace the current infrastructure for business intelligence and advanced analytics (including big data), which is based on a centralized data warehouse model. In addition, Actian is betting that more and more applications will need highly scalable databases, with scalable pipelines to connect them, optimized for specific analytics and application development.

Actian Focuses on Distributed Architecture

Unlike most companies in this space, Actian is focusing on a long-term problem that will still be with us when the term big data has gone out of use and Hadoop has become an embedded system that is rarely mentioned. That problem is the challenge of creating a radically distributed architecture for data processing and analysis.

Urgent Problem: Data Sources Grow and Grow

This problem is going to become urgent as the number and size of data sources grow, as I pointed out in my columns on Forbes.com (How External Data Opens Up a New Frontier for Business Intelligence) and in the series on the “Data Not Invented Here” syndrome (Do You Suffer From the Data Not Invented Here Syndrome?, Creating a Vascular System for External Big Data, How to Find and Use External Data).

Actian’s Roots

The roots of Actian go back to Ingres Corp., which was founded in 1980 and created the Ingres relational database. Ingres was acquired by CA Technologies, which continued to invest in product development to support its many other applications but chose not to market the Ingres database as a standalone product. In late 2005 the Ingres business was spun out of CA to Garnett & Helfrich (Terry Garnett being the ex-CMO of Oracle). The company changed its name from Ingres to Actian in 2011 as it set out on a path that would expand beyond the very strong Ingres product identity (the Ingres RDBMS still retains the Ingres name).

Steve Shine Grows Company

Shine arrived in 2007 and realized that to grow, the company would have to add new products to its core offering of the Ingres SQL database. The database business, Shine said, is intrinsically embedded within a customer’s business operations, and obtaining new customers is hard. But once you win them over, they will stay with you for a long time, provided that you deliver first-class support and service along with ongoing innovation. Ingres has a strong business based on replacing more expensive databases from Oracle and other vendors. Shine saw that Ingres would be a solid business, but one positioned in a market of modest single-digit growth. To grow faster, he started looking to expand Actian’s product portfolio through a mix of development and acquisitions.

Forces Reshaping the Market for Business Intelligence, Advanced Analytics, and Application Development

To provide context for what’s been happening at Actian, here are some of the key forces that are pushing more and more companies toward creating a supply chain for data to enhance their current data warehouse implementations.

Big Data Doesn’t Like To Move

The innovation of Hadoop clusters, and of most other ways of processing big data such as Splunk, is to bring the algorithm to the data rather than move the data to the algorithm. In the world that is forming now, inside companies and out, it won’t make sense to keep many copies of the same big data set. Workloads will therefore come to the data rather than the data being copied in full, which makes the ability to package up a workload, send it to the data, and receive the results back increasingly important.
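
To make the idea concrete, here is a minimal sketch of the pattern in plain Python, with the “data node” simulated in-process; the names are illustrative and none of this is Actian code. The packaged workload and its result are small, so only they cross the network, while the large data set stays where it is.

    import statistics

    # Stand-in for a large partition that lives on a remote data node.
    REMOTE_PARTITION = [5.2, 7.9, 3.1, 8.8, 6.4]

    def run_on_data_node(workload):
        """Execute a packaged workload next to the data; ship back only the result."""
        return workload(REMOTE_PARTITION)

    # Package a small piece of work, send it to the data, get a small answer back.
    average = run_on_data_node(lambda rows: statistics.mean(rows))
    print(average)  # 6.28; only this value crosses the network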

Number of Data Sources Available Both Internally and Externally Will Skyrocket

The instrumentation of devices and servers, combined with the vast increase in the number of applications we all use, means that we will have exponentially more data sources at our disposal. Remember, data is an asset, but also a liability that must be maintained. The most valuable data sets will become products in their own right, like the social data streams sold by DataSift and Gnip. Some of this data will be copied, but much of it will stay in one place and be accessed via remote queries or requests for analytics to be performed locally. This, in turn, means that the…

Era of the Single Data Warehouse Has Come To an End

Business intelligence is no longer just about one thing. The historical center of the action, cleaning and consolidating information from enterprise applications and then presenting that data for analysis using OLAP, has given way to dozens of new ways of using information. In addition, data that is distilled and analyzed is being used in applications, not just for analysis. Gartner has it right when it talks about the concept of the Logical Data Warehouse, but even that concept still shows a longing for a single entity.

It is time to give up on that dream and instead use the metaphor of a data supply chain supporting multiple sources and multiple destinations with data being processed continuously across locations.

Applications and Analytics Need Scalable Databases and Data Movement

As more and more companies start interacting directly with large numbers of customers via websites and mobile applications, the kinds of scalability problems that only the top players have seen will become commonplace. In addition, more scalable databases and data pipelines will be required as data volumes grow and the need to analyze that data and operationalize the insights in real time increases. Finally, the ability to supercharge existing enterprise applications and analytics has dramatic potential to accelerate business processes and improve results.

The implications of these forces are going to lead to an architecture for managing, cleaning, and analyzing data that will look much more like a complex supply chain than a hub and spoke system for data.

The Actian Context

Now let’s look at Actian and see how its recent purchases fit with the idea of a radically distributed data supply chain.

Ingres, Actian’s Founding Technology: An Enterprise Class Database

Actian’s foundational technology is the Ingres database, which was originally developed at UC Berkeley and is in use by more than 13,000 customers. For companies tired of paying high prices for Oracle, Sybase, or other commercial databases, Ingres offers a fully functional and professionally supported alternative.

Vectorwise, Actian’s First Acquisition: A High-Performance In-Memory Database

Actian purchased Vectorwise, an in-memory database, in 2010 and launched it the same year. Vectorwise is a SQL database that delivers extremely high performance for business tools (such as BI suites and custom reporting applications) that benefit from a simpler (read: easier to manage) SMP scale-up architecture.

Vectorwise holds records in several leading benchmarks. It is based on database research whose goal was to rethink how to use the full power of modern x86 chip architectures to overcome the main bottleneck of the modern computing environment: memory access. Previous generations of database technology were designed to overcome bottlenecks that are no longer relevant. Vectorwise supports the distributed supply chain paradigm for data processing by providing extremely fast processing for analytics and applications.
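
As a rough illustration of the underlying idea (a sketch in Python with NumPy, not Vectorwise internals, which are implemented natively for the CPU), vectorized execution processes whole column batches in one pass instead of touching one row at a time, so the processor spends its time computing rather than waiting on memory:

    import numpy as np

    # Two columns of a one-million-row table.
    prices = np.random.rand(1_000_000)
    quantities = np.random.randint(1, 10, size=1_000_000).astype(float)

    def revenue_row_at_a_time():
        """Tuple-at-a-time style: one iteration, and poor cache behavior, per row."""
        total = 0.0
        for p, q in zip(prices, quantities):
            total += p * q
        return total

    def revenue_vectorized():
        """Column-at-a-time style: one tight, cache-friendly pass over both columns."""
        return float(np.dot(prices, quantities))

    # Both functions return the same total; the vectorized one is dramatically faster.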

Versant, Actian’s Second Acquisition: NoSQL, Object-Oriented Database

Actian acquired Versant, one of the leading object-oriented database companies, in December 2012. Versant is a NoSQL database that can support very complex data models and the applications built on them. Representing data as objects streamlines data retrieval and application development: schema changes are easier, and complicated, sparse data structures can be represented naturally.

Versant supports the distributed supply chain paradigm by providing massive scalability of extremely complex data structures for applications that may also operationalize big data or incorporate the results of analytics.
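
To illustrate why an object model helps, here is a minimal sketch using generic Python classes; the model is hypothetical and this is not Versant’s API. A nested, recursive structure like this would span several joined tables in a relational design but maps directly onto a single object graph.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Sensor:
        serial: str
        readings: List[float] = field(default_factory=list)

    @dataclass
    class Component:
        name: str
        sensors: List[Sensor] = field(default_factory=list)
        subcomponents: List["Component"] = field(default_factory=list)  # recursive

    @dataclass
    class Aircraft:
        tail_number: str
        root_component: Optional[Component] = None

    # The whole graph is stored and retrieved as one unit; adding an optional
    # field is a code change rather than a table migration.
    engine = Component("engine", sensors=[Sensor("S-100", [812.0, 815.5])])
    plane = Aircraft("N12345", root_component=Component("airframe", subcomponents=[engine]))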

Pervasive Software, Actian’s Third Acquisition: Integration/ETL, DataCloud, and Big Data and Analytics Processing Power

Pervasive adds integration/ETL and big data processing muscle to Actian’s portfolio. Pervasive’s data integration capabilities, now called Actian Data Integrator, connect to almost every widely deployed system that stores data. This capability can be used to move and transform data in transit within data centers, in the cloud, or both. The Actian DataCloud is a secure on-demand services platform on which users can build data and application integration, data quality, and analytics services.

Pervasive also has a highly parallel system for processing and analyzing data, now called Actian RushAnalytics, based on the DataRush highly parallel execution engine. RushAnalytics allows complicated transformations and analysis to be expressed using an easy point-and-click visual modeling interface.

These transformations and analytics can then be deployed on massively parallel computers, both SMP and MPP. DataRush also has native integration with Hadoop, which allows the data transformations and analytics in a Hadoop job to be expressed and executed with the high-performance DataRush engine instead of MapReduce.
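
The dataflow idea can be sketched in plain Python, with the standard multiprocessing module standing in for a dataflow engine; this is illustrative and is not the DataRush API. The transformation is expressed once, and the engine fans it out across all available cores.

    from multiprocessing import Pool

    def transform(record):
        """A per-record transformation, e.g. parse a log line and score it."""
        return len(record.split())

    if __name__ == "__main__":
        records = ["alpha beta", "gamma delta epsilon", "zeta"] * 100_000
        with Pool() as pool:                      # one worker per CPU core
            scores = pool.map(transform, records, chunksize=10_000)
        print(sum(scores))                        # 600000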

Another part of the acquisition was Pervasive’s embedded PSQL database, which, like Ingres, supports traditional operational business workloads for thousands of companies worldwide.

ParAccel, Actian’s Fourth Acquisition: Enterprise-class MPP Database

ParAccel, Actian’s latest acquisition, is the MPP SQL database that is the foundation of Amazon’s Redshift product. ParAccel gives Actian a super-scalable SQL database that can scale out across massive grids of commodity computers. MPP SQL technology is often the next step when a data warehouse grows too large for a traditional SQL database.

ParAccel lets Actian offer a solution for customers who are committed to SQL but need greater scalability. This capability enhances Actian’s ability to support big data on-premises or at a distributed location.
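
A toy sketch of the MPP principle, in plain Python rather than anything ParAccel actually does internally: each node aggregates the slice of the table it owns, and a leader merges the small partial results, so raw rows never have to move.

    from collections import Counter

    # The pattern behind a query such as: SELECT region, SUM(amount) ... GROUP BY region
    node_partitions = [
        [("east", 10), ("west", 5)],    # rows stored on node 1
        [("east", 7), ("south", 3)],    # rows stored on node 2
        [("west", 2), ("south", 8)],    # rows stored on node 3
    ]

    def local_aggregate(rows):
        """Runs on each worker node over its own slice of the table."""
        totals = Counter()
        for region, amount in rows:
            totals[region] += amount
        return totals

    # Leader node: merge the small partial results; the raw rows stay on the workers.
    merged = sum((local_aggregate(p) for p in node_partitions), Counter())
    print(dict(merged))  # {'east': 17, 'west': 7, 'south': 11}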

Recommendations

Steve Shine seems to be assembling all the components needed to support a distributed data fabric that can support the coming world of radically distributed and constantly flowing data. The only thing missing is a solution for creating front-end apps and doing simple data analysis.

Integrate with Analysis Technology

Spending time making sure that the company has excellent integrations with analytical technologies such as Alteryx, QlikView, Tableau, Tibco Spotfire, and similar tools would meet most of that need. Actian is focusing on its strengths in delivering the data management and analytics that support these BI/visualization partners. Actian has taken some steps in this space with its Action Apps, but these are more about server-side processing to create triggers and recognize events than about creating user-facing apps.

CIOs and CTOs who are innovators, early adopters, or part of the early or late majority can use any of Actian’s products to solve specific problems. Right now, however, it is CIOs and CTOs in the early adopter category who could assemble a distributed data framework.

But for Actian’s portfolio to be more than the sum of its parts, all of the components listed will have to be combined into platforms or applications to solve specific problems.

Here are two suggestions:

  • Data monetization platform: Create a platform that will assemble data from all over an organization using Actian Data Integrator, refine it with pipelines, then allow customers to define their own analysis programs using the power of SQL on top of Actian ParAccel or Actian RushAnalytics.
  • Data aggregation platform: Create a platform that allows a company to assemble data from a wide range of partners using Actian Data Integrator and further process and enhance that data with Actian RushAnalytics. Such a platform would solve the Data Not Invented Here problem and provide access to data that none of your competitors are using.

The more productized these platforms are, the better. They would likely need to support hybrid deployment, with some components in the cloud and others on-premises.

Of course, if these platforms succeed, they can easily be verticalized or turned into services.

If Actian moves in this direction, it can leap ahead of many of the other competitors that have a component mindset and instead start assembling the next generation of high-order data analysis products.