How to Find and Use External Data

The first draft of a program for finding and making use of external data to improve business results.

In the Forbes article “Do You Suffer From the ‘Data Not Invented Here’ Syndrome”, Dan Woods proposed that the data created and maintained outside of one’s company is becoming more important than the data that can be acquired from internal sources.

The purpose of this article is to explore how CIOs and CTOs can find valuable sources for data outside the four walls of their enterprises and how to put that data to good use.


In previous articles, such as “Creating a Vascular System for External Big Data” and “External Data Opens a Disruptive Frontier,” we’ve made the argument that the biggest benefits will accrue when you can determine who has the data you need and how you can partner with them to get access to that data. In this article, we’ll look more closely at approaches to finding and using external data.

One central hypothesis of our work so far is that acquiring access to external data will be a vital business development activity in the very near future. A few reasonable first steps toward the optimal use of external data include:

  • Understand what would be valuable to know. In our Data Not Invented Here article, we suggested that companies play “The Question Game.” This exercise arms them with an understanding of what signals would be valuable by seeing what questions need to be answered (and how fast).
  • Seek signals from easy-to-access external datasets. There are lots of data purveyors like Infochimps and Factual who can provide access to new types of datasets. While these companies know what the data can provide in a general sense, they don’t know what signals it can provide that will be meaningful for your business. Start experimenting.
  • Hunt down external data. Once you begin to understand how to create signals out of noisy, dirty, incomplete, external data, you will then know where nuggets of valuable information are found in other companies. The biggest victories will come when you figure out who has the data you need and partner with them to get proprietary access to it.
  • Offer up your own data. A greater understanding of how to gain value from external data will lead to a recognition of the value of the data you possess. Don’t be shy. Create an API and offer it up in exchange for money or some other sort of fair trade.

Some basic questions you will need to ask include:

  • How do you get access to the data?
  • How do you determine whether it is useful to you?
  • What data is worth paying for and what is easily obtained for free?
  • What are the advantages and disadvantages of using the same vendor for data access and analysis?
  • How can opening up APIs for bidirectional data exchange benefit your company?
  • What kinds of cleaning or optimizing will need to be performed so that the data will be usable?
  • What kinds of organizational changes might have to take place in a company’s analytical unit to make the most of external big data?
  • What kinds of changes might have to happen to established business intelligence and analytical tools?

Initial Thoughts

Let’s begin by taking a look at three basic buckets of external data available to us now, as illustrated by AgilOne’s founder and CEO Omer Artun in the article “How Machine Learning Can Restore Customer Intimacy.”

  • Second-party data is owned by the enterprise but managed by a second party, such as a cloud provider or an email system. Second-party data can be analyzed through platforms such as AgilOne’s marketing optimization system.
  • Third-party data consists of packaged data designed for corporate consumption, such as Experian and Dun & Bradstreet credit reports, Acxiom Data demographic information, and the like. It could also include information such as FedEx shipping manifests. Many social media platforms, while forbidden from exposing personally identifiable information to marketers, do make packages of summary data available that can highlight important trends. According to Booz & Co., this may be the largest area of opportunity for enterprises to gather consumer insights. Optimal analytical tools for this set include Alteryx, QlikView, and Tableau. Some tools, such as Factual, also bundle and package this data for their customers’ consumption, as well as performing analyses.
  • Public domain data is publicly available information that is relevant to the business context, such as real estate records and census data. For example, revenue might be increasing 3% in a given geography, but if the geography’s GDP is increasing 10%, that’s an indication that the company’s performance is lagging behind its potential. Key analytical and source platforms for public domain data include Factual, Infochimps, and Socrata.
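
The revenue-versus-GDP comparison in the last bullet can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function names and the tolerance parameter are hypothetical, and the only figures used are the 3% and 10% from the example above.

```python
def growth_gap(company_growth_pct: float, benchmark_growth_pct: float) -> float:
    """Return company growth minus benchmark growth, in percentage points.

    A negative result means the company is growing more slowly than the
    external benchmark (for example, regional GDP).
    """
    return company_growth_pct - benchmark_growth_pct


def is_lagging(company_growth_pct: float,
               benchmark_growth_pct: float,
               tolerance_pts: float = 0.0) -> bool:
    """True when the company trails the benchmark by more than the tolerance.

    tolerance_pts is an assumed knob for ignoring small gaps.
    """
    return growth_gap(company_growth_pct, benchmark_growth_pct) < -tolerance_pts


# Figures from the example: revenue up 3%, regional GDP up 10%.
gap = growth_gap(3.0, 10.0)
print(f"Gap versus benchmark: {gap:+.1f} percentage points")
print("Lagging:", is_lagging(3.0, 10.0))
```

The internal number alone looks healthy; only the public benchmark reveals the shortfall of seven percentage points.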

Changes to the Acquisition Process

Based on our initial analysis, we can assert that some critical changes will happen to the way enterprises now acquire and analyze data.

“Cloud-sourcing” will become a watchword. Enterprises will continue to make further use of the cloud as a path for obtaining data from multiple, external sources. This is especially true as social media become an increasingly rich source of information about often-elusive factors, such as consumer sentiment, which may manifest itself in many ways that transactional data won’t communicate. The term also reflects the fact that many data services are delivered as software-as-a-service (SaaS) apps from the cloud, releasing enterprises from the need to move the data.

Acquisition of data in general and gaining exclusive access to it will become a form of business development to gain competitive advantage. The concept is deceptively simple: the more access you have to data about your prospective customers, and the strategies of your competitors, the better your competitive advantage. Add exclusive data access to the mix, and you’ve really got something. Some industries, such as financial services, have always relied on information asymmetry to conduct arbitrage trades to their advantage. In the big data era, exclusive external data access will become a strong weapon in the arsenal of all kinds of businesses.

New methods for finding and accessing external data will have to be developed. Data marketplaces are a start. Factual’s data platform solves several aspects of the problem by focusing on aggregating many datasets and making them useful.

In many cases, external data will not be moved but will be accessed remotely through APIs. Because of the massive amounts of data that are becoming available, it will be impossible to move all relevant data to repositories inside the four walls of a business. Instead, companies will need to create new types of analytical and transport systems that harvest data and insights wherever they reside. To access this data, many companies will find that the best way to obtain it is to trade with others. They will publish their own data using their APIs and acquire data from other companies. To do this, there will have to be some kind of mutual agreement on value, and it may not involve money changing hands. One added benefit would come from understanding how others use the data you provide.
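
One way to picture remote access without moving the data is an interface through which the provider answers questions and only the answer crosses the boundary. The sketch below is an assumption-laden illustration: RemoteDataset, InMemoryProvider, and segment_share are hypothetical names, the rows are made-up sample values, and the in-memory class merely stands in for a partner API that would normally be called over the network.

```python
from typing import Protocol


class RemoteDataset(Protocol):
    """Hypothetical interface: the provider answers queries, it does not ship rows."""

    def count_where(self, field: str, value: str) -> int: ...


class InMemoryProvider:
    """Stand-in for a partner's API; the rows never leave this object."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self._rows = rows

    def count_where(self, field: str, value: str) -> int:
        # Only this single integer is returned to the caller.
        return sum(1 for row in self._rows if row.get(field) == value)


def segment_share(dataset: RemoteDataset, field: str, value: str, total: int) -> float:
    """Fraction of the provider's records matching a value, from a summary count."""
    if total <= 0:
        raise ValueError("total must be positive")
    return dataset.count_where(field, value) / total


# Made-up sample rows held on the provider's side.
provider = InMemoryProvider([
    {"segment": "retail"},
    {"segment": "retail"},
    {"segment": "enterprise"},
    {"segment": "retail"},
])
share = segment_share(provider, "segment", "enterprise", total=4)
print(f"Enterprise share: {share:.0%}")
```

The same interface could also be what a company publishes over its own data when it trades access with a partner.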

APIs will need to be supplemented by intelligent event recognizers so that important signals can be gleaned from remote repositories. It’s quite possible that the best information about how you can optimize your services can be gleaned from how your customers use your API or complementary or competing services. To do this, you’ll need technology that can recognize “strong” (local and confirmed) and “weak” (indirect) signals alike, and intelligently correlate them.
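
A minimal sketch of what correlating strong and weak signals could look like follows. It assumes a simple additive scoring scheme, which is only one of many possible approaches; the Signal class, the weights, the threshold, and the sample signals are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    topic: str    # what the signal is about
    source: str   # where it was observed
    strong: bool  # True = local and confirmed, False = indirect


# Assumed weights: one strong signal counts as much as two weak ones.
STRONG_WEIGHT = 1.0
WEAK_WEIGHT = 0.5


def correlate(signals: list[Signal], threshold: float = 1.0) -> dict[str, float]:
    """Sum the evidence per topic and keep topics that reach the threshold."""
    scores: dict[str, float] = defaultdict(float)
    for signal in signals:
        scores[signal.topic] += STRONG_WEIGHT if signal.strong else WEAK_WEIGHT
    return {topic: score for topic, score in scores.items() if score >= threshold}


# Made-up sample signals.
signals = [
    Signal("api-errors", "own API logs", strong=True),
    Signal("pricing-complaints", "social media summary", strong=False),
    Signal("pricing-complaints", "partner API usage", strong=False),
    Signal("shipping-delays", "social media summary", strong=False),
]
flagged = correlate(signals)
print(flagged)
```

Here a single confirmed signal is enough to flag a topic, while an indirect signal is flagged only when a second, independent source points the same way.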

Changes to the Analytical Process

Here are the types of changes to the analytical process that will need to be made.

Enterprise data warehouses and analytical structures were built to analyze structured data sourced from within or under the full control of the enterprise. These technologies will need to change in order to accommodate unstructured data from multiple external providers.

Data will be made available to more users in a more intuitive fashion. It’s pointless to collect enormous volumes of external data unless it can be cross-referenced with private data and made available to the largest possible group of users. No one wants a new data-science bottleneck to replace the old business-intelligence bottlenecks. Many of the vendors mentioned in this mission, including QlikView and Tableau, consider such access to be part of their core value propositions.
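
Cross-referencing external data with private data is, at its simplest, a join on a shared key. The sketch below assumes internal sales and external population figures keyed by region; the function name and every number in it are invented for illustration only.

```python
def cross_reference(internal: dict[str, float],
                    external: dict[str, int]) -> tuple[dict[str, float], list[str]]:
    """Join internal sales with external population by region.

    Returns sales per capita for regions present in both datasets, plus the
    internal regions that have no usable external match.
    """
    per_capita: dict[str, float] = {}
    unmatched: list[str] = []
    for region, sales in internal.items():
        population = external.get(region)
        if population:  # skips missing regions and zero populations
            per_capita[region] = sales / population
        else:
            unmatched.append(region)
    return per_capita, unmatched


# Invented sample values: private sales figures and public population counts.
internal_sales = {"north": 120_000.0, "south": 90_000.0, "west": 45_000.0}
external_population = {"north": 60_000, "south": 30_000, "east": 80_000}

per_capita, unmatched = cross_reference(internal_sales, external_population)
print(per_capita)
print("No external match for:", unmatched)
```

Reporting the unmatched regions matters in practice, since external datasets rarely line up cleanly with internal keys.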

The Mission

The object of this mission is to identify as many value-creation scenarios as possible through obtaining and exploring external data. Then we want to examine potential challenges that might arise and organizational changes that may need to happen in order to truly capitalize on that value.

Let us know if you have suggestions or thoughts that will help us complete this mission.