Do You Have a 10^9 Data Problem?

What is a “10^9 data problem”? That’s a simple question with a complicated answer. Let’s dive into what it means to have a 10^9 problem.

One billion (10^9) data points. That’s the scale at which almost all modern analytics and visualization technologies on the market today fall over. And by fall over, I mean they either time out, crash, or become so slow that they are practically unusable in any critical, real-time decision support process.

Now, it’s important that I’m specific about what I’m talking about. There are myriad technologies available to you today to store, query, and process data volumes at or above the scale of one billion rows (ignoring width for a moment). You only have a problem if you can’t access that data in a way that lets you make effective, data-driven decisions about how to run your business and meet your strategic goals.
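
To put the “ignoring width” caveat in perspective, here is a rough back-of-the-envelope sketch of how the raw size of a billion rows grows with row width. The row widths below are assumptions I picked purely for illustration, not measurements from any real system, and the estimate ignores compression, indexes, and storage overhead.

```python
# Back-of-the-envelope estimate of raw (uncompressed) table size.
ROWS = 10**9  # one billion rows, the scale discussed in the text


def raw_size_gb(rows: int, bytes_per_row: int) -> float:
    """Return the uncompressed size in gigabytes (1 GB = 10**9 bytes)."""
    return rows * bytes_per_row / 10**9


# Hypothetical row widths in bytes, chosen only for illustration.
for width in (8, 100, 1000):
    print(f"{width:>5} bytes/row -> {raw_size_gb(ROWS, width):,.0f} GB")
```

Under those assumed widths, a billion narrow rows stay in the single-digit gigabytes, while wide rows reach roughly a terabyte. Storing that much is rarely the hard part; getting answers out of it quickly is.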

It might help if I start by giving you examples of what it looks like not to have a 10^9 problem:

  1. You have a massive heap of historical data for which time to insight isn’t critical
  2. Whether you run a 12-hour query or retrieve an answer in milliseconds won’t affect the way you make decisions
  3. You look at the data infrequently
  4. You struggle to define why you are storing (and paying for) the data in the first place (you probably need some outside help if this is the case)
  5. Details aren’t that important, meaning you can aggregate data into KPIs or summary metrics that describe historical performance and report on them every month or so
  6. You’re in a very slow-moving business or one with very little competitive pressure, such that time (i.e., speed) doesn’t help you drive a competitive advantage
  7. Your business or your function isn’t mission-critical (i.e., you can walk away for a few weeks and nothing will really happen)

If any of these points apply to you, you likely don’t have a 10^9 problem. If not, read on.

10^9 problems are the ones that prevent you from using the data assets you have to your advantage. Either you can’t process the data fast enough to make decisions, you can’t create outputs from it that a human can read and interpret, or you can’t iterate fast enough to fail fast, fail often, and figure out what works (this applies to many data science workflows).

Solving your 10^9 problems can have a massive impact.

  • If you’re a telecommunications carrier, you can quickly spot network anomalies and resolve them, improving customer satisfaction and reducing churn
  • If you’re a retailer, wholesaler, or involved in the distribution of tangible goods in any way, you can immediately know who is driving demand, where, and when, across your entire operation at once
  • If you’re in logistics, you can track the movement of goods and their condition across your supply chain (visually) to optimize inventory levels, spot problems, and reduce your carrying costs
  • If you’re in consumer marketing, you can see who passes by your physical storefront, where they came from, where they went, and whether or not they chose a competitor over you to inform marketing and advertising investment decisions (or store placement)
  • If you’re in eCommerce, you can materialize massive amounts of transaction history to tailor online buyer journeys for specific segments of your population (personas)
  • If you’re in merchandising, you can perform fast A/B testing across all of your markets to optimize placement and promotion according to any number of demographic dimensions
  • If you’re in oil and gas, you can monitor asset performance across your entire portfolio at once, rather than segmenting by region, basin, or lease, transferring learning from one region to another to drive down the cost of future exploration

This list could go on forever. I hope I’ve made my point.

For any of the applications above, speed to insight is critical to inform decision making. Likewise, any of these decision processes can be accelerated or completely automated with the use of smart algorithms that generate recommendations that you can trust and (now) verify.

This is the road to digital transformation.

Ask yourself: do you have a 10^9 data problem?

Photo by Joshua Sortino on Unsplash