You hear the term “big data” an awful lot these days, yet it is difficult to understand what the term means. I am going to devote this post to explaining who the players in the big data ecosystem are and how they participate (what they contribute and what they get), with the objective of filling in some of the puzzle pieces and making big data a little more understandable.
I will preface outlining the players by noting that I am defining roles here and that it is possible for individuals or organizations to take on more than one of these roles. That said, my experience indicates that when individuals or organizations believe that they are qualified (training and experience) to take on more than one role, they are mistaken.
There are 7 players in the big data ecosystem:
- Producers of Data—producers of data are generally people who collect or generate data as a normal part of their business. Nearly all the organizations that I have worked with throughout my career have either collected or generated (potentially valuable) data as a routine part of their business. Two have been notable in their decision to use the data. In both cases the organizations collected large quantities of information in the normal course of doing business, and in both cases they decided to use that information to:
  - Improve their ability to directly support their customers. One of my clients collected telemetry from its product (as it ran in the normal course of business) and used it to predict the need for maintenance and to identify failures before they affected the operation of the product; and
  - Produce new information-driven products targeted at both their current customers and entirely new customers. One of the organizations I worked with planned (I left the company before this was implemented) to use data collected in the normal course of business to allow its customers to benchmark their performance against their peers.
Producers of data should expect to be able (whether they use the data themselves or make it available to others) to extract rents which reflect the value that the data brings to the table.
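The peer benchmarking idea mentioned above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the function name `percentile_rank`, the uptime metric and all of the figures are made up for the example and do not come from either of the organizations described.

```python
from bisect import bisect_left


def percentile_rank(peer_values, own_value):
    """Return the percentage of peers whose value is strictly below own_value."""
    if not peer_values:
        raise ValueError("need at least one peer value")
    ordered = sorted(peer_values)
    # bisect_left counts how many sorted peer values are smaller than own_value.
    below = bisect_left(ordered, own_value)
    return 100.0 * below / len(ordered)


# Hypothetical, made-up figures: monthly uptime (%) reported by eight peers.
peer_uptime = [91.0, 93.5, 94.0, 96.2, 97.8, 98.1, 98.9, 99.4]
customer_uptime = 97.0

rank = percentile_rank(peer_uptime, customer_uptime)
print(f"This customer outperforms {rank:.0f}% of its peers")
```

A real benchmarking product would also need to anonymize the peer data and decide which peers are actually comparable, which is where the domain experts described below come in.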
- Domain Experts—these are people with expertise about the data that is being produced. Another term might be industry experts, but sometimes the subject data is so focused and specialized that “industry” is much too broad a term to describe the coverage of their expertise. These are the folks who understand the data being collected well enough (in the context of the industry or of particular, narrowly focused use cases) to propose valuable uses for the data, whether the value goes to its owners or to other customers who might benefit from its predictive power.
Domain experts are generally people on the business side, rather than the technology side, of the operations that generate the data. They can make themselves more valuable to the company by applying the expertise they have developed as business people to profitable uses of the data, both in contexts directly related to how the data is generated and in other, orthogonally related uses that are sometimes even more valuable.
- Consumers of Data—these are the organizations that have both an understanding of the data (domain experts) and the tools and techniques required to analyze the data and draw conclusions from it. It has been my (very frustrating) experience that this group of players does not always have a full enough understanding of the tools and techniques that are currently available to fully exploit the data at hand. In the situations that have made the biggest impression on me, these folks were using the tools and techniques that they learned in graduate school many years ago and had not been exposed to newer approaches for dealing with large data sets.
A good example is folks who use statistical techniques but are not familiar with the techniques (or the associated tools) that provide powerful pattern-matching functionality. Recognizing patterns provides a different view into data sets, along with the ability to gain additional insights and extract even more value.
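To illustrate the gap between summary statistics and pattern-oriented analysis, here is a minimal sketch. It uses lag-1 autocorrelation purely as a simple stand-in for a pattern-finding technique; the post does not name a specific method, and the data and the helper name `lag1_autocorrelation` are made up for the example.

```python
from statistics import mean, pstdev


def lag1_autocorrelation(series):
    """Correlation between each value and the value that follows it."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den


# Made-up data: the same eight readings arranged in two different orders.
trending = [1, 2, 3, 4, 5, 6, 7, 8]
alternating = [1, 8, 2, 7, 3, 6, 4, 5]

for name, series in (("trending", trending), ("alternating", alternating)):
    # Mean and standard deviation ignore ordering; autocorrelation does not.
    print(name, mean(series), round(pstdev(series), 3),
          round(lag1_autocorrelation(series), 3))
```

Both series have the same mean and standard deviation, so those two statistics alone cannot tell them apart. The autocorrelation is positive for the steadily rising series and negative for the alternating one, which is the kind of additional view into the data that the paragraph above describes.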
- Consumers of Analytics—these are the folks who buy packaged analytics in the form of reports or online repositories that present bundles of off-the-shelf digests of information that have been prepared for a (in relative terms) mass audience. Many government agencies package and deliver analysis based on the data they collect. Economic data provides the prototypical example of this type of analysis and it can be valuable if consumed intelligently and in a timely manner.
- Technologists—these are the folks who pull together the underlying technology (servers, databases, tools, etc.) necessary to collect, analyze, bundle and present the results of analysis against big data repositories. They provide and maintain the basic infrastructure and platform necessary to do the analysis. Their job has become easier as cloud-based platforms have become available, but their services (to ensure performance, reliability and security) are still very much necessary, especially if data is going to become a business.
- Data Scientists—these are the folks with the expertise regarding the tools and techniques that are required to do analytics. In many cases this class of resources gets underutilized (data consumers want to do their own thing) or stretched beyond their expertise when they are asked to play the role of both data scientist and domain expert.
Data scientists are consultants to the generators and consumers of data. They provide an understanding of the tools and techniques that are available and the expertise necessary to fully exploit the data.
- Tool Vendors—these are the organizations that produce tools to collect, organize, store and analyze data. These folks bring the power of automation to the craft and science of data analytics. They sell the “machines” (both hardware and software) that do the heavy lifting associated with collecting, storing, organizing and, most importantly in terms of adding value, analyzing data.
I understand that there are many issues regarding big data that I have not addressed here. They include:
- The ethical uses of data that is collected, especially if the data is going to be used in ways that weren’t originally contemplated by the people collecting the data;
- The reasons that teams with the right players (each with sufficient expertise in their area of competence) don't come together often enough to squeeze the maximum value out of big data opportunities;
- What is “big data” and how does it differ from other kinds of “data”; and
- How to decide if you have big data that holds sufficient value to merit an investment in products around that data.
Depending on the response to this post, I will consider a post exploring one or more of those areas of interest, so please provide feedback.
That said, I hope that this explanation of the players/roles involved in the generation and exploitation of big data is useful and provides another view into what big data is and what it takes to extract the value that it holds.
Copyright 2017 Howard Niden