Tom Morgan was a smart guy. He was a plainspoken partner at Price Waterhouse who influenced me in many ways. For instance, he once (better than 30 years ago) told me that our actions, and our clients' actions, should be driven by “facts and logic.” This made good sense to me, and I have spent the intervening years refining my understanding of that simple thought.
Tom’s philosophy (and I am doing a little educated guessing here) was born out of his background building systems that processed large (for the time) volumes of data to support complex operational decision-making. The “facts” were the data input into the program, and the “logic” was the procedural code that processed the input data and drove decision-making. The systems were deterministic, i.e., for given inputs, you could count on a specific output. Facts and logic.
Things have not changed since Tom left me with his thought about facts and logic—except maybe for the scale of the data used and the technology available to process it😊
- The volume of data collected and available to decision-makers has exploded. In the early 1980s, when I started my consulting career, big data was denominated in megabytes (1024² bytes, i.e., 1,048,576 bytes). Today big data is measured in zettabytes (1024⁷ bytes) or yottabytes (1024⁸ bytes), and a yottabyte is 1,208,925,819,614,629,174,706,176 bytes. By the way, a byte is generally 8 bits and corresponds to a character, symbol or numerical digit.
- Processing power has increased too. In 1983 a Cray supercomputer was capable of 713.00 MFLOPS (10⁶ floating point operations per second). In 2022 its descendant, an HPE Cray, was capable of 1.102 EFLOPS (10¹⁸ floating point operations per second).
- Technologies (database, data warehousing, analytics, AI, etc.) are thriving on the exponential growth in computing power and data capacity.
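For readers who want to check the arithmetic in the bullets above, here is a small Python sketch. It assumes the binary convention the post uses, where each prefix step is a factor of 1024; under the SI convention each step is a factor of 1000 instead, and the binary units are formally called mebibytes, zebibytes, yobibytes and so on. The names `PREFIXES` and `bytes_in` are illustrative only, not from any library.

```python
# Binary (power-of-1024) prefixes, matching the convention used in the post.
PREFIXES = {
    "megabyte": 2,
    "gigabyte": 3,
    "terabyte": 4,
    "petabyte": 5,
    "exabyte": 6,
    "zettabyte": 7,
    "yottabyte": 8,
}


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one unit, using 1024**n (binary) sizing."""
    return 1024 ** PREFIXES[unit]


# Peak speeds quoted in the post, in floating point operations per second.
CRAY_1983_FLOPS = 713.00e6      # 713.00 MFLOPS
HPE_CRAY_2022_FLOPS = 1.102e18  # 1.102 EFLOPS

if __name__ == "__main__":
    for unit in ("megabyte", "zettabyte", "yottabyte"):
        print(f"1 {unit} = {bytes_in(unit):,} bytes")
    speedup = HPE_CRAY_2022_FLOPS / CRAY_1983_FLOPS
    print(f"2022 vs 1983 peak speed: about {speedup:.2e}x")
```

Run as a script, it prints the three sizes and a speedup of roughly 1.5 billion between the two machines.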
These changes have made data even more valuable. This post is about one very important discipline associated with exploiting data: data science. Data science is the study of extracting value from data:
—value in the form of insights and conclusions.
—value in the form of insights and beneficial outcomes.
—value in the form of insights and actionable conclusions.
—value in the form of realizations and actionable conclusions.
—value in the form of realizations and beneficial outcomes.
The first version above is a quote from the book that motivated me to write this post, Data Science in Context: Foundations, Challenges, Opportunities. The other four iterations were my attempts to improve upon it. I kind of like the last iteration, but it is at best a marginal improvement.
That said, I found the book as a whole to be exceptional. And that one sentence inspired an aha moment. It basically summarized what I spent the better part of my career doing: taking data and using it as a foundation upon which good decisions can be made. The data I refer to is the output generated in the course of large-scale transaction processing, both for clients and for one notable company that I worked for.
Whether you generate data yourself (like Amazon) or capture it while providing services (like Google or Facebook), data provides the granular input to the DIKW model, which explains data’s role in intelligent decision-making. The origins of the DIKW pyramid go back a long way, to the 1920s, when Clarence Barron is credited with speaking about the “Knowledge, Intelligence and Wisdom” hierarchy.
DIKW pyramid (source: Wikipedia, https://en.wikipedia.org/wiki/DIKW_pyramid)
- Data—facts, i.e. that which is known (or firmly believed) to be real or true; what has actually happened or is the case; truth attested by direct observation or inference. Data is the documentation of the facts as captured in numbers, text, icons etc.
- Information—data in context of a problem. By providing context, e.g. number and kinds of car accidents at an intersection, you identify and provide the structure against which the data can have meaning.
- Knowledge—information that is carefully vetted/considered to ensure that it is useful/actionable for the purpose at hand. Knowledge (the frequency and kinds of car accidents) can be applied to decision-making together with other inputs, such as the kinds of remedies that are effective given what the accident information reveals about the dynamics of the intersection.
- Wisdom—knowledge applied to generate good (or, at a lower bar, desired) outcomes. In the example I have been using, this would be the implementation of appropriate, effective measures designed to mitigate the problem.
I would note that my descriptions of the components of the DIKW pyramid are a bit unorthodox, but they have provided a solid conceptual structure for me as a practitioner. A more formal definition and references to refereed sources can be found on Wikipedia: https://en.wikipedia.org/wiki/DIKW_pyramid
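The car-accident example above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the `Accident` record layout, the sample records and the `threshold` for what counts as a recurring problem are all hypothetical, invented here for illustration. Raw records play the role of data, counts per intersection and kind play the role of information, and the flagged pattern plays the role of knowledge.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Accident(NamedTuple):
    """One raw fact: a recorded accident (hypothetical record layout)."""
    intersection: str
    kind: str


# Data: hypothetical raw records, invented purely for illustration.
records = [
    Accident("Main & 1st", "rear-end"),
    Accident("Main & 1st", "rear-end"),
    Accident("Main & 1st", "left-turn"),
    Accident("Main & 1st", "rear-end"),
    Accident("Oak & 5th", "sideswipe"),
]


def to_information(data):
    """Information: count accidents per intersection, broken down by kind."""
    info = defaultdict(Counter)
    for accident in data:
        info[accident.intersection][accident.kind] += 1
    return dict(info)


def to_knowledge(info, threshold=3):
    """Knowledge: flag intersections whose most common accident kind occurs
    at least `threshold` times (a hypothetical cutoff)."""
    findings = {}
    for place, counts in info.items():
        kind, n = counts.most_common(1)[0]
        if n >= threshold:
            findings[place] = (kind, n)
    return findings


information = to_information(records)
knowledge = to_knowledge(information)
print(knowledge)
```

With these made-up records, it flags Main & 1st for rear-end collisions. Wisdom, choosing and implementing an effective remedy, stays a human decision; the code stops at knowledge.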
I could end here, but that clearly isn’t me. I am going to describe how data gets transformed into wisdom. Let’s start by assuming that one can collect payment data and put it into a database. Suppose further that the data relates to the payment lifecycle of an industry that is important to the U.S. economy; with that context, it is now information.
The database contains both history and data entered in anticipation of work that will begin in the next three to six months. Furthermore, this data can be analyzed in the context of U.S. economic data, and it turns out that the data we have been discussing is highly correlated with the performance of the economy as a whole. That is knowledge, and applying it gives us “wisdom”: the ability to make sound (and potentially profitable) predictions about future economic performance. Why? Because our database holds data entered by contractors who are preparing to do work over the next three to six months, and that data is correlated with the performance of the U.S. economy as a whole.
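One simple way to check whether a series like this payment data leads the economy is to correlate it with economic activity several months later and see which lead time lines up best. The Python sketch below does that with a hand-rolled Pearson correlation. It is not the Chicago Fed's method, and the two monthly series are synthetic: `economy` is constructed to echo `payments` exactly four months later, so the right answer is known in advance. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlation(indicator, target, lag):
    """Correlate the indicator at month t with the target at month t + lag."""
    if not 0 <= lag < len(indicator) - 1:
        raise ValueError("lag out of range")
    return pearson(indicator[: len(indicator) - lag], target[lag:])


def best_lead(indicator, target, lags):
    """Return (lag, correlation) for the lag with the highest correlation."""
    scores = {lag: lagged_correlation(indicator, target, lag) for lag in lags}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic monthly series (made up): economy[t + 4] == 2 * payments[t] + 5.
MONTHS, TRUE_LEAD = 60, 4
base = [100 + 10 * math.sin(m / 3) + m for m in range(MONTHS + TRUE_LEAD)]
payments = base[TRUE_LEAD:]
economy = [2.0 * v + 5 for v in base[:MONTHS]]

lag, r = best_lead(payments, economy, range(0, 7))
print(f"payments lead the economy by {lag} months (r = {r:.3f})")
```

On real data the best lag would be noisier and the correlation well below 1; a strong correlation also says nothing by itself about why the two series move together.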
Having a reliable three-to-six-month prognostication as to the health of the economy could be quite valuable to businesses (banks, manufacturers or service providers) that are making their plans for ongoing operations and investments for future growth.
By the way, the scenario I sketched out above is not hypothetical. I provided anonymized payment data to the Chicago Federal Reserve Bank, and its economists, using pretty standard econometric tools, analyzed what I gave them and produced the paper at the other end of this link: https://www.chicagofed.org/publications/chicago-fed-letter/2016/366
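Once a lead time is established, the simplest way to turn it into a three-to-six-month prognostication is a one-variable least-squares fit of future activity on today's indicator. This sketch is self-contained, uses made-up synthetic series and a hypothetical four-month horizon, and is not a description of how the Chicago Fed analysis was done; a real forecast would need more care (additional variables, seasonal adjustment, uncertainty estimates). The helper names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for ys ≈ a + b * xs; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_leading_model(indicator, target, horizon):
    """Fit target[t + horizon] ≈ a + b * indicator[t] on the overlapping months."""
    return fit_line(indicator[: len(indicator) - horizon], target[horizon:])


def forecast(model, latest_indicator):
    """Project the target `horizon` months past the latest indicator reading."""
    a, b = model
    return a + b * latest_indicator


# Synthetic monthly series (made up), with a hypothetical four-month horizon.
MONTHS, HORIZON = 60, 4
base = [100 + 10 * math.sin(m / 3) + m for m in range(MONTHS + HORIZON)]
payments = base[HORIZON:]
economy = [2.0 * v + 5 for v in base[:MONTHS]]

model = fit_leading_model(payments, economy, HORIZON)
outlook = forecast(model, payments[-1])
print(f"fitted a={model[0]:.2f}, b={model[1]:.2f}; outlook={outlook:.1f}")
```

Because the synthetic data was built from an exact linear rule, the fit recovers it (a ≈ 5, b ≈ 2); with real data the residuals are what tell you how much to trust the outlook.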
So, what are your take-aways?
- If you are in business and that business collects data as part of your normal business processing, you should take a careful look at it. It might be valuable, and it might provide insights that both improve your legacy business operations and suggest new, highly profitable revenue streams.
- You don’t have to be Google or Amazon to do this. I had neither the expertise (in econometrics) nor the tools (statistical software) to analyze the Textura data, so I outsourced it. This worked out well for me (I got confirmation of my hypothesis that Textura payment data could be used to predict economic activity) and for the Chicago Fed (they got a research paper out of it).
- You don’t necessarily need a lot of data. It just needs to be the “right” data. The analysis I referred to above used a database of only a few hundred megabytes.
- While AI is getting all of the press right now, there is gold out there in data, beyond large language models, just waiting to be mined.
Data science is a broad field. In this post I focused on a narrow slice of the discipline, one that is both well tested and well understood. And like Tom Morgan, the techniques I used are not flashy, but an investment in them can provide an outsized return.
Copyright 2023 Howard Niden
You can find this post (days earlier) and others at www.niden.com.
And, if you like this post: 1) please let me know; and 2) pass on your “find” to others.