Jake Porway is a research fellow at data.org. He co-founded and served as Executive Director at DataKind, a non-profit dedicated to using data science in the service of humanity. This article is part of a series on demystifying data science and AI’s role in social impact.
“Having words for these forms makes the differences between them so much more obvious. With words at your disposal, you can see more clearly. Finding the words is another step in learning to see.” — Robin Wall Kimmerer, Gathering Moss: A Natural and Cultural History of Mosses
One of the biggest issues in effectively using “Data for Good” and “AI for Good” is the vagueness of the terms themselves. They treat all efforts as similar by the mere fact that they use “data” and seek to do something “good”. These terms are so broad as to be practically meaningless; as unhelpful as saying “Wood for Good”. We would laugh at a term as vague as “Wood for Good”, which would lump together activities as different as building houses, burning wood in cook stoves, and making paper, combining architecture with carpentry and forestry with fuel. However, we are content to say “Data for Good”, and its related phrases “we need to use our data better” or “we need to be data-driven”, when data is arguably even more general than something like wood.
But there is a way through the forest. If we talk less about data and more about the class of problems we are trying to solve with data, we can immediately start to see clearer lines about how it can enhance our work. What’s more, I contend that data and computing can only be used for THREE things – and there will only ever be these three things – and each of them comes with its own culture, methodologies, and problems it can solve. In this article, I’ll take you through each of the three outcomes, how they got to be “datafied” over time, and how we can apply them in our own work. In the following article on applying these three uses of data, we’ll talk about where each application is likely to succeed or fail.
Before we dive in, I want to acknowledge that there is no lack of articles on “what is data science?” or “what is AI?” However, I find that many of them explain the field in academic terms in relation to other fields, like this gem from Wikipedia:
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
That definition gives us no practical sense of what problems we can solve with data science, and barely distinguishes it from other practices, even assuming you know what all the other fields are. Other frameworks feel divorced from the human experience, describing the use of data as “inference”, “evaluation”, or “prediction”. In this article, I’m hoping to spell out what data and computing can do in layman’s terms that will outlast whatever buzzwords come down the line years from now. It is an attempt to describe what data and computers do axiomatically, based on the core things that we as humans do in achieving our goals. And if that’s already been done then, well, as Samuel Johnson said, people “need more to be reminded than instructed.” Let the reminding begin!
The Only Three Things You Can Do With Data
Despite the plethora of terms like “data mining”, “artificial intelligence”, “predictive analytics”, and so on, there are, in fact, only three things you can do with data and computing. What’s more, there will only ever be three, so long as our universe keeps working the way it does. The three things are:
- Observe: Take a snapshot of the world. What does it look like today? What did it look like in the past?
- Reason: Draw conclusions about how the world works. How do things relate to one another? What might the world look like tomorrow?
- Act: Physically change the world. Take an action that moves the world into a new state.
That’s it. Really! If these words seem familiar, it’s because they have nothing to do with data and computing inherently, but are a simplification of the three things that we as humans do to achieve a goal. For every goal that we set, be it as trivial as getting Mexican food to eat tonight or as grand as saving the whales, we execute some version of these activities over and over. Take the example of getting Mexican food to eat by 5:30 PM tonight. Here’s how those steps might unfold:
- Observe – what does the world look like right now? I have $5 in my wallet. It’s 5PM. There are three Mexican restaurants within 10 miles of my house.
- Reason – how do things relate, what might happen in the future? Well, I only have $5 and my favorite dish at Panchito’s is $10, so that one’s out. In my experience, it’s really trafficky this time of day on the I-97, so I can’t make it to Acapulco by 5:30 PM. That leaves Taco Hut, which I can get to by 5:30 and can afford with $5. To Taco Hut!
- Act – physically change the world. You grab your car keys, open the door, go down to your car, drive to Taco Hut, order tacos, mow down.
- Goal: Achieved! We ate Mexican food by 5:30 PM tonight. Mission accomplished!
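For the programmatically inclined, the steps above can be sketched as a toy program. Everything here – the restaurant names, prices, and travel feasibility – is just the hypothetical scenario from the example, encoded as data:

```python
# Toy sketch of the Observe-Reason-Act loop from the taco example.
# All values below are hypothetical, taken from the scenario above.

def observe():
    """Take a snapshot of the world as it is right now."""
    return {
        "cash": 5,
        "restaurants": [
            {"name": "Panchito's", "price": 10, "reachable_by_530": True},
            {"name": "Acapulco",   "price": 5,  "reachable_by_530": False},
            {"name": "Taco Hut",   "price": 5,  "reachable_by_530": True},
        ],
    }

def reason(world):
    """Relate the observations to each other and pick a plan."""
    options = [r for r in world["restaurants"]
               if r["price"] <= world["cash"] and r["reachable_by_530"]]
    return options[0]["name"] if options else None

def act(plan):
    """Physically change the world (here, we just report the action)."""
    return f"Driving to {plan} for tacos!"

world = observe()          # Observe: snapshot the world
plan = reason(world)       # Reason: rule out Panchito's ($10) and Acapulco (traffic)
print(act(plan))           # Act: off to Taco Hut
```

Nothing in the loop is specific to tacos, of course; swap in different observations and a different decision rule and the same three-step skeleton describes nearly any goal-directed system.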
Of course this process is not so clean and linear in real life – you’ll constantly reassess your plan throughout and might loop from Reason back to Observe – but hopefully it’s relatable enough to explain the three types of activities we humans carry out in achieving our goals. When asking “how can I use data/AI?”, we can rest assured that the answer must relate to one or more of these three actions. Call it data mining, call it AI, call it ‘computer magic’, whatever you’re doing is in pursuit of Observing, Reasoning, or Acting faster or cheaper than you could on your own in order to reach your goals.
Augmenting Observe, Reason, and Act
As a human, you do all three of these things on your own: you use the five senses you’ve got to Observe the world, you use that big ol’ brain of yours to Reason, and you use your body or tools to Act in the world. What does it look like to augment these three actions with data and computing? It can be overwhelming to sort them out in today’s highly digital world, so let’s take a very brief stroll through history and look at the moments we started offloading Observe, Reason, and Act to the digital world.
Observation Goes Digital: As humans, we have observed our world, built tools to help us observe it, and kept track of our observations since the beginning of time. Early civilizations kept detailed records of grain inventories so that they could keep markets running. The Greeks had a count of all the people in their empire so that they knew how many men they could conscript to the army. One of the first systematized versions of widescale observation was the Census, which we still use today in most countries to understand how many people there are, how many resources a nation has, and other statistics about the country. In fact, the English word “statistics” comes from this use of mass observation for governing, literally translating to “the science of the state.” As computers came to the fore, we collected even more observations as digital information, from emails to GPS locations to selfies. Any tool that quantifies our world is helping us Observe more things, for better or for ill.
Reasoning Goes Digital: While it is alluring to believe that lots of data suddenly translates to lots of knowledge, it quickly became clear that all of the new data we could Observe was merely the first step in understanding our world. Human reasoning has its limitations, and we are subject to irrational biases and egos that cause us to misinterpret data, despite our best efforts. Up until the 1930s, reasoning about data was more of an art than a science: civil servants would try to conclude how many people would be born this year by extrapolating from previous years, and doctors would meld experience with practice to conclude which treatments were effective. Around 1930, a number of people like R.A. Fisher formalized the field of probability and statistics, creating mathematical methods that would allow us to more consistently determine which relationships in the data were strong. If you’ve heard phrases like “linear regression” or “randomized controlled trials”, they come from this discipline of mathematically modeling the relationships in data. The goal of these tools is to allow us to draw reliable, repeatable conclusions about the world, like that a drug is effective, without our biases getting in the way (as much).
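To make “mathematically modeling the relationships in data” concrete, here is a minimal sketch of one of those tools: ordinary least-squares linear regression, fit with the textbook closed-form formulas. The dose/response numbers are made up purely for illustration:

```python
# A minimal sketch of a classic "Reason" tool: least-squares linear
# regression, fitting the line y = a*x + b that best explains the data.

def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: drug dose vs. measured patient response.
doses     = [1, 2, 3, 4, 5]
responses = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(doses, responses)
print(f"response ≈ {slope:.2f} * dose + {intercept:.2f}")
```

The point is not the arithmetic but the discipline: the same formula applied to the same data gives the same answer every time, which is exactly the repeatability that intuition-driven reasoning lacks.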
Action Goes Digital: Action is the step that we as a species may have the most experience outsourcing. From the invention of the wheel up to the modern day use of cars, we have invented tools to make our tasks cheaper and faster. However, for most of human history these tools have existed in the physical world, requiring a human to animate them and apply them to our goals. Around the beginning of the 20th century, people began creating machines that operated on information. These “computing machines” started by automating tedious but straightforward tasks, like computing complicated mathematical equations for ballistic trajectories or tirelessly plugging away at cracking codes. In the 1950s, computers took their first steps toward self-improvement when Arthur Samuel set about training computers to play checkers against one another by learning from data about human checkers games. From then on, computers have been able to Act in increasingly sophisticated ways by learning how to imitate us using data.
We’ve talked about how we expanded our ability to Observe, Reason, and Act using digital data and computers, but the examples we walked through may not square with your experience of the world we live in today. Sure, we run a census to Observe our world, but that’s been going on for centuries. Statistical modeling helped us Reason more robustly, but we’re already familiar with the use of statistical tests in drug trials and scientific experiments. The example of Act is a computer playing checkers? We’ve had chess bots for the last three decades! How do we get to trillions of tweets, Hans Rosling TED talks, and self-driving cars?
Around 2007 there was a major shift in the world – the first iPhone was released. This moment in history kicked off what we dubbed the “Big Data” age. The iPhone wasn’t solely responsible for “Big Data” of course, nor was it the first smartphone to be released, but it’s a convenient symbol for a moment in time when we started moving many of our activities into the digital world, creating an unheard-of amount of data in the process. We strapped ourselves with cameras, GPS, and always-on digital apps; we made laptops and computers cheaper; we littered our seas and skies with satellites and sensors. People started saying “data is the new oil”, entirely new industries rose overnight, and we grappled with what it meant to be “data sapiens”. In the opening of this article, I mocked the idea of saying “wood for good”, but this is a moment to be fair to the “data for good” movement. If everyone suddenly woke up with a forest growing in the corner office of their business, they might very well focus on what to do with ALL THAT WOOD. So too did data wash over every industry, every field, and every action we took.
You may think the statisticians and computer scientists had the best handle on this new age, yet many of them were freaking. OUT. Suddenly, statisticians who would use datasets of 100 to 200 data points to Observe and Reason were confronted with millions of data points streaming off the internet. Computer scientists building machines to Act suddenly had more data from which to build increasingly complex computer programs. It created technological and philosophical questions – what does it mean to run a mail-in census every ten years when a large part of your population can be texted every month? If a computer could learn from all the data on the Internet, would it be “intelligent”? The philosophical questions are still open, but technologically it meant that statisticians had to start using computers to collect and sort data, and computer scientists had to start using probability and statistical modeling to train computers to do more complex tasks. In other words, all of Observe, Reason, and Act suddenly needed both data and computers to be done well. That’s where all the confusion began.
A few people sought terms to describe this new era in which you needed to use computers and data to do interesting things. The business world won out, and “data science” led the way as a catch-all for all things digitally data-related, inviting no small amount of snark.
Other terms sprang up to parse this moment – “data analyst”, “data engineer”, “information scientist”. As computers became capable of more anthropomorphic tasks, like speaking to you as a voice assistant or opening the door for you, the term “artificial intelligence” – first coined in the 1950s – became part of the public parlance. Today it’s a safe bet that any news article discussing something a computer has done will use “AI”. Will that be true when you read this article?
So What DO These Terms Mean?
So now, looking back at the last 10 years or so, what are we to make of the Big Data age, the AI revolution, and this world we live in today? We’ve talked about how we expanded our abilities to Observe, Reason, and Act, as well as how the “Big Data” age supercharged all of that. So then what is data science now? What is AI?
Sadly, the bottom line is that there aren’t clear definitions for them. Depending on who you talk to, in which discipline, in which year, they may use those terms totally differently. When I was coming out of grad school in 2010, few companies knew what to do with an AI grad like me, but they did know they wanted “data scientists”, so that’s what I put on my resume. Nowadays, data scientists are passé, and big companies want AI experts. In my case, it’s the same skillset, but now known by a different name. Here’s my editorial view of how many of these popular terms have been applied to Observe, Reason, and Act over time.
These terms will no doubt change as new technologies develop, but hopefully with this framing we can keep our heads about us when they do. We don’t need to add a “Quantum Observation for Good” conference to our list of obligations.
Great. So Now What?
If you’re feeling dismayed by the lack of clarity on “data science” and “AI”, fret not: what I hope you take away from this article is that these jargony terms may be unclear or may change a million times over, but they will always be referring to one of these three unchangeable outcomes. We humans will always want to achieve our goals and, so long as that is true, we’ll be using the axiomatic steps of Observe, Reason, and Act. Whether we enlist a computer and data for help or stick to our built-in sensors to get through each phase is all that differs between every story you hear about “AI” or “data science”. If you want to delve into further examples and understand better when using data for Observe, Reason, and Act succeeds or fails in the real world, keep your eyes peeled for an upcoming article on practically applying this knowledge. In the meantime, go forth knowing that no technobabble can outsmart or intimidate you. Technology isn’t doing anything magical and data isn’t “all-knowing”. It’s always in service of what we already do, and you know how to do that just fine.
1. One popular existing framework for reasoning with data is “Descriptive, Predictive, Prescriptive”. It’s very similar to Observe-Reason-Act, but I think that Prescriptive stops just short of the important phase of actually affecting something in the world. A Roomba vacuuming my floor or Facebook rearranging my news feed is much more active than mere prescription. We’ll see later how important that is when talking about computers taking actions on our behalf.
2. We would be remiss if we didn’t acknowledge that often the people and institutions with the power to collect data have used it to further entrench power, subjugate others, or otherwise use data for their own evil ends. Collecting data to observe our world has led to some of our greatest scientific discoveries, like navigating the seas by the stars, but it has also fueled some of our most shameful atrocities, like tracking Africans who were enslaved by Americans or monitoring and dehumanizing Jews and ethnic minorities in Nazi Germany to manage their extermination. Data is never neutral, for it is always in our hands.