A Blueprint for A.I. Products
The key to a successful A.I. startup isn’t sophisticated or proprietary technologies, but the part of data science that we don’t spend nearly enough time thinking about: the data.
- September 26, 2022
When Sonny Tai, ’15, and Ben Ziomek, ’19, set out to build an A.I. startup focused on preventing mass shootings in public locations such as schools and workplaces, they didn’t have a complicated or revolutionary new algorithm. And yet, since its launch in early 2018, their New York–based startup Actuate has raised $10 million, including an oversubscribed Series A funding round, and amassed over 1,000 clients, among them the US Army and Puerto Rican public schools.
Tai’s passion for gun safety came from growing up in South Africa, where shootings are unfortunately prevalent. The idea for Actuate came about after a mass shooting in 2017, when Tai, a former US Marine captain, felt motivated to explore technology that could help prevent future violence and save lives.
For his part, Ziomek says, “I grew up as the son of diplomats and very aware of US violence issues.” He explains, “We were interested in new types of security technologies, and believed every existing security camera can make the world a safer place without sacrificing privacy.”
At first glance, A.I. is not an obvious place to look for a solution to mass shootings. But its ability to automate repetitive tasks got the Actuate team thinking: Could they leverage A.I. to build an efficient technological safety solution and deploy it at scale?
Tai, Ziomek, and their Actuate team came to understand that computer vision could be used to automatically detect weapons in real-time security feeds before potential attackers could do harm. They spent several years developing a system that can, in less than half a second, detect 99 percent of weapons.
How Tai and Ziomek built Actuate provides a lesson in what is needed to effectively apply A.I. Their story illustrates that what distinguishes successful A.I. ventures is not necessarily the quality of their A.I. technology or their domain expertise, but rather their ability to marshal the right types of data.
From our collective decades of experience working in the area of data science—including building real-world A.I. applications in the private, public, and nonprofit sectors, and teaching a course on this at Chicago Booth—we’ve come up with three principles that are key to successfully leveraging data to build A.I. products.
Principle No. 1: Do you know what you’re predicting?
In almost every data science course, students are told to assume a dataset with outcomes they want to predict and inputs they can use to predict those outcomes. But the success of an A.I. product hinges on determining what the outcomes should be in the first place. Two common missteps are defining an outcome so vaguely that it defies useful measurement, or so narrowly that there is inadequate data to support prediction.
The Actuate team couldn’t predict something as vague as “suspicious behavior”—there isn’t any dataset that defines what behavior qualifies as “suspicious.” But even if the team wanted to predict something more specific such as “mass shootings,” there still isn’t enough data for an A.I. algorithm—despite too many mass shootings in America.
A key breakthrough for the Actuate team was defining the problem in a way that made it tractable. They recognized that the precursor to mass shootings was someone taking a gun into a prohibited space. This allowed them to redefine the problem as “predict the presence of a gun in locations where guns shouldn’t be” instead of “predict a mass shooting.” In the United States, there are more than 300 million guns in private hands, and because there is no shortage of photos and videos of guns, the team had plenty of data to start building a gun-presence detector.
“We started from the general business problem of using technology to make it easier to respond to gun violence, refined the problem statement, and then evaluated technical approaches to determine what was possible to build,” Ziomek says.
Principle No. 2: Have you matched your training data with your use cases?
To make sure an algorithm will be effective when deployed, it’s imperative that the training data distribution matches the use-case data distribution. Otherwise, the launched algorithm may make mistakes on edge cases that are present in deployment and not in training.
In order to avoid missing these edge cases, the Actuate team started a long process of robustness testing, searching the data for instances where the model made mistakes. Since the product deploys using security-camera data, they leveraged security-camera footage in the training data process. But even with the data distributions matching, the team had to augment their dataset to improve their product’s accuracy level.
They started generating training data themselves by taking photos to fill in the gaps. For example, an early version of the technology couldn’t recognize silver revolvers as pistols. To fix the issue, the team took photos of silver revolvers in a variety of locations and lighting conditions. This process was hard and time consuming, but the team’s flexibility and agility in generating data meant they were able to deploy the system with a pilot customer within six months.
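The gap-finding step described above can be sketched as a per-slice recall check. This is a minimal illustration, not Actuate's actual tooling: the function names, the slice labels, the 0.9 cutoff, and the evaluation records are all hypothetical. The idea is to group labeled weapon images by a descriptive tag and flag the groups the model misses most often, which is where new photos would help most.

```python
from collections import defaultdict


def recall_by_slice(examples):
    """Return {slice_name: recall} over the examples that truly contain a weapon.

    Each example is a tuple: (slice_name, has_weapon, predicted_weapon).
    """
    hits = defaultdict(int)
    positives = defaultdict(int)
    for slice_name, has_weapon, predicted_weapon in examples:
        if has_weapon:
            positives[slice_name] += 1
            if predicted_weapon:
                hits[slice_name] += 1
    return {name: hits[name] / positives[name] for name in positives}


def weak_slices(examples, min_recall=0.9):
    """List slices whose recall is below min_recall, worst first."""
    recalls = recall_by_slice(examples)
    below = [name for name, value in recalls.items() if value < min_recall]
    return sorted(below, key=lambda name: recalls[name])


# Hypothetical evaluation records, invented for illustration only.
examples = [
    ("black pistol", True, True),
    ("black pistol", True, True),
    ("silver revolver", True, False),
    ("silver revolver", True, False),
    ("silver revolver", True, True),
    ("rifle", True, True),
]

print(weak_slices(examples))
```

In this toy data, the only slice flagged is the silver revolver, mirroring the kind of gap the team filled by photographing more examples.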
“The system’s idiosyncratic dataset requirements also act as a technical moat,” Ziomek says. “The same data generation effort would be required for a big company to compete and enter the market, making the investment well worth it.”
Eventually, Actuate began leveraging customer datasets to further improve the accuracy of the prediction model. By flagging all edge cases, they add over 10,000 useful images to the training dataset each week.
What makes Actuate special is its ability to create meaningful data that is custom tailored to the use cases. Tai and Ziomek have learned that small imperfections can magnify to create big problems, and their rigorous data generation and robustness-testing processes have turned existing data into useful data.
Often A.I. startups will say “our product has 99 percent accuracy” or “our area under the curve is 0.9” as a metric to prove their algorithm’s efficacy. But these numbers don’t mean much without the context of the business decision that will be made from the algorithm’s output.
The Actuate team improved their accuracy using a traditional web-scraping approach to get images of firearms. Then they added to their dataset by crowd-sourcing crime videos from law enforcement and self-defense enthusiasts, including influential YouTubers.
“Adding the crowd-sourced data got us to 90 percent precision (positive predictive value) and recall (sensitivity),” Ziomek says.
Although 90 percent precision might sound promising, the usefulness of a tool depends on context. A 10 percent error rate meant the software could, hypothetically, misidentify dozens of objects per day as weapons.
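To make the arithmetic concrete, here is a small sketch of how precision and recall are computed from counts, and how a 90 percent precision translates into false alerts. The counts and the figure of 300 alerts per day are assumptions chosen for illustration; they are not Actuate's numbers.

```python
def precision(true_positives, false_positives):
    """Share of alerts that were real weapons (positive predictive value)."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of real weapons that triggered an alert (sensitivity)."""
    return true_positives / (true_positives + false_negatives)


def expected_false_alerts(alerts_per_day, precision_value):
    """Expected number of daily alerts that are not real weapons."""
    return alerts_per_day * (1 - precision_value)


# Hypothetical counts that yield 90 percent precision and recall.
tp, fp, fn = 90, 10, 10
p = precision(tp, fp)
r = recall(tp, fn)

# Assumed alert volume: at 300 alerts per day, roughly 30 would be false.
false_alerts = expected_false_alerts(300, p)
print(p, r, round(false_alerts))
```

The same precision figure produces very different workloads depending on alert volume, which is why the headline metric says little without the deployment context.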
In this case, the cost of a false positive (mistakenly identifying a potential weapon) is high, and the cost of a false negative (not identifying a weapon) is infinitely high. Given this context, the Actuate team further invested in improving the accuracy by adding training data before launching their product.
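One way to fold these unequal costs into a decision is to compare candidate alert thresholds by expected cost rather than by accuracy alone. The sketch below is an assumption-laden illustration: the thresholds, error counts, and cost values are invented, and the "infinitely high" cost of a missed weapon is approximated by a large finite number.

```python
def expected_cost(fp_count, fn_count, fp_cost, fn_cost):
    """Total cost of errors given per-error costs."""
    return fp_count * fp_cost + fn_count * fn_cost


def cheapest_threshold(candidates, fp_cost, fn_cost):
    """Pick the threshold with the lowest expected cost.

    candidates maps threshold -> (false_positive_count, false_negative_count).
    """
    return min(
        candidates,
        key=lambda t: expected_cost(*candidates[t], fp_cost, fn_cost),
    )


# Hypothetical error counts at three alert thresholds.
candidates = {0.3: (120, 1), 0.5: (40, 4), 0.7: (10, 12)}

# If a missed weapon is treated as 1,000 times worse than a false alert,
# the most sensitive threshold wins despite its many false alerts.
costly_miss_choice = cheapest_threshold(candidates, fp_cost=1, fn_cost=1000)

# If both errors cost the same, the strictest threshold wins instead.
equal_cost_choice = cheapest_threshold(candidates, fp_cost=1, fn_cost=1)
print(costly_miss_choice, equal_cost_choice)
```

The choice flips entirely with the cost assumptions, which is the sense in which a metric only means something alongside the business decision it feeds.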
Today, after all this thoughtful development, the company has proved the value of its threat-detection system. Actuate currently contracts with US government entities and private companies and is monitoring over 10,000 cameras. The company is focused on further reducing the error rate of its algorithm while making it faster and easier to deploy in any organization, bringing the use of A.I. to make the world a safer place closer to reality.