Narrator: In the era of big data, more and more companies are using experimentation to make decisions about business strategy. An e-commerce company might be wondering how additional product suggestions affect order size. A social media network might be tinkering with its recommendation algorithm to increase user engagement. A bank may be trying out new promotional strategies to increase deposits. In each case, to understand the effect of the new strategy, the company is likely to rely on every scientist’s favorite tool, a randomized controlled trial.
David Puelz: If you’re thinking about a randomized experiment, what you typically do is you have a treatment group. You treat a randomly selected set of people, and you have a control group. And so that’s a group you just leave out. The goal is to try and figure out the treatment effect of that particular intervention. The way that you do that is you measure the outcomes for the treated units and you compare those to the outcomes for the control units, and then that difference is going to give you an estimate of the treatment effect. But there’s a bit of nuance in there, and this is where our particular research comes into play. And that’s if, well, what if people in the treated group and the control group are somehow connected to each other?
Narrator: That’s Chicago Booth principal researcher David Puelz. He and his coauthors created a new method that can help experimenters incorporate interference into causal analysis. When an experimental treatment can change the behavior of units outside the treatment group, it’s known as interference. And given all the direct and indirect ways that people influence each other, it’s a ubiquitous problem for data scientists trying to suss out cause and effect in real-world conditions.
David Puelz: We can think of social networks, Facebook, for example. People are connected because they’re part of the same Facebook group or they might be friends with each other. They might share similar interests. We can also think about people being connected if they live within the same household. So this is a very common form of network interference that we see in a lot of econ literature if I perform an experiment on households. For our paper, we focused on spatial interference. So, what if I perform an experiment on the streets of a city? To be very specific, the data that we looked at was a policing experiment, a large-scale policing experiment in the city of Medellín, Colombia.
Narrator: The researchers examined data from an experiment conducted in 2015 by a separate research team in which various street segments of Medellín received extra policing.
David Puelz: And what they did was they took the entire street network of the city of Medellín and they randomly selected a subset of crime hotspots. And for six months, they had police patrol that random subset of crime hotspots more frequently. And what they did was before and after the experiment, they measured crime on all of the streets of the city.
Narrator: The primary research question was whether that increased policing lowered crime in the treated hotspots. But a second question is how extra patrols on one street affected crime on other streets that weren’t selected for treatment. Did hotspot policing lower crime on streets adjacent to the hotspots? Did it simply displace crime to other areas?
David Puelz: OK, so now we’re going from talking about just the treated streets and just the control streets to streets that are surrounding those that are treated. This is perhaps a more interesting question for policy makers because if it’s just the case that crime is not being decreased, but instead the crime is just happening somewhere else nearby, then this might not be a very good strategy. This becomes complicated because we’re not dealing very specifically with treated streets and control streets but now with streets that are somehow connected to those treated streets.
Narrator: The researchers’ method turns this statistical problem into a computational one. They begin by constructing a graph showing each member of the experimental group on one axis and every possible combination of treatment assignments on the other. For Medellín, that meant constructing a graph that would include 37,000 street segments and about 10,000 possible combinations of policing assignments. Then they use an algorithm to narrow this massive graph down to a subset of relevant units: spillover units, which aren’t treated but may be affected by treatment, and pure control units, which aren’t treated or linked with any unit that is treated. Using what’s called a randomization test, they can then test for a treatment effect between these two groups to identify how relationships between experimental units affect outcomes. In the case of Medellín, they find that, consistent with the results of prior research, additional policing on one street lowers crime on the streets around it. But the process they used to analyze crime in Medellín can also be applied to virtually any other question that’s muddied by interference. That makes it useful for answering a broad array of business questions.
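The comparison of spillover units with pure control units can be sketched in Python on synthetic data. This is a minimal illustration, not the researchers' algorithm: rather than narrowing a graph of units and assignments, it assumes a simpler shortcut of fixing a set of focal units in advance, holding their own treatment fixed, and reshuffling treatment among the remaining units. It also assumes complete randomization, with a fixed number of treated units that are all equally likely to be chosen. The function names, the line of 200 streets, and the simulated outcomes are hypothetical.

```python
import numpy as np


def spillover_statistic(y, z, adjacency, focal):
    """Mean outcome of spillover units minus pure controls, among untreated focal units."""
    has_treated_neighbor = (adjacency @ z) > 0
    untreated_focal = focal & (z == 0)
    spill = untreated_focal & has_treated_neighbor   # untreated, next to a treated unit
    pure = untreated_focal & ~has_treated_neighbor   # untreated, no treated neighbor
    if spill.sum() == 0 or pure.sum() == 0:
        return 0.0  # statistic is undefined for this assignment; treat as no difference
    return float(y[spill].mean() - y[pure].mean())


def spillover_randomization_test(y, z_obs, adjacency, focal, n_draws=999, seed=0):
    """Two-sided randomization p-value for the null of no spillover onto untreated units.

    Assumption (not from the source): focal units are chosen without looking at the
    treatment assignment, and their own treatment is held fixed in every redraw.
    """
    rng = np.random.default_rng(seed)
    t_obs = spillover_statistic(y, z_obs, adjacency, focal)
    nonfocal = np.flatnonzero(~focal)
    exceed = 0
    for _ in range(n_draws):
        z = z_obs.copy()
        # Reshuffle treatment labels among non-focal units only.
        z[nonfocal] = rng.permutation(z_obs[nonfocal])
        if abs(spillover_statistic(y, z, adjacency, focal)) >= abs(t_obs):
            exceed += 1
    return t_obs, (1 + exceed) / (1 + n_draws)


# Synthetic example: 200 street segments in a line, each adjacent to its neighbors.
n = 200
adjacency = np.zeros((n, n), dtype=int)
idx = np.arange(n - 1)
adjacency[idx, idx + 1] = 1
adjacency[idx + 1, idx] = 1

focal = np.arange(n) % 2 == 0           # hypothetical choice: even-numbered segments
rng = np.random.default_rng(1)
z_obs = np.zeros(n, dtype=int)
z_obs[rng.choice(n, size=30, replace=False)] = 1

# Simulated outcome: crime drops by 2 on untreated segments next to a treated one.
near_treated = ((adjacency @ z_obs) > 0) & (z_obs == 0)
y = rng.normal(0.0, 0.05, size=n) - 2.0 * near_treated

t_obs, p_value = spillover_randomization_test(y, z_obs, adjacency, focal)
print(f"observed difference: {t_obs:.2f}, p-value: {p_value:.3f}")
```

Under the null hypothesis of no spillover onto untreated streets, the focal streets' outcomes stay the same across the reshuffled assignments, so only their spillover or pure-control labels move. That is what makes the comparison a valid test in this simplified setting.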
David Puelz: This is the first general approach for randomization tests under interference, and that’s why we’re so excited about it, because it can not only be used for this particular crime experiment, but it can also be used at Facebook, in all of the thousands of experiments that they’re conducting every single day, and at Google, and in virtually any setting where I am running a randomized experiment and my units are connected with each other, which is a very general idea that is almost always occurring. It’s almost always the case that my units, my experimental units, are going to be connected with each other in some way.