It is a bit difficult to say what criteria should be used to judge the success or failure of a research initiative on the scale of merging psychology and economics. Two reasonable criteria, at least to start with, are robustness and replicability. Why are these criteria particularly important in the case of behavioral economics?
In our new book, The Winner’s Curse: Behavioral Economics Anomalies Then and Now, we revisit a series of anomalies that form the empirical bedrock of behavioral economics. Since anomalies are by definition unexpected findings, they deserve special scrutiny. Indeed, the original “Anomalies” columns published in the Journal of Economic Perspectives, columns on which most of the chapters are based, often highlight the back and forth surrounding early behavioral economics findings and the skeptical replies from traditional economists.
The controversy regarding behavioral anomalies is similar (in some ways) to the firestorm started by a 1994 paper by David Card and Alan Krueger on the effect of minimum wages on employment. Before that paper appeared, most economists thought that raising the minimum wage would (obviously!) also reduce the number of people who held such jobs, as employers reacted to the higher cost of labor. Card and Krueger published an anomalous finding. When New Jersey raised the minimum wage from $4.25 per hour to $5.05 and Pennsylvania did not, the researchers surveyed fast-food restaurants in the neighboring states and found that employment did not go down in New Jersey—it actually went up! There was certainly no support for the significant drop in New Jersey employment that standard theory predicted.
This finding was considered nothing less than heresy by some prominent economists. For example, Nobel Laureate James Buchanan wrote in The Wall Street Journal, “Just as no physicist would claim that ‘water runs uphill,’ no self-respecting economist would claim that increases in the minimum wage increase employment. Such a claim, if seriously advanced, becomes equivalent to a denial that there is even minimal scientific content in economics.” Fellow Laureate Merton Miller reacted to the finding, in the same outlet, this way: “I tremble for my profession.”
We will not try to summarize the voluminous literature that has ensued since then, but as you might expect, their original paper spurred a cottage industry of economists testing the robustness of Card and Krueger’s conclusions. And this is exactly what should happen when someone publishes a surprising empirical result. Our reading of the follow-up literature is that there remains little support for substantial negative employment effects of minimum wage increases—at least within the range that has been implemented. (We think it is obvious that if you raise the minimum wage high enough, employers will hire fewer workers.) Buchanan and Miller would likely have been willing to bet that the original finding would have been overturned long before now. And Miller (and many others) would have made similar predictions about the topics of the chapters in our book. So we thought it was worth the effort to see how these ideas have held up.
Behavioral economics is inherently an empirical field. When John von Neumann and Oskar Morgenstern developed, in the 1940s, Expected-Utility Theory in economics, they did not have to collect any data because they were engaged in a purely theoretical exercise: How should someone make decisions in uncertain situations? But when Maurice Allais pointed out the first anomaly contradicting the theory just a few years later, he did so with data. (Although in his case, the ideas worked pretty well as thought experiments.) And when Daniel Kahneman and Amos Tversky created the alternative Prospect Theory in 1979, the research was entirely data driven. They asked thousands of subjects (mostly students) hypothetical questions about risky choices. The entire enterprise could fail if, for example, students at elite universities make choices differently from “real people,” or if they would behave differently if there was money on the line, or if the particular questions asked were highly unrepresentative. In short, the field could be only as good as the underlying data upon which it was based.
So if the foundation of the field is empirical, it becomes vital that those phenomena are robust—that they show up across domains and are highly replicable. The last point is particularly important in light of the so-called replication crisis in some branches of psychology and other fields, where some core findings from research studies cannot be replicated by other experts—or even worse, are found to be made up in the first place.
Part of the goal in writing our book was to survey and actively test the robustness of the behavioral anomalies. We were pleased (and relieved) to find that the foundations of behavioral economics are not crumbling. The updates we provide both review replication attempts and demonstrate the external validity of the anomalies in many real-world settings. In the online materials to the book, available at thewinnerscurse.org, we provide detailed experimental instructions and our own replication results for most of the major findings discussed in the book’s chapters. The endowment effect, hyperbolic discounting, social preferences, and loss aversion—all these classic results and many others are robust to replication attempts. In addition to including our own replications, the online materials provide computer files, code, and digital instructions so readers can easily replicate the results themselves.
What explains the high reproducibility of core behavioral economics findings? One reason is that the topics Richard picked to highlight in his original “Anomalies” columns documented quantitatively large departures from the standard approach. Selling prices are roughly twice buying prices—not the same. Half of participants cooperate in public goods games, not zero. Furthermore, some of these anomalies have been known for centuries. Self-control problems are discussed in the Bible! They were also apparent to Adam Smith and to prominent economists in the first half of the 20th century. Still, seeing large effects consistently across several studies should provide some confidence that they are replicable—and also meaningful.
Another key factor is the field’s methodological approach of doing cumulative science. The idea is that knowledge is built up step-by-step, with researchers starting directly from the foundation laid by earlier scholars. One thing that helped create this ethos is that behavioral economists and their psychology collaborators interacted often in the early days with experimental economists and that field’s leaders, Charles Plott, Alvin Roth, and Vernon Smith. There were several productive joint conferences.
One of the important practices adopted by the pioneering experimental economists is that all instructions and data be attached to each paper. This level of transparency generated a flurry of research in which scientists would directly build upon an early paper using the original paradigm to explore mechanisms or study important generalizations and exceptions. These extensions of the original paper were rewarded by the field.
A good example is Smith’s 1988 paper on the emergence of bubbles in experimental asset markets. The original paper was published in Econometrica, a top journal in economics, and included all the relevant instructions necessary for replication. Many papers followed the original design to explore the robustness of bubbles to various factors. For example, Martin Dufwenberg, Tobias Lindqvist, and Evan Moore looked at whether bubbles would emerge with subjects who had participated in the experiment before. This paper was published in the American Economic Review, another top economics journal, in 2005. All these extension papers included the original design as the control condition, meaning that replication was inherently part of the scientific process. The fact that journals were willing to publish these papers, and the continual process of replication as a first step of any extension, gave the laboratory experiment arm of the behavioral economics community a solid foundation for discovering robust phenomena.
Behavioral economics has followed this tradition more generally. Going back to the pioneering work of Kahneman and Tversky on Prospect Theory and Werner Güth’s work on ultimatum bargaining, we see that all instructions were readily available to the reader, and in the case of Kahneman and Tversky, the actual questions were reproduced in the body of the text. The flurry of research that built and tested the robustness of these results used the original design to identify moderators, extensions, and limiting factors. This research process ensured that if the original findings were not replicable, the field would soon find out.
As one example, David Grether and Plott set out to disprove the behavioral phenomenon known as preference reversals. (Participants choose Gamble A over Gamble B, but say they would pay more to get B.) Instead, after replicating the experiments of psychologists, they found that preference reversals are robust. The fact that their successful replication was published in a top journal is evidence of a good process.
However, direct replicability should not be confused with generalizability or universality. While the basic patterns of behavior documented in the original experiments reliably show up in subsequent replications, their magnitudes can vary substantially across contexts. Loss aversion might be stronger in some settings than in others. Understanding this variation has become a central focus of contemporary behavioral economics research. Rather than viewing such heterogeneity as a challenge to the field’s foundations, researchers increasingly see it as a source of insight into the underlying psychological mechanisms driving behavioral anomalies. This reflects the broader evolution of the field: moving from discovering “existence proofs” of specific departures from standard economic models to also documenting new anomalies within the field of behavioral economics. This fosters a greater understanding of which real-world settings are more or less likely to prompt behavior that departs from the predictions of the standard theory.
Although the earliest research in behavioral economics was based on experiments, the field quickly branched out to using archival data. That practice has provided great demonstrations of the external validity of the findings (that is, how they hold up in actual market settings).
This is perhaps best exemplified by the work in behavioral finance. Although the bubble experiments we mentioned earlier are interesting, the availability of fantastic data such as daily prices and volumes on US stock markets going back to 1926 via Chicago Booth’s Center for Research in Security Prices made it attractive to study real financial markets through a behavioral lens. For researchers studying US markets, this means that everyone is using the same dataset, and any surprising finding can and will be checked. (It is standard in financial economics to give very detailed descriptions of how all analyses were done.) Indeed, at the NBER Behavioral Finance conferences that Richard organized with Robert Shiller for many years, it was not uncommon for the designated discussant of a new paper to offer additional analyses and to try alternative specifications. This was made possible by everyone’s using publicly available data sources and making sure that published papers carefully outlined all the analyses.
The process of replicability makes the field highly robust. Knowing that your study will likely be replicated also discourages fudging any data. (Of course, there can be coding errors, but these are usually quickly discovered.) We should emphasize that this does not mean there are no controversies! When Shiller published his paper claiming that stocks were excessively volatile, another paper emerged that many economists jokingly referred to as the “Shiller killer.” Similarly, when Richard found with Werner De Bondt that long-term losers outperformed long-term winners going forward, these facts were quickly confirmed. The controversy then turned on interpretation: Are the losers doing well because of mispricing (a mistake) or because they are risky? That debate continues to this day, occasionally between Nobel Laureate Eugene F. Fama and Thaler at their favorite golf course.
One of the themes we stress throughout our book is that many anomalies that started as mere thought experiments—such as mental accounting—have been documented in careful empirical work using observational data from around the world. And we are not aware of any published finding based on laboratory experiments that was overturned simply by raising the stakes (say from $10 to $100) or leaving the lab to study related behavior in the field. Indeed, raising the stakes often makes the anomalies bigger.
When you think about it, the surprise would be if the original findings did not hold up. Yes, the world seems to be changing at a rapid pace, but if human nature is changing, it is at the pace of a snail, not a rocket.
Richard H. Thaler is distinguished service professor of behavioral science and economics emeritus at Chicago Booth and was the 2017 recipient of the Nobel Prize in Economic Sciences. Alex O. Imas is the Roger L. and Rachel M. Goetz Professor of Behavioral Science, Economics, and Applied AI and the Vasilou Faculty Scholar at Booth.
Adapted from THE WINNER’S CURSE: Behavioral Economics Anomalies, Then and Now by Richard H. Thaler and Alex O. Imas. Copyright © 2025 by Richard H. Thaler and Alex O. Imas. Reprinted by permission of Simon & Schuster, LLC.