
What Are the Limits of Big Data?

December 20, 2022
CBR - The Big Question

Companies, policymakers, and organizations are increasingly collecting large amounts of quantitative data about citizens, customers, and clients. How useful are these for decision-making, and when does it make sense to complement them with qualitative information, such as representative panels? On this episode of The Big Question, Chicago Booth’s Jean-Pierre Dubé, the Nielsen Company's Mainak Mazumdar, and Susan Paddock of NORC at the University of Chicago discuss what big data can and can’t do.

Video Transcript

(gentle bright music)

Hal Weitzman: Every day, organizations collect masses of information about their customers, clients, and partners, and analyze that data in the hope of becoming more efficient, more effective, and ultimately of selling more stuff. The more information, we might think, the better. But is big data always better? What assumptions and biases are built into big data? And how can organizations use large data sets without reproducing those biases?

Welcome to a live-streamed episode of The Big Question, Chicago Booth Review’s panel discussion series. Today’s episode is being filmed in collaboration with the James M. Kilts Center for Marketing at Chicago Booth and the Nielsen Company. Nielsen relies on the Kilts Center to distribute Ad Intel data to nearly 500 academics, who use them to explore research questions.

I’m Hal Weitzman, and I’m joined by an expert panel. Jean-Pierre Dubé is the James M. Kilts Distinguished Service Professor of Marketing and director of the Kilts Center for Marketing at Chicago Booth. Mainak Mazumdar is chief data officer at the Nielsen Company, the marketing research firm, and an advisor to the US Census Bureau. And Susan Paddock is chief statistician and executive vice president at NORC at the University of Chicago, a nonpartisan research organization.

Panel, welcome to The Big Question.

Mainak Mazumdar, let me start with you. Let’s define our terms, and particularly in your case at Nielsen, what is big data and where does it come from?

Mainak Mazumdar: Yeah, good question. Big data, we consider big data is the data that comes from digital signals. That can be, you know, data coming from your mobile devices that you consume day to day, or your computers. There are instances where we look beyond that, you know, like it can be smart television, where you are streaming your content and media as well. And folks would also consider data that is captured by camera or RFID or even audio as well. The internet of things is another area that a lot of data gets collected.

So all these signals of human behavior, when you’re interacting with devices, that get collected and, you know, stored in cloud infrastructure, we call that big data.

Hal Weitzman: Just so I understand. How much of that have people explicitly said, “Yes, you are free to take my data,” and how much of it is just taken by use of certain software or whatever?

Mainak Mazumdar: Yeah, it’s kind of a mix because when you’re subscribing . . . so for example, you’re subscribing to Amazon Prime or Netflix or streaming services or even Apple devices, when you’re downloading an app, you have relationships with those businesses and, you know, in many cases, people sign up understanding that their information is being, you know, captured somewhere, aggregated.

And then there are instances where some of these platforms and marketing infrastructure would target advertising. So it’s a mix of both, and I think that creates the topic of today, like where this big data sits in terms of trust, accuracy, bias, and so on, and so forth. And that’s a discussion that, you know, we are looking to—

Hal Weitzman: Yeah, we’ll get to all those issues, but thank you for teeing them up. Susan Paddock, let me bring you in. You are at NORC, which is a very different type of organization. When you think of big data and the sources, what are we talking about for you?

Susan Paddock: Well, I think of big data as often being a by-product of a process. It could be a business process or it could be, you know, some other process such as collecting, you know, information on medical conditions from patients or, you know, pharmaceutical prescriptions that are dispensed, and things like that.

And so in the work that we do at NORC, we think about big data along the lines of also thinking about “observational data.” Sometimes people use that term, or “found data” or “administrative data.” And so these data sources are, of course, very compelling and of interest to analysts because there’s usually a lot of information in the data, though the information tends to be focused around one particular purpose of the data, that purpose being the purpose for which the data were compiled in the first place.

Hal Weitzman: OK, but when you say “found data,” you mean as opposed to, like, survey data, for example?

Susan Paddock: Survey data.

Hal Weitzman: So your . . . people are revealing things through their actions or preferences rather than being asked about those actions or preferences?

Susan Paddock: Yeah, that’s correct. Like data that exists out there. And then, as an analyst, we approach that data and try to assess, you know, the quality of the data, whether the data are appropriate for how we might, you know, want to use the data.

Hal Weitzman: OK. JP Dubé, you’ve been on The Big Question before talking about what are big data. But just remind us. How do you think about this question?

Jean-Pierre Dubé: Well, it’s kind of a tough question to answer because I think what we would consider to be big data is evolving over time. But at any given moment in time, I tend to think of big data as a database that somehow pushes the limits of our current technology and requires us to be clever with algorithms of some sort in order to analyze them.

So I think historically, the term “big data” originally had to do more with storage, just the sheer quantity of data. This was an artifact of businesses starting to automate and being able to encode numerous kinds of interactions with customers and trade partners that they didn’t used to automate. And so these were naturally recorded and it just meant your database had lots and lots of rows, and storing all of that data could have been a real challenge.

More recently, I think big data sort of changed in terms of which dimension. It’s more about the number of columns of data. There’s just a lot more stuff I can track about a customer, for example. So big data might not be so much how much I have to store, but how much I have to analyze about a person to really get at some sort of truth or prediction, or whatever it may be. And I guess in the background in all of this, there’s also been an issue of what do we even consider to be data in the quantitative sense?

Historically things like—well, Mainak, you mentioned images, for example—we might not have really considered quantitative data by their very nature. They weren’t encodable. But now with software and other technology, things that we wouldn’t have considered data are now considered databases. We can encode colors on a picture. We can encode expressions on somebody’s face. We can encode geographic locations combined with a moment in time.

So just the sheer numbers of things that we can call data has also pushed the boundaries of our technology.

Hal Weitzman: Mm-hmm, and I said in the introduction that organizations are collecting more and more data. Is it more and more like, as you said, more different types of data? Or is it more in terms of volume, or is it both?

Jean-Pierre Dubé: I think it’s a combination of the three, that things that I can know about you and encode as data are evolving. Like I said, pictures that you might post online or comments that you might make with slang or other kinds of jargon, all of these can be encoded and called data now. The number of times I can engage with you, that would be getting more into the volume as well. And then, of course, just the number of specific things I can track about you. Things that I would’ve always called data, but I can now see what you bought offline. I can see what you bought online. I can see what you returned. I can sort of track you throughout the entire purchase funnel. And then, I can track a customer long after as the relationship with the firm evolves.

Hal Weitzman: OK, Mainak, big data. Is big better? Is bigger better?

Mainak Mazumdar: It depends. I mean, I think, you know, there are definitely advantages of big data. I would call out at least three, maybe four that we consider advantages. One is the details. Granularity. So you get pretty much . . . like if you take a picture, an image, you get that, to JP’s point, you know, tone, color. So anything that can be digitized, you get really a, you know, fine granularity and the level of details associated with the action that you’re trying to measure.

The second is what I would call continuous. This is a big plus or benefit of big data because it gets streamed. It’s captured second by second. So if you’re trying to do any kind of what I call well-constructed cause and effect, it is more likely you’ll find that in big data because you have time dimensions, you know. It’s not that way in a survey. You ask something. You go away after one month. A lot of things can happen after a month. Here, if you are really constructing your database and architecture appropriately, you could actually capture the time dimension.

And the third, related to the first and second, is timeliness. So if you want real-time updates about traffic, about your analytics, say you’re serving marketing and you wanna change your campaign because it’s not working, you could get that update pretty easily, quickly.

And the fourth, it’s a little more technical and statistical: it’s the idea of variance. When we talk about bias, the variance is one area where we see enormous benefit using big data because you do reduce variances quite a bit compared to a sample or a, you know, what we historically have used. So I would say that those are the four areas where there are benefits, but there are huge challenges as well, which obviously we’ll get into it.

Hal Weitzman: OK. I’ll come to those in just one moment, but I just wanna get your view on the same question, Susan. This idea of bigger. Is bigger better? Because for organizations that maybe don’t have the statistical sophistication that you guys have that are collecting a lot of information about their customers and clients, is . . . should they therefore think that the more information they’re collecting, the better? Or does it get very noisy? ’Cause in JP’s example of just, more dimensions might not perhaps lead to more insight.

Susan Paddock: Yeah. Well, I think that bigger is not necessarily better, but it can be very useful. So there are certainly some potential disadvantages with big data if the data set doesn’t reflect the population that’s of interest. That’s, for me, No. 1. Another concern might be, there could be millions of data elements, but if those data elements are incomplete or there are reasons why they’re, you know, not completed accurately, then that can, you know, lead one into trouble. And so that’s something worth being careful about.

Where I would say big data can be better and some of the work that, you know, we’ve done is that combining big data with, say, a survey can really be very synergistic for both sources. So for example, going to Mainak’s point about the granularity, it might be that, oh I can, you know, have a survey of, you know, 10,000 people in various locations and it represents a well-known population. But some of what I’m interested in might be kind of rare.

So for example, at NORC, we collected data once on allergies, and allergies are rare. So we combined an observational data set on allergies with a survey, and we were able to learn a lot more from that combination of data than we could from either source.

Hal Weitzman: OK. And JP, your view on the same question about is the larger number of, you know, larger sample size better?

Jean-Pierre Dubé: I mean at some point, I think we run into diminishing returns. I mean there’s a reason why we have sampling theory, and for a lot of questions that businesses need to address, having considerably more observations may not really improve on their ability to reach logical conclusions. What I think a lot of folks overlook is the importance of the quality of how you analyze the data can be just as important as the quantity of data themselves.

I’m reminded of a contest about a decade ago run by Netflix to try and see who could make the best prediction algorithm, the recommendation algorithm for movies, and there was, what, a $1 million prize for this. I’m sure you remember this contest. And, for the first week or so, the teams, the main competing teams were looking at all sorts of ways to supplement the data they’d been given to get more and more information and eventually realized that adding more stuff about movies, expert reviews, popularity, and things really hit diminishing returns fast. And that the really big jumps were actually in the quality of how you analyze the existing data and not simply adding more data to an existing algorithm.

Hal Weitzman: Right, which is a really good point. I imagine it’s very relevant to a lot of, you know, organizations, companies that are looking at massive data that the answer isn’t necessarily just to collect more.

Jean-Pierre Dubé: Exactly.

Hal Weitzman: Even though that may be heresy at the University of Chicago, because usually we say the answer is to collect more data.

Susan Paddock, I wanted to come back to you, this question of trustworthiness that you raised. What makes data sets trustworthy or untrustworthy? What should we be wary of?

Susan Paddock: Yeah, that’s a really good question, and I think part of the answer depends on how the data will be used. And so it, you know, could be that a big data set is very trustworthy for its intended purpose. Like for example, counting how many items are scanned at a supermarket, for example. But if data are used in a way that goes beyond the initial, you know, reason, then one can really sort of, you know, get into trouble. And so then it’s very important to think carefully about how the data elements are being used and interpreted in a data set, in addition to making sure that the data represent, you know, the group or the population that one’s interested in.

Hal Weitzman: OK, how about you, Mainak? What about the trustworthiness of data sets?

Mainak Mazumdar: Yeah, I think that’s the heart of the matter here. I think the (clears throat) . . . if you take away all the noise outside, if we talk about noise and signals, trust is the key topic that is front and center when you look at big data. And I’ll tell you a couple of things, like what we see. At the Nielsen Company, we measure media consumption globally, like what folks are watching on television, on handheld devices, computers, and now, you know, moving into smart TV. What we see is there’s a lot of noise in the data.

For example, I’ll tell you that, you know, sometimes your TV is off, but your set-top box is downloading software. In many cases you’re watching, maybe you’re watching football or the World Cup, and it’s on mute. And if there’s an ad coming up when you are on mute, advertisers don’t want to pay. They wanna pay for live audiences, but that’s not gonna be captured in the mute data. The big data. It’ll be captured in the TV set.

So there are some challenges. And then, you know, our study shows about 20 to 30 percent of the traffic, you know, on the internet can be bots and spiders, which means machine-to-machine traffic. Now, some of it’s legit because you’re going to a page and you are looking for something and they’re targeting you.

Hal Weitzman: And that’s true . . . not to interrupt. That’s across all different kinds of media?

Mainak Mazumdar: Yeah, it’s largely . . . if you take, if you take—

Hal Weitzman: ’Cause we know, when we think of, like, social media as being very much like that.

Mainak Mazumdar: Yeah, if you take, like, a step back and say take digital globally, about 20 to 30 percent of people on the other side of the screen, they’re not people. It’s computer to computer, right? And if you’re measuring the client side on the computer, on the screen, on the browser, or ad IDs on mobile devices, to her point, you’re just gonna count, and it’s an accurate count, but there may not be a person behind it.

So I think the trust goes to how you measure, how you curate the data and make sure you are organizing your data to answer those questions. Those are very important. And that goes to the whole issue, whole big topic about bias, right? So there’s an enormous bias in these data assets. And, by the way, they go in both directions.

For example, if you’re watching a Netflix show, you’re watching with your family, but Netflix counts you as one person, right? So you have co-viewing. So a lot more people watch those shows. But when you’re reporting, you’re counting one person because you have one log-in ID, and that can be a 20, 30 percent difference in the audience count. You know, then there are other issues like, you know, how you measure. Are you measuring for us? Are you measuring all the televisions in the home? Or are you just measuring the one where you have a smart TV and you log in? Then you’re getting a partial accounting of the number of folks who are watching the show.

So there’s a lot of biases. And the best way to address this, to have a high-quality panel or sample that allows you to really understand, when you bring those two together, you understand the biases, and there are very, very elegant techniques, and JP knows about this as well that have been developed over the last 10, 15 years that allows you to correct for that.
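As a rough illustration of how a panel can be brought together with log data to correct the co-viewing undercount described above, here is a minimal Python sketch. It uses a simple ratio adjustment, which is a generic textbook idea and not necessarily the technique Nielsen or anyone on the panel uses. The panel observations, the session count, and the name `viewers_per_session` are all made up for the example.

```python
import statistics

# Hypothetical panel: for ten observed streaming sessions, the number of
# people actually in the room (illustrative values, not real data).
viewers_per_session = [1, 2, 1, 1, 3, 1, 2, 1, 1, 1]

# Hypothetical big-data count: one logged-in account per session.
logged_sessions = 1_000_000

# Ratio adjustment: scale the log count by the panel's average viewers per session.
co_viewing_factor = statistics.fmean(viewers_per_session)
adjusted_audience = logged_sessions * co_viewing_factor

# Share of the adjusted audience that the raw log count would have missed.
undercount_share = 1 - logged_sessions / adjusted_audience

print(f"co-viewing factor: {co_viewing_factor:.2f}")
print(f"adjusted audience: {adjusted_audience:,.0f}")
print(f"share missed by raw logs: {undercount_share:.0%}")
```

The log data supplies the volume and the panel supplies the piece the logs cannot see. The adjustment is only as good as the panel is representative of the sessions in the logs.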

So I think that’s very important to the issue of trust, right? Yeah, to make sure we are correcting for the biases. And some level of independence. You almost need to build something which is orthogonal, which means there’s some independence to be able to come in and measure the biases and correct for those biases.

Hal Weitzman: OK, say more about that. What do you mean by “independence”?

Mainak Mazumdar: Independence means, you know, if you are a subscriber, you’re counting your own audiences, and you don’t know what’s really happening on the other side. There has to be some third-party independence, and that’s what Nielsen does. We come in as an independent third party, and what we try to impart is that trust and quality and transparency in terms of what’s happening on those devices.

Hal Weitzman: Could you do that sort of auditing internally?

Mainak Mazumdar: We get audited. We have agencies—

Hal Weitzman: Right, I mean, could an organization do that sort of auditing without a third party?

Mainak Mazumdar: Yeah, I think, you know, a lot of organizations use internal audits to, you know, make sure they’re providing the right amount of quality and then governance around that. But when it happens between buyers and sellers, you need a third party to validate that. That validation piece is very important, that goes to that trust.

Hal Weitzman: JP Dubé, how do you think about the issue of trust and how organizations should be guided to what data sets are most trustworthy?

Jean-Pierre Dubé: So one of the big problems is of course the pros and cons of using technology. The key is scalability. Using technology, we’ve been able to scale up samples to dimensions we’d never imagined were possible before. So with NORC, you know, you were interested in surveys. We can build up lists of potential survey populations in the millions, which is terrific. It helps us start measuring rare populations that we would’ve never been able to survey before.

So imagine you wanted to do a survey of pregnant mothers, but now you want a survey of pregnant mothers who buy a very obscure brand of a product from a marketer. It actually might be possible to generate a sufficiently sized subsample of those mothers buying that specific brand.

The challenge is that that same technology that scales up the size of that survey makes it difficult to maintain quality control. So there’s arbitrage. We have companies and individuals who are using software to get paid to be part of these survey samples who aren’t really answering the surveys for you. So how do you screen out this software? How do you screen out bots? You have fraudsters of other kinds, people who try to create multiple identities. People are strategic sometimes on surveys. One of the main criticisms I’ve heard of the Amazon Turk panel is that sometimes respondents talk to each other while they’re completing the surveys. So there’s background strategies going on.

In a world where we were manually collecting these surveys, you would have one-on-one interaction with your respondent. It was easy to rule these things out. But in a world where we’re scaling up the collection of the data with technology, it becomes very difficult to rely on traditional measures of trust or traditional measures to ensure trust in the quality of the data.

Hal Weitzman: So we’re starting to get into some of the flaws, but just before we dive deeper into that, is there such a thing as objective data, big data?

Mainak Mazumdar: Objective data? That’s a loaded question. I’ll leave it at that. No, look.

Hal Weitzman: A loaded—

Mainak Mazumdar: If you design an architected system and a machine, or software, it gives you what you’re designing it for. So the reason I say it’s loaded because that’s what it’s supposed to do. But the question is as a practitioner, how are you gonna use it. And I think that’s where the objectivity, the scientific inquiry and the frameworks—

Hal Weitzman: So it’s in . . . this goes back to what JP was saying, it’s in the process of analysis that often the errors creep in.

Mainak Mazumdar: Exactly. And I think Susan started the discussion saying something important that these are exhaust, they’re not . . . this data gets created because they’re doing something else. It’s not designed to measure or designed to do certain things after the fact. So what happens is you really have to understand the purpose of how it was designed, why it was collected. And once you do that, then you could start addressing the biases and make it more objective. But as it stands, I think it just counts and it gives you the information that you originally designed it for, right?

Hal Weitzman: But is there something more objective about found data, for example?

Susan Paddock: No, I mean, I think Mainak covered it really well. Just in terms of thinking about the objectivity along with the analysis and how the data are gonna be used and being able to make judgments about the data quality, it’s important.

Hal Weitzman: I mean, one of the things I know that you are concerned about is the incentives the organizations that collect data have and how those incentives might themselves shape the data sets. Just talk a little bit about that.

Mainak Mazumdar: Yeah, I mean, this is becoming more important. I mean, one thing that we have started learning a little bit is that, you know, when you collect that software, these platforms and machines and devices collecting big data, they’re also making decisions based on a behavior. And that decision loop gets ingrained into the machine learning for next targeting, next personalization, next delivery, next merchandising, right?

What we are seeing is there might be a divergence going on between intent as a consumer and what these A.I. and big data is providing. So that’s the reason, I think, the incentives and best practices are important to bring . . . make sure there is a Venn diagram. There is an overlap of, you know, there’s convergence as opposed to divergence. And increasingly what we see is a lot of these big algorithmic types of services are diverging from what actually was supposed to get measured.

So best practices would be, you know, have some level of governance around that. Internally, governance, you know, best way to label the data, curate, understanding the quality. There is a whole field that is coming up called model operations. Usually called model ops, where you have all these machine-learning models, you have KPIs, so the model doesn’t drift. So you have a confidence interval and you make sure that model actually stays within the confidence interval, and if there’s a drift, you go back and address that. So there are some of what I call governance structure and frameworks that’s coming into the marketplace that we feel are very, you know, right, right? It’s going in the right direction.
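The drift check described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the KPI is a weekly accuracy figure, the band is the baseline mean plus or minus three standard deviations (a simple control band standing in for the "confidence interval" mentioned), and the function name `drift_alerts` and all numbers are hypothetical.

```python
import statistics

def drift_alerts(baseline, recent, z=3.0):
    """Return (index, value) pairs from `recent` that fall outside the band
    mean(baseline) +/- z * stdev(baseline)."""
    center = statistics.fmean(baseline)
    spread = statistics.stdev(baseline)
    low, high = center - z * spread, center + z * spread
    return [(i, v) for i, v in enumerate(recent) if not low <= v <= high]

# Hypothetical weekly accuracy of a model while it was known to be healthy.
baseline_accuracy = [0.91, 0.90, 0.92, 0.91, 0.89, 0.90, 0.92, 0.91]

# Hypothetical recent weeks; the last two have slipped.
recent_accuracy = [0.90, 0.91, 0.84, 0.83]

alerts = drift_alerts(baseline_accuracy, recent_accuracy)
for week, value in alerts:
    print(f"recent week {week}: accuracy {value:.2f} is outside the band")
```

An alert here is a prompt to go back and investigate, as the speaker says, not proof that the model is wrong. The band width `z` is a judgment call that trades false alarms against missed drift.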

The other would be, I still feel that, and I think both JP and Susan touched upon, there has to be some level of human validation, and, you know, you talked about MTurk, but also customer reviews, right? There has to be some level of human validation that it is Hal who’s watching the show. It is JP who’s, you know, writing the reviews or it’s Susan who said certain things. So that level of validation is important, and I think there are a lot of services out in the marketplace, including the companies, which are collecting a lot of data to build that. And I think that’s an opportunity definitely for a lot of companies. And we see that happening, but I think it’ll be better if it’s happening industry wide.

Hal Weitzman: Mm-hmm. JP, are you also concerned about the incentives that organizations have in collecting data?

Jean-Pierre Dubé: Oh, very concerned. And this ties back to the question of objectivity. Let’s take an example. Let’s suppose that I’m providing advertising services for a client and the client’s interested in whether or not more advertising would increase how many sales they get. So one way we could do this is to track how many people click on the ad. You might be able to even track how many people get to their website, but ultimately what we really care about is whether or not people spend money and whether or not the ads cause more sales. Well it’s really hard with most settings to link an individual exposure to an ad to a final sale. It’s a lot easier on the internet, for example, just to see if someone clicked on an ad. So you could objectively track who clicks and who doesn’t click, but then subjectively call a click a purchase. We’ll say, “We’ll call that a success.”

And this is where things get a little tricky: if the client on the other side of this doesn’t understand that the thing you called a purchase wasn’t actually a purchase. I mean this takes us back, but in 1998 or 1997, they used to keep track of what are the most clicked websites, and they would talk about eyeballs. And one of the most popular websites for one week only was out of nowhere, I forget what it was, but something no one would’ve been interested in and it turned out they’d run a very early stage banner-ad campaign, a display-ad campaign saying “Party naked.” So hundreds of millions of people clicked on these ads, and of course the website for one week had an enormous amount of traffic, but that didn’t have anything to do with any interest in the website or a purchase.

So you can see how by picking a click, which is objectively measured as a click but subjectively interpreted as a purchase, could misrepresent how effective an ad campaign was.

Hal Weitzman: And so that goes back to what you said about the methodology of analysis, but also to do with the repackaging and selling on of data that this was—

Mainak Mazumdar: Just to add, I think the concept of consumer and audience and person is very, very important. What we’re talking about is machine traffic, right? Traffic from devices, you know, the software and with the, you know . . . in cloud infrastructure, you’ve got collecting a massive amount of data, data passively, actively. But if you don’t put the person in the center, a lot of this trust in objectivity goes away. And I think that’s what, you know, ultimately, all this validation and trust requires people to design this, to make sure there are people into the mix. Otherwise you’d start designing things that’s not going to work.

Hal Weitzman: OK, we’ll come back to some more solutions in a second, but I wanna dig in, Susan Paddock, a little bit more into the biases that get built into data sets and then perhaps amplified in big data sets. What are the biases that can creep in?

Susan Paddock: Well, one bias that I always think about, off the top: Is the coverage biased? Is, you know, the group or the population I’m interested in represented by the data. That’s a huge way to obtain biased results if that’s not the case. Other biases can be embedded in the data, especially depending on how people use the data.

So a couple of years ago, there was a report that came out from a Harvard group about a health-care prediction algorithm in which the goal was to predict how much care people would need to use. And they included, you know, expenditures as a predictor variable. Well, that has a lot of bias, a lot of racial bias in particular, and that’s what they highlighted. And so, one has to really think hard about how the data are being used in these cases in order to make, you know, to avoid as many biases as possible.

Hal Weitzman: Mainak, what do you think are some of the biases that are most common that you see in big data sets?

Mainak Mazumdar: I think that coverage is important, right? For example, what do you see? You know, people would jump to a conclusion based on streaming, but streaming is pay to play. You have to actually subscribe. Not everybody has money to subscribe to 20 different streaming services.

So if you start generalizing the audience behavior of media consumption based on a couple of streaming platforms, you’re wrong, right? Then biases about completeness in the measurement, like for example, we like to . . . when we measure, we measure all the TVs, all the devices within the home or with the person. And that’s important because you shift, you move from one device to the other, and that completeness in the sample is very, very critical.

And the third is the person itself, and I still, you know, wanna double-click on this point, is that the person is very important when we are measuring consumer audiences. If we don’t account for that, if the person can’t really be validated, you usually end up getting biased and wrong information. As I said, big data is extremely good on variance. It reduces the variability of the estimates you would otherwise get out of sampling. But bias is where I think there’s a lot of work that needs to be done.
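The contrast between low variance and coverage bias can be made concrete with a small simulation. The Python sketch below assumes a made-up population in which 40 percent of people subscribe to a streaming service and subscribers watch more hours per week than everyone else. It compares a very large log that covers only subscribers with a random sample of 1,000 people drawn from everyone. All numbers and names are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 40% subscribe; subscribers average 20 viewing
# hours a week, everyone else averages 10 (made-up numbers).
population = []
for _ in range(100_000):
    subscriber = rng.random() < 0.4
    hours = rng.gauss(20 if subscriber else 10, 5)
    population.append((subscriber, hours))

true_mean = statistics.fmean(h for _, h in population)

# "Big data": every subscriber is logged, non-subscribers never appear.
big_data = [h for s, h in population if s]
big_mean = statistics.fmean(big_data)

# Small probability sample: 1,000 people drawn at random from everyone.
panel = [h for _, h in rng.sample(population, 1_000)]
panel_mean = statistics.fmean(panel)

def standard_error(values):
    """Standard error of the mean for a list of values."""
    return statistics.stdev(values) / len(values) ** 0.5

print(f"true mean:  {true_mean:.2f}")
print(f"big data:   {big_mean:.2f} (standard error {standard_error(big_data):.3f})")
print(f"panel:      {panel_mean:.2f} (standard error {standard_error(panel):.3f})")
```

In this setup the subscriber log gives a very tight estimate of the wrong number, while the small sample is noisier but centered on the population value. Adding more subscriber records would shrink the standard error further without moving the estimate any closer to the truth.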

Hal Weitzman: OK, and JP, same question.

Jean-Pierre Dubé: Yeah, the kind of bias I’m most concerned about I’m gonna call activity bias. It’s not a term I came up with. It was coined by some economists at Yahoo back in the day. But it pertains to these found data as we were describing earlier. If I’m just tracking people’s behavior online, and I’m gonna use my advertising example again. I’m interested in whether or not advertising exposure causes higher sales, the natural activity bias is gonna bias who actually gets sent those ads.

So in the found data, the kind of person who’s the most likely to be exposed to a digital ad is somebody who uses the web a lot. And it may very well be that the kind of person who spends an inordinate amount of time on the web is gonna be a lot more likely to shop at a place that uses a lot of digital advertising. And you could give yourself the illusion that digital advertising is really effective because it’s highly correlated with expo . . . like sales are highly correlated with exposure, even though the exposure was really an artifact of the fact that the kind of person buying online also sees a lot of digital ads.

And so the bias I have in mind here is that the found data are usually not very effective for answering questions about what causes what and amplifying one of these found databases from the millions of people to the billions of people isn’t gonna solve that problem.
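
A minimal simulation of activity bias, under assumptions chosen purely for illustration: each user has a hypothetical web-activity level between 0 and 1, the chance of seeing an ad equals that level, and the chance of buying rises with activity but does not depend on the ad at all. The function name `naive_ad_lift` and all of the probabilities are made up.

```python
import random


def naive_ad_lift(n_users, seed=0):
    """Exposed-minus-unexposed purchase rate when ads have NO causal effect."""
    rng = random.Random(seed)
    exposed = exposed_buys = 0
    unexposed = unexposed_buys = 0

    for _ in range(n_users):
        activity = rng.random()           # hypothetical activity level in [0, 1)
        saw_ad = rng.random() < activity  # heavier users see more ads
        # Purchase probability depends on activity only, never on saw_ad.
        bought = rng.random() < 0.05 + 0.30 * activity

        if saw_ad:
            exposed += 1
            exposed_buys += bought
        else:
            unexposed += 1
            unexposed_buys += bought

    return exposed_buys / exposed - unexposed_buys / unexposed


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        print(f"{n:>9} users: naive lift = {naive_ad_lift(n):.3f}")
```

With these numbers the naive comparison shows a lift of roughly 0.10 (10 percentage points) even though the true effect is zero. Going from ten thousand users to a million only makes that wrong answer more precise.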

Hal Weitzman: Mm-hmm, OK. Very quickly, I wanna get each of you to give us one thing that an organization can do maybe to address some of these biases. Mainak?

Mainak Mazumdar: Yeah, I think that there has to be some independent validation of the data, and the second would be data governance. Have better data governance around the infrastructure and quality.

Hal Weitzman: OK. Susan?

Susan Paddock: I think as a statistician, I’ll say having people who are trained in study design and sampling on the team can be very helpful because even if the data don’t arise from a sample or a study design that’s really well known, at least understanding, you know, what sort of an ideal design would be is really helpful for sort of diagnosing issues with big data.

Hal Weitzman: JP?

Jean-Pierre Dubé: For companies that are trying to figure out what causes what, run experiments, randomized controlled experiments. They’re gonna be a lot more informative than relying on observational or found data.
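
Why randomization helps can be sketched as follows. The assumptions are again illustrative only: purchases depend on a hypothetical activity level, ads add an assumed true lift of 0.02 (`TRUE_LIFT`), and a coin flip decides who sees the ad. Because the coin flip is independent of activity, heavy and light users end up evenly split across the two groups.

```python
import random

TRUE_LIFT = 0.02  # assumed causal effect of an ad on purchase probability


def randomized_ad_lift(n_users, seed=0):
    """Estimate ad lift from a randomized controlled experiment."""
    rng = random.Random(seed)
    counts = {True: 0, False: 0}
    buys = {True: 0, False: 0}

    for _ in range(n_users):
        activity = rng.random()        # hypothetical activity level in [0, 1)
        treated = rng.random() < 0.5   # coin flip, unrelated to activity
        purchase_prob = 0.05 + 0.30 * activity + (TRUE_LIFT if treated else 0.0)
        bought = rng.random() < purchase_prob

        counts[treated] += 1
        buys[treated] += bought

    return buys[True] / counts[True] - buys[False] / counts[False]


if __name__ == "__main__":
    estimate = randomized_ad_lift(400_000)
    print(f"estimated lift = {estimate:.3f} (true lift = {TRUE_LIFT})")
```

In this setup the difference in purchase rates between the two groups estimates the assumed lift directly, with no adjustment for how active each user is.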

Hal Weitzman: Excellent, well that seems like a great point on which to end. This has been a fascinating discussion. My thanks to our panel, JP Dubé, Mainak Mazumdar, and Susan Paddock. For more research, analysis, and commentary, visit us online at chicagobooth.edu/review and join us again next time for another episode of The Big Question.

And a special thank you to all of you who have joined us for the past hour online. I hope you’ve enjoyed this stream. My thanks to our partners at the Kilts Center for Marketing, Chicago Booth, and to Nielsen.

(gentle bright music)

