Improvements in computing power, innovations in data analysis, and changes in how consumers interact with brands and products have ushered in the era of big data in business. But now that marketers and other executives have access to enormous caches of customer data, what are they doing with it? Chicago Booth’s Sanjog Misra and a trio of industry experts join Hal Weitzman to explore what new opportunities and challenges have arisen due to data breakthroughs.
Hal Weitzman: Technology allows companies to track their customers’ behavior like never before. It’s made it easier for companies to run their own experiments, and when it comes to marketing, they can use insights from data analysis to customize advertising and sales down to the individual consumer. So what are companies learning, and how are they using the data they collect?
Welcome to The Big Question, the monthly video series from Chicago Booth Review. I’m Hal Weitzman, and with me to discuss the issue is an expert panel.
Sanjog Misra is the Charles H. Kellstadt Professor of Marketing and Neubauer Family Faculty Fellow at Chicago Booth. His research uses data-driven models to examine how consumers make choices and firms make decisions on pricing, distribution, and sales force management. Ted Buell is head of insights and analytics for Google’s sales and marketing organization in the US, in which capacity he advises Fortune 500 companies in retail, tech, and telecom on using digital marketing. Andrew Appel is president and CEO of IRI, a provider of big data and predictive analytics for companies in the consumer packaged goods, health-care, retail, and media industries. And Mike Boush is senior vice president, e-business, and chief digital officer at Discover Financial Services, the credit-card issuer, direct bank, and electronic payment services company.
And all three of our industry panelists are Booth MBAs, so welcome back to the University of Chicago.
Sanjog Misra, let me start with you. When we say big data, what do we mean by big data in this context?
Sanjog Misra: So let me just start by, kind of, thinking about big data the way firms typically think about big data, which is to, one, think about the volume of data, the amount of data that we have, and that’s grown exponentially in the last, you know, decade or two. Then there’s the aspect of the different types of data that we’ve been seeing, and there’s been, you know, an explosion in the variety of data that we see, the structured and unstructured data, decks, textual data, nontextual data, and so on. Finally, we have the velocity at which the data comes in, and that’s the speed, so we have real-time data that we didn’t have. The cadence of data has changed a lot, and that’s typically how firms have kind of defined big data.
My own perspective is likely different. I think big data is kind of a term that captures different things to different people, in particular the way I like to think about it is, data is big every time it makes you feel small, right? So the idea being, this concept is a function not only of the amount and of the different types of data you have, but also the infrastructure and the people that you have that deal with this data, so that’s the way I think about data.
Hal Weitzman: OK, what about at Google, Ted Buell? Is it, you are dealing with vast amounts of data, so how do you, I mean, that’s obviously different. How do you parse that for companies you work with?
Ted Buell: Yeah, I think overall big data is really about helping businesses and marketers specifically make better decisions. So there’s different forms of it. I think one form is the data that companies and marketers collect based on the relationship with the customer, whether it’s purchase history or their preferences.
I think another form of data is the interactions that consumers have with that company’s or that marketer’s assets, whether it’s their website or their stores or different aspects.
And then the third I would say is how brands and marketers use existing products, whether it’s to understand consumer behavior at large, so how, what are people searching for on Google and how does that, those trends inform their marketing strategy, or what kind of videos are people watching on YouTube, and ultimately how does that behavior help them understand how to navigate the space.
Hal Weitzman: OK. Andrew Appel, how has this changed marketing? What can we do now that we couldn’t do before this revolution started?
Andrew Appel: Oh, I think in a lot of ways. I think you, you know, I guess I think of data as the breadth of information that you can get around a business decision that needs to be made or activated. And so there is, you know, if you think about the typical day in the life of any one of our kids or consumers, right, they are leaving a footprint of information that is exponentially bigger than it was 10 years ago.
And that footprint, right, of all the different behaviors and all the different sites they’re on or where they physically are and what device they are on and how they interact with others in the ecosystem has an extraordinary impact on marketers. Because if the ultimate role of marketing is to get consumers to ultimately buy an additional product or have a propensity to buy an additional product in the future, if you take kind of the economic view, then you have all this explosion of, of information about what consumers are doing at any given moment, and how do you then analyze that in speed with sophisticated models to ultimately shape how you stimulate those consumers to take a different decision.
Hal Weitzman: OK. Mike Boush, what about in your industry, in your company, how are you, what are you doing differently now that you couldn’t do, you know, say 10 years ago?
Mike Boush: I think that there is such a large set of information that has been available to use for financial services firms and insurers and so forth for a long period of time that we’re comfortable in the industry using big data, and it’s not as recent a phenomenon for some industries as others. I think what’s changed in the last few years, first, is the observational data points that are attached to mobile devices, and each time somebody clicks on the internet, they’re teaching it what they’re interested in, and a lot of that data was just not computable in a practical sense a few years ago for us.
And now it’s rounding out the picture of the consumer for us, and I think it’s allowing us to determine the difference between what somebody states and what somebody does, and in a lot of cases, you know, in past research, we would have to ask somebody what they thought, and now we can infer it through their actions.
Hal Weitzman: Kind of revealing consumer preferences in an interesting way.
Sanjog Misra, this hasn’t just affected marketing, it’s affected how companies make decisions, so tell us about that. How has that changed?
Sanjog Misra: Big data, or what firms have figured out is that the data doesn’t quite exist only on consumers but it also exists about your employees. It exists about your business practices, about your suppliers, about competition, about the environment, and when you start to look at all of that, kind of, in some cohesive way, there’s a lot of things that you could do now that you couldn’t do before, right?
So there’s been projects that I’ve worked on where we’ve optimized sales-force compensation plans because you can think about the amount of data that a particular salesperson generates through his or her actions over time, that in itself is a huge amount of data at the individual level. So what can we learn about how they react to their compensation environment, and then reoptimize it? So that’s something that’s entirely new and different that we couldn’t have done five, 10, 20 years ago.
We’ve re-optimized things like the standard marketing aspects, like promotions for a firm like MGM, where you’ve got millions upon millions of customers and each one of them reacts completely differently to a set of offers, and now you start thinking about not who do you target and how do you segment markets, but also how do I individualize the offers that I actually make to these customers? How can I personalize them?
And this has been an idea that’s been around for a really long time, the idea of targeting individuals with an individualized offer. Now we are at a point where we have enough information as well as computing power to actually make that happen.
Hal Weitzman: OK, and you’ve done some research on that.
Sanjog Misra: I have. So I was giving an example of like, the sales-force compensation project was for a large medical devices company. We re-optimized, kind of, promotions for MGM. There’s a project that I’ve been working on currently on trying to figure out how do we dynamically price. When I say dynamic pricing, for each customer we try and figure out what their willingness to pay would be for a particular variant of the product, and that’s done in about 18 milliseconds, right?
So that’s the other aspect of something that we have to talk about, which is with large amounts of data, there’s also, data is also perishable. It’s only relevant for a really short amount of time, and you have to make the correct action in that short amount of time. So there’s a page loading, so for example, you know, you’re going through Google and you want to figure out what particular creative to show someone as a page loads up. You have about anywhere between 25 and 200 milliseconds to make that decision, and there’s an entire industry that’s competing on that front, so there’s a lot of interesting things going on there.
Ted Buell: And I would say if you’re a marketer the fact that certain data only is good for a certain period of time, I think that’s actually a good thing that consumers are generating more and more data. Obviously, potential challenges with that, but that actually gives you an opportunity to understand those moments and those different ways consumers are making decisions or evaluating decisions for your product, for your category, for your brand, and I think that can inform what you do.
Hal Weitzman: Andrew Appel, you’re working with lots of different companies. What kinds of data are most useful for them?
Andrew Appel: Oh, look, you can boil it down to an incremental data set that allows them to improve the effectiveness of a business process, whether it be supply chain, store, or marketing, right? And so, you know, we’re amalgamating 50, 60 different data sets, whether it’s what’s physically on a shelf, what’s in a supply chain, what’s been sold last week, what’s somebody’s viewed on a television show, what’s . . . where they spend their money outside of a grocery store, inside of a grocery store.
Each incremental piece of data has, for it to be useful, obviously, has a marginal value on improving the efficiency or effectiveness of the process. And so that’s, that’s how we think about each incremental piece of this big puzzle that we’re putting together is: How does it drive efficiency and effectiveness in, kind of, the business-to-business collaboration between the companies and then how does it drive more accurate insight into the individual consumer decision-making?
And some of it is decay and some of it’s not, right? Some of these behaviors, there’s a lot of data that’s really valuable for a minute, and then there’s an equally large set of information that’s valuable for two years. ’Cause people’s buying behaviors don’t shift that quickly. And so what drove someone to kind of bias themselves for a certain purchase on a certain retailer or a certain store or a certain product today is not that different than a year ago. You know, it will be five years from now, and now a little bit of context about where they are, for example, like mobile context can help you target an advertisement or a promotion at the point of the decision, that’s different.
Hal Weitzman: OK, Ted Buell, do you think companies are collecting enough data or the right kinds of data?
Ted Buell: I think companies are certainly collecting a lot of data because of, first of all, there is a lot of data collected and second of all, there are a lot of resources to be able to do that. I think the key is really the second point, which is, are they collecting the right data or, more specifically, are they evaluating the right data?
So I think that’s one of the biggest challenges we see today is not just are they, are marketers collecting the right amount of data, but the right data to make decisions. And so we look a lot at, what are the data points that really show intent for shopping for a retail category or shopping for financial services, whatever it may be, and then ultimately what are those data points that show intent and can inform a business decision.
Andrew Appel: So also, let me just add one thing. Like data, so for your answer, is incredibly messy, right? It comes at lots of different levels, and so very few companies, actually, are having a lot of savvy at taking divergent, different data sets and getting them into a usable form to then make decisions with. And so there’s, you know, they come in different timeframes. They come in different intervals. They come at different levels. Sometimes you get the full set. Sometimes you get the panel. Sometimes they come at a brand level versus at a SKU-level, and so, you know, a lot of the players that, you know, are good at, are moderately good at using their own data sets, but once they have to incorporate any other data that’s second or third party, it becomes very challenging.
Mike Boush: Well, and it’s almost too convenient to say it’s universally true, but the more data, the better the model will become, and in some ways, there isn’t too much data, there is only the question of whether the model is taking into account the right things. Specifically, there are a lot of devices that are throwing off data all the time and so IOT has become—
Hal Weitzman: IOT?
Mike Boush: The internet of things, devices that record statuses and machines and positions and so forth, they can be used to, as you mentioned, solve business problems that were really just matters of forecasting in prior years, but there wasn’t enough information to make a better forecast. So with the question, do businesses have enough data, more to make better predictions seems like a generally acceptable, good thing.
Ted Buell: And to build on that, you hear more and more about machine learning, or artificial intelligence, and there are a number of companies that do it really well. But Mike, to your point, it’s really the breadth of data and the amount of data that those companies can actually process to inform and make that learning smarter, so in some cases, companies think about the breadth of data as almost a risk or a debt just because it’s so cumbersome yet, to Mike, to your point, the large amount of data can actually start making your processes and your learning better.
Andrew Appel: Well, and there’s another half to this, which is the ability for the system to actually make, take the prescripted action on its own. So you have data for insights and data for modeling and data for analytics, but then the next evolution is the system just is gonna use the analytics that come out of the data and effectively execute the decision to improve the effectiveness of the process.
So we see that in, for example, marketing optimization, right? It’s a very easy example to say, if I can look at the ROI or ROAS or sales lift of a digital campaign across 14 different factors, and then I can just rebalance the purchasing automatically, I don’t need a person in the middle of it who’s too slow to respond, right? I can just reshape, you know, the next four weeks or the next week’s purchases based on the return of the last five weeks or based on what the sells that are statistically driving higher value if that’s the factors that I wanna optimize.
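As a rough illustration of the automatic rebalancing Appel describes, the sketch below shifts next period's budget toward channels with a higher return on ad spend (ROAS). The function name `rebalance_budget`, the channel names, the proportional-to-ROAS rule, and the minimum share per channel are all assumptions made for illustration, not a description of any panelist's system.

```python
def rebalance_budget(total_budget, spend, revenue, floor_share=0.05):
    """Split total_budget across channels in proportion to last period's ROAS.

    spend and revenue are dicts keyed by channel name (hypothetical inputs).
    floor_share is an assumed minimum share kept for every channel so that
    no channel is cut to zero and can still generate data next period.
    Channels with zero spend are skipped because their ROAS is undefined.
    """
    roas = {ch: revenue[ch] / spend[ch] for ch in spend if spend[ch] > 0}
    if not roas:
        raise ValueError("no channel has positive spend")
    reserved = floor_share * len(roas)
    if reserved >= 1:
        raise ValueError("floor_share is too large for this many channels")
    total_roas = sum(roas.values())
    if total_roas == 0:
        # No channel earned anything: fall back to an even split.
        return {ch: total_budget / len(roas) for ch in roas}
    return {
        ch: total_budget * (floor_share + (1 - reserved) * r / total_roas)
        for ch, r in roas.items()
    }


# Hypothetical figures for three channels over the last few weeks.
last_spend = {"search": 100.0, "video": 100.0, "display": 100.0}
last_revenue = {"search": 400.0, "video": 200.0, "display": 100.0}
next_budget = rebalance_budget(1000.0, last_spend, last_revenue)
```

A real system would likely smooth ROAS over several periods and account for diminishing returns; this only shows the basic feedback loop with no person in the middle.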
Hal Weitzman: Management by machines. It sounds fantastic, but I mean, what proportion of companies are actually doing this? This all sounds very, sort of, futuristic. Are companies really using data correctly, Sanjog Misra? Or optimally?
Sanjog Misra: So I’ll answer the first part of your question which is, are companies actually doing this? The answer’s yes and no. So in certain contexts, yes. So I teach a course at Booth called Algorithmic Marketing, which focuses on essentially what I see is the shift that’s coming in the next three to five years, where a number of areas where humans have been making decisions, that’s gonna be replaced by some combination of data and algorithms, and we wanna be ahead of that curve.
So in certain industries, so take, for example, kind of, online advertising, display advertising, you know, the timeframe is such that you literally have somewhere, like I said, about 100 to 200 milliseconds to make a decision. It’s impossible for humans to be involved. There’s entire auctions being run within that small time interval, and there’s, you know, there’s not gonna be a human intervention that kind of allows you to participate there.
Where I see things changing are places where we haven’t actually yet tapped into the use of algorithms and data in big ways. So like HR analytics, for example. Human resources is still kind of untouched by this big data revolution. It’s only starting to kind of . . . a few firms that are emerging that wish to automate this entire process. Things like performance evaluation using machine learning and the data that’s collected within the firm to evaluate employees.
New product design issues, you can think about not only using data to think about these traditional marketing decisions like pricing and distribution, but also how do we design new products on the fly? Can we customize products on the fly? And that’s something that’s kind of going to emerge.
And then more broadly, it’s a question of taking all of this data and thinking about making, you know, decisions more broadly, not just within business, but outside business, right? Can we improve the efficiency of supplemental nutritional programs, like SNAP programs, or food stamps, for example. Or charitable giving, right? These are also businesses or government, kind of, decisions that can be improved with data, but we’re not quite there. There’s been efforts in that direction, but we’re not quite there yet.
Mike Boush: I would say that the concept of management by machine needs to, obviously, needs to be approached with caution, however, because I do think that we’re in the early days of using machine-learning algorithms to make behavioral predictions and in some cases, we can hook them up to operational systems, which execute those predictions without some type of intervention.
But in a lot of cases that I’ve seen and the folks that I’ve worked with and talked to, the data that is represented in the data set, if it indicates human interactions, then contains biases, which the model can pick up and model into itself. And so one can accidentally recreate what would happen in human behavior through a computer model, and it can be rife with the biases.
I’m hoping that prudent management of this technology over time will allow us to make better decisions than humans would, rather than just take the totality of the human decision pool and model what would happen if humans were to do it.
Hal Weitzman: Andrew Appel, are companies making optimal use of the data they collect?
Andrew Appel: You know, I still think we’re in the early innings, call it the second inning of this journey for companies to use data, and frankly, they may never get there because the expansion of the data sets are happening faster than they can get their arms around the data they have.
But, you know, we work with a lot of the leading retailers in the country and a lot of leading manufacturers and they do a lot of analytics around data sets they’ve been using for a while, and they’ll look at them, you know, and maybe add one, you know, they’ll crank out a PowerPoint that has the insights that come with it, but the idea that you have a, kind of an on-demand analytics capability that, you know, crosses both, whether it’s the store dimension or the consumer data dimension that has, you know, a 360 view of consumer behavior that then you’re able to, like, pull out, you know, micro-insights on behaviors that happened yesterday, we’re a while from that.
I think companies are, the other thing is, I think companies are reasonably good at using their own data and just learning how to integrate it with third-party data to kind of enrich it with the behaviors that happen outside of their core data set, at Discover, as a credit-card company, or a large retailer that has a loyalty program.
Hal Weitzman: OK, Sanjog Misra?
Sanjog Misra: So I wanted to go back to something that Mike mentioned in terms of, you know, the idea of machine learning using all of this data and essentially replicating the biases that human beings have.
So one of the things that’s happened is firms and researchers have also figured out that you can create your own data sets, and experimentation has taken off as a tool for decision-making unlike, you know, something that we’ve never seen before.
What I’m seeing, like, firms like Netflix, right, where every single decision is tested out. There’s A/B tests that are done, data’s generated, and decisions are made on the basis of that. We see this with Google, experimentation’s being built into many of their tools, like the 360 tool at Google, or on Facebook, you know. Anything you want to do, you have the ability to test, create your own data.
And I think on the research front, what people have also figured out is that the traditional approach to machine learning and data science is not what we want for business decision-making or for economic decision-making. What you want is, you want to marry causal inference and machine learning in clever ways so as to get at the truth, right?
So machine learning is extremely good at prediction, which is great, but if you think about policy interventions or business decisions, it’s not about predicting what’s gonna happen. It’s about telling you how, how things might change because of a marketing intervention or some policy intervention that I might take, and that in itself is an interesting concept because we’ll never have data about the counterfactual, right? I can tell you what’s gonna happen based on what I’ve seen, but I can’t show the same person an ad and not show them an ad, and that’s really what I’d like to do. I’d like to generate data from both those worlds and then look at the difference.
So I think a lot of kind of the cutting-edge research that’s going on is at this confluence of machine learning, economics, and causal inference, so as to be able to answer these questions, hopefully with some degree of accuracy.
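The counterfactual problem Misra raises can be made concrete with a minimal sketch of a randomized experiment: because no one person can both see and not see an ad, users are assigned at random and the effect is estimated as a difference in group averages. The function names, the 50/50 split, the normal-approximation interval, and the sample outcomes below are illustrative assumptions, not the methods used in the research he mentions.

```python
import math
import random
from statistics import mean, variance


def assign_groups(user_ids, seed=0):
    """Randomly assign each user to see the ad (True) or not (False)."""
    rng = random.Random(seed)
    return {uid: rng.random() < 0.5 for uid in user_ids}


def estimate_lift(treated_outcomes, control_outcomes):
    """Difference in mean outcomes with an approximate 95% interval.

    Random assignment makes the control group a stand-in for the
    unobserved "no ad" world of the treated group, so the difference
    in means estimates the average effect of showing the ad.
    """
    diff = mean(treated_outcomes) - mean(control_outcomes)
    se = math.sqrt(
        variance(treated_outcomes) / len(treated_outcomes)
        + variance(control_outcomes) / len(control_outcomes)
    )
    return diff, (diff - 1.96 * se, diff + 1.96 * se)


# Hypothetical purchase indicators (1 = bought) for two small groups.
saw_ad = [1, 1, 0, 1, 0, 1, 1, 0]
no_ad = [0, 1, 0, 0, 1, 0, 0, 0]
lift, interval = estimate_lift(saw_ad, no_ad)
```

With groups this small the interval is very wide, which is the practical reason experiments of this kind need large samples.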
Hal Weitzman: Are companies typically comfortable running experiments with their own customers? Mike, are you running experiments at Discover?
Mike Boush: I think within terms of marketing experiments with data that customers give us permission to access, yes. The goal of all of these algorithms is to make a better customer experience, to make a better experience and have more positive outcomes for customers, which hopefully will mean more positive outcomes for businesses.
So I don’t think that it is underaspirational to try to predict relevance or predict, you know, what somebody might like. I do think that the sophistication of the models that are in play right now, you know, they’re in the early innings, and it takes large sets of behavior in order to, to get down to an individual choice.
Ted Buell: Looking at marketing in regards to experiments, you see the more progressive and most progressive marketers using experimentation to understand things, ’cause there are a number of different tools and platforms in place to measure the effectiveness of marketing today, and they’ve been around for a number of years, but to the point that was earlier made, in order to be able to see a new trend or an emerging trend, you do have to have this culture of test and learn, and use experimentation to test certain hypotheses, otherwise you’re in a bit of a lean back and observe, which is OK in some cases, but I think the rapid rate at which consumers are experiencing your brand and experiencing your products today, I think we see the best brands and marketers leaning forward, and really using experiments to figure out, you know, to either test or prove or disprove their hypotheses.
Mike Boush: It may be reasonable to just think that there are false positives to these models, and so to the extent that you’re choosing which advertisement to put in front of a person, if you miss it’s OK. It’s when we talk about automatically executing against the outcomes of some of these models that it’s worth being measured, and understanding that there are false positives to these models.
And this is really pattern recognition, but there can be points that lie outside of the pattern that are in the data set. So banks and insurance companies use large data sets to predict losses of fraud and so forth, and those things are gathered over patterns, but there are individual points in there that don’t fit the curve. And we wanna be careful when we execute against those.
Hal Weitzman: Andrew Appel?
Andrew Appel: Oh, I was just gonna say, you also have to, you know, recognize that each and every individual interaction is unique in and of itself, and so the models don’t scale down to the level of personalization that we think about in theory. Because you just can’t get the data sets together, I mean, the buy-through rate for particular advertisements, by the time it goes through, did someone see it? Did they take an action? Did they buy an item, right? And then what were the other, you know, numerous factors that led them into buying an item? Was it on sale? Was it not on sale? Did they know they were gonna buy it anyway, right?
So the idea of, like, on-demand, real-time test and learn runs into the reality that people aren’t spending, you know, a billion dollars to decide which of the 400 different experiences. And even then, even if I as an individual company, let’s say Discover, create 400 permutations of test and learn to look for the best returns, right, you’re not sure that there aren’t hundreds of other contextual things around that person that are biasing the results of the sample because people are so overexposed and so constantly being, you know . . . there’s so many factors into the decision model.
So it is actually quite hard to get to these little micro-segments because it’s hard to get the flow of information accurate enough from what ad to target to decision to result at scale. I mean, we see this all the time, just to get purchase data on consumers from, let’s say, a loyalty data set, you know, you need hundreds of millions to be able to look at, you know, 50 permutations of an ad of a hundred million items or something like that, let alone four thousand. And it just, you know, the models just die out.
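Appel's point that the models "die out" as the number of test cells grows can be illustrated with a standard back-of-the-envelope sample-size calculation for comparing two conversion rates. The baseline rate, the lift to detect, the significance level, and the power below are hypothetical values chosen for illustration, and the sketch ignores any correction for comparing many variants at once.

```python
import math
from statistics import NormalDist


def users_needed(baseline_rate, lift, n_variants, alpha=0.05, power=0.8):
    """Approximate users required to detect `lift` over `baseline_rate`.

    Uses the usual normal-approximation formula for a two-sided test of
    two proportions. Returns (users per variant, users across all variants).
    """
    p1 = baseline_rate
    p2 = baseline_rate + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    per_variant = math.ceil(
        (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2
    )
    return per_variant, per_variant * n_variants


# Hypothetical: 1% baseline purchase rate, detect a 0.1-point lift, 400 variants.
per_variant, total = users_needed(0.01, 0.001, 400)
```

Under these assumed numbers each variant needs on the order of 160,000 users, so 400 variants require tens of millions, which is consistent with the scale problem described above.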
Ted Buell: And I think as you look across different industries, I think some are harder to do that than others, right? Based on the velocity of the data that’s available and how many data points you are collecting, ’cause you’re right, there are so many different permutations of that consumer journey, and what they experienced and how they ended up making the decision, and so, you know, you do see a little bit of variance among industries.
Mike Boush: So what is nice about all of that, though, is that now that we have so much more data about exposures and indications and expressions of interest in clicks and likes and so forth, we do get better attribution overall, but we have a larger set of data for which to describe a particular consumer decision journey, and so we can model a path out more accurately, but there might be a dozen subpaths inside of that, and so as we get more specific, while we’re more confident in the overall outcome, we’re less confident in the individual journey as you dial it down more.
Andrew Appel: Yeah, it’s effectively like, the consumer experience is increasing in complexity at a rate and the science and the attribution is increasing, and the question is which one’s moving faster.
Mike Boush: Yep.
Andrew Appel: The fragmentation of the behavior, or the science to figure out what predicts the behavior?
Hal Weitzman: So if this micro-segmentation down to the individual consumer is not yet happening, how are companies using data to segment their audiences? Ted Buell?
Ted Buell: Sure, so, I think if you, so I work in digital marketing, so I work with big retailers to help them understand how they can use Google to grow profits for their business. I think it’s the data that they have, that they own and they use, and how they compare it with digital marketing, for example, a Google campaign to reach the right consumers at the right time that are in market for their product.
And so I think the goal, obviously, is to get down to the one-to-one marketing level, which is hard to scale, particularly if you have a consumer, a set of consumers who’ve shopped your stores or purchased your product in the past, but rather starting to use, because of the consumer behavior on the web and all of these data points, you can actually start to learn based on this set of consumers who else looks like that.
Andrew Appel: I mean, today, for some of the most sophisticated marketers in the world, you know, doing their optimization twice a year in terms of how they do resource allocation is a step forward versus once a year, and to go from, you know, 12 segments of brand to which they develop basically a kind of an activation or marketing program once a year, to go to two is considered an advancement.
So I think the most sophisticated retailers are rescoring their files and their score, you know, weekly, and they’re creating thousands of subcharacteristics of their tens of millions of consumers and they’re developing personalized interactions based on these thousands of segments on a, you know, adjusted and rolled forward on a weekly basis.
That’s about, you know, in the pure digital world, I suspect a Capital One or something can go a little bit farther because they’re basically not living in the physical world, but in the physical world of CPGs and retailers and other companies that actually sell products, that would be highly sophisticated, and everyone wants to get to, you know, kind of a daily, you know, a daily rescored by person, by channel of demand, with an adjustment for context or something.
So that’s, you know, the vision is to basically have an individual score for individual products by individual demand lever that you can then kind of update based on context, right? Am I sitting in front of a hair shelf, what am I likely to buy? If I’m in a [inaudible] or something.
Hal Weitzman: Sanjog Misra?
Sanjog Misra: Yeah so, I think the way I see the points being made is that it’s about matching the cadence of the data with the cadence of the decisions, right? And some, and this heterogeneity that Andrew brings up is a, is an important one.
So there’s gonna be certain industries, retail being one of them, where the cadence of decision-making is just not feasible to go in and change the prices of a hundred thousand SKUs at your supermarket on a daily basis, that’s just not going to happen.
Whereas if you’ve got an online firm, you know, Amazon can go in and tweak their prices in, you know, a fraction of a second, and that’s perfectly plausible. So I think the cadence of the decision-making will kind of lead to the degree to which data and these algorithms will contribute to the bottom line.
That’s not to say that in retail this isn’t important, right? There’s just a different perspective on this. There’s other aspects of things which complicate matters, and there it might be that, look, our decisions are going to be made once a month, or once a quarter, but we have just a larger volume of data and we can take the richness of data to better inform our decision-making process even at that level of cadence.
The point I was making about these micro-segments and about, kind of, this real-time decision-making was in the context where the decision-making cadence is really fast, where you can do things and you want to do things, right? Where it’s possible to do those things, and I think at some point what’s gonna happen is that you’re gonna get to a place where, just looping back to my idea about experimentation, we’re going to go to a world where you’re not going to be able to rely on just historical data or just experimentation, for the reasons that were brought up. There are biases in historical, observed data, which, human beings generated it, and human beings have biases. There’s no bias in experimental data, but it’s not generally applicable and not scalable for long-term decision-making, we know that.
So essentially what we want is we want some amount of experimental variation to be injected into our data sets moving forward, and I think the way I see kind of the next phase of big data is, like, firms realizing the type of data that they would like to have in the future and making sure that that’s, you know, if we go down a path which will allow us to get that data so that we can make better decisions moving forward. It’s a grand vision, but I think sooner or later we’ll get there.
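One simple way to picture injecting experimental variation into ongoing data collection is an explore-and-exploit rule: most of the time the system shows the offer it currently believes is best, and a small random fraction of the time it shows a randomly chosen offer and flags that record as experimental. The sketch below is an illustrative assumption (an epsilon-greedy rule with a hypothetical 10% exploration rate and made-up offer names), not the approach Misra has in mind.

```python
import random


def choose_offer(best_offer, all_offers, explore_rate=0.1, rng=random):
    """Return (offer, was_randomized) for one customer interaction.

    With probability explore_rate the offer is drawn uniformly at random,
    which yields records free of the selection bias in purely historical
    data. Otherwise the current best guess is used.
    """
    if rng.random() < explore_rate:
        return rng.choice(all_offers), True
    return best_offer, False


# Hypothetical offers; the flag lets later analysis separate the
# randomized records from the business-as-usual ones.
offers = ["free_night", "dining_credit", "show_tickets"]
offer, randomized = choose_offer("dining_credit", offers, rng=random.Random(1))
```

Logging the `was_randomized` flag alongside each outcome is what makes the randomized slice usable later for causal estimates.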
Hal Weitzman: Well, on that big vision, unfortunately our time is up. My thanks to our panel, Sanjog Misra, Ted Buell, Andrew Appel, and Mike Boush.
For more research, analysis, and commentary, visit us online at review.chicagobooth.edu and join us again next time for another The Big Question.