From Unstructured Data to Demand Counterfactuals: Theory and Practice
Empirical models of consumer demand for differentiated products depend on compact representations of product characteristics to capture how consumers substitute between alternatives. Increasingly, researchers construct these representations by applying machine learning methods to high-dimensional, unstructured data sources such as product descriptions and images. However, when these proxy representations fail to capture the true dimensions of product differentiation that drive consumer choice, standard estimation workflows produce biased counterfactual predictions and unreliable statistical inference. This study develops a practical methodological toolkit to correct this bias and restore valid inference across a broad class of counterfactual scenarios relevant to competition policy and market analysis. The proposed approach is flexible in several respects: it applies to both market-level and individual-level data, adds little computational burden to standard workflows, and delivers straightforward formulas for standard errors. Critically, the framework accommodates data-dependent proxies, including embeddings generated by fine-tuned machine learning models, and can also be applied to conventional quantitative product attributes when measurement error is a concern. In addition to bias correction, the study introduces diagnostic tools that researchers can use to evaluate whether a given proxy adequately captures the relevant dimensions of product differentiation, in terms of both construction quality and dimensionality. The approach is validated through simulation studies and an empirical application, both of which demonstrate meaningful improvements in the accuracy of counterfactual substitution predictions relative to uncorrected baselines. The toolkit offers a principled path forward for researchers seeking to integrate modern machine learning representations into structural demand estimation.
- 2026
- CAAI - Finance