Faculty & Research

Dacheng Xiu

Associate Professor of Econometrics and Statistics

Address:
5807 South Woodlawn Avenue
Chicago, IL 60637

Dacheng Xiu’s research interests include developing statistical methodologies and applying them to financial data, while exploring their economic implications. His earlier research involved risk measurement and portfolio management with high-frequency data and econometric modeling of derivatives. His current work focuses on developing machine learning solutions to big-data problems in empirical asset pricing.

Xiu’s work has appeared in Econometrica, the Journal of Econometrics, and the Journal of the American Statistical Association. Additionally, he was invited to publish in the Journal of Business and Economic Statistics. He is an Associate Editor for the Journal of Econometrics and Statistica Sinica, and also referees for several journals in the fields of econometrics, statistics, and finance. Xiu has presented his work at various conferences and university seminars. His paper “Inference on Risk Premia in the Presence of Omitted Factors” received the Best Conference Paper Prize at the 2017 Annual Meeting of the European Finance Association.

In 2017, Xiu launched a website that provides up-to-date realized volatilities of individual stocks. These daily volatilities are calculated from the stocks’ intraday transactions, using methodologies based on his research on high-frequency data.

Xiu earned his PhD and MA in applied mathematics from Princeton University, where he was also a researcher at the Bendheim Center for Finance. Prior to his graduate studies, he obtained a BS in mathematics from the University of Science and Technology of China.


2017 - 2018 Course Schedule

Number  Name                                     Quarter
41100   Applied Regression Analysis              2018 (Winter)
41600   Econometrics and Statistics Colloquium   2018 (Spring)
41902   Statistical Inference                    2018 (Winter)

Other Interests

Skiing, swimming, diving, basketball, and photography


Research Activities

Financial Econometrics, Statistics, Empirical Asset Pricing, and Quantitative Finance

"Hermite Polynomial Based Expansion of European Option Prices." Dacheng Xiu; Journal of Econometrics, 2014, 179(2), pp. 158-77.

"Quasi-Maximum Likelihood Estimation of GARCH Models with Heavy-Tailed Likelihoods." Jianqing Fan, Lei Qi and Dacheng Xiu; Journal of Business & Economic Statistics, 2013, 32(2), pp. 178-91.

"Likelihood-Based Volatility Estimators in the Presence of Market Microstructure Noise." Yacine Aït-Sahalia and Dacheng Xiu; in L. Bauwens, C. Hafner and S. Laurent (eds.), Handbook of Volatility Models and Their Applications. Hoboken, New Jersey: John Wiley & Sons, Inc., 2012, pp. 347-61.

"Quasi-Maximum Likelihood Estimation of Volatility with High Frequency Data." Dacheng Xiu; Journal of Econometrics, 2010, 159(1), pp. 235-50.

"High-Frequency Covariance Estimates with Noisy and Asynchronous Financial Data." Yacine Aït-Sahalia, Jianqing Fan and Dacheng Xiu; Journal of the American Statistical Association, 2010, 105(492), pp. 1504-17.

For a listing of research publications, please visit the university library listing page.

REVISION: Knowing Factors or Factor Loadings, or Neither? Evaluating Estimators of Large Covariance Matrices with Noisy and Asynchronous Data
Date Posted: Nov  01, 2017
We investigate estimators of factor-model-based large covariance (and precision) matrices using high-frequency data, which are asynchronous and potentially contaminated by the market microstructure noise. Our estimation strategies rely on the pre-averaging method with refresh time to solve the microstructure problems, while using three different specifications of factor models with a variety of thresholding methods, respectively, to battle the curse of dimensionality. To estimate a factor model, we either adopt the time-series regression (TSR) to recover loadings if factors are known, or use the cross-sectional regression (CSR) to recover factors from known loadings, or use the principal component analysis (PCA) if neither factors nor their loadings are assumed known. We compare the convergence rates in these scenarios using the joint in-fill and increasing dimensionality asymptotics. To evaluate the empirical trade-off between robustness to model misspecification and statistical ...

New: When Moving-Average Models Meet High-Frequency Data: Uniform Inference on Volatility
Date Posted: Sep  29, 2017
In this paper, we propose a general framework of volatility inference with noisy high-frequency data. We assume the observed transaction price follows a continuous-time Itô-semimartingale, contaminated by a discrete-time moving-average noise process associated with the arrival of trades. Our estimator is obtained by maximizing the likelihood of a misspecified moving-average model of returns with homoscedastic innovations. We show that this quasi-likelihood estimator is consistent with respect to the quadratic variation of the semimartingale, and that the estimator is asymptotically mixed normal. We also present the minimax optimal bound, from which our estimator deviates slightly. Next, we propose AIC and BIC for order selection, and establish uniformly valid inference on volatility, allowing for model selection mistakes. In addition, our estimator is adaptive to the presence of the noise, and its convergence rate varies from $n^{1/4}$ to $n^{1/2}$ ...

REVISION: Taming the Factor Zoo
Date Posted: Sep  07, 2017
The asset pricing literature has produced hundreds of potential risk factors. Organizing this “zoo of factors” and distinguishing between useful, useless, and redundant factors require econometric techniques that can deal with the curse of dimensionality. We propose a model-selection method to systematically evaluate the contribution to asset pricing of any new factor, above and beyond what a high-dimensional set of existing factors explains. Our methodology explicitly accounts for potential model-selection mistakes, unlike the standard approaches that assume perfect variable selection, which rarely occurs in finite sample and produces a bias due to the omitted variables. We apply our procedure to a set of factors recently discovered in the literature. We show that several factors - such as profitability and investments - have statistically significant explanatory power beyond the hundreds of factors proposed in the past. In addition, we show that our risk price estimates and their ...

REVISION: Principal Component Analysis of High Frequency Data
Date Posted: Aug  22, 2017
We develop the necessary methodology to conduct principal component analysis at high frequency. We construct estimators of realized eigenvalues, eigenvectors, and principal components and provide the asymptotic distribution of these estimators. Empirically, we study the high frequency covariance structure of the constituents of the S&P 100 Index using as little as one week of high frequency data at a time. The explanatory power of the high frequency principal components varies over time. During the recent financial crisis, the first principal component becomes increasingly dominant, explaining up to 60% of the variation on its own, while the second principal component drives the common variation of financial sector stocks.
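
As a rough illustration of the idea in the abstract above, the sketch below runs an eigendecomposition on a plain realized covariance matrix and reports the share of variation explained by each component. This is a simplified stand-in and not the paper's estimator, which targets realized eigenvalues with its own asymptotic theory. The function name `realized_pca` and the input layout (rows are intraday return observations, columns are assets) are assumptions made for this example.

```python
import numpy as np


def realized_pca(returns):
    """Eigendecomposition of a plain realized covariance matrix.

    returns: array of shape (n_obs, n_assets) holding intraday log returns
             over one window (hypothetical layout for this sketch).
    """
    returns = np.asarray(returns, dtype=float)
    # Realized covariance: sum of outer products of the return vectors.
    rcov = returns.T @ returns
    # eigh is appropriate because rcov is symmetric; it returns ascending order.
    eigvals, eigvecs = np.linalg.eigh(rcov)
    order = np.argsort(eigvals)[::-1]  # reorder to descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Share of total realized variation attributed to each component.
    share = eigvals / eigvals.sum()
    return eigvals, eigvecs, share
```

Applied to rolling one-week windows, the first entry of `share` would trace how dominant the leading component is over time.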

REVISION: Resolution of Policy Uncertainty and Sudden Declines in Volatility
Date Posted: Aug  22, 2017
We introduce downward volatility jumps into a general non-affine modeling framework of the term structure of variance. With variance swaps and S&P 500 returns, we find that downward volatility jumps are associated with a resolution of policy uncertainty, mostly through statements from FOMC meetings and speeches of the Federal Reserve's chairman. Ignoring such jumps may lead to an incorrect interpretation of the tail events, and hence biased estimates of variance risk premia. On the modeling side, we explore the structural differences and relative goodness-of-fits of factor specifications. We find that log-volatility models with at least one Ornstein-Uhlenbeck factor and double-sided jumps are superior in capturing volatility dynamics and pricing variance swaps, compared to the affine model prevalent in the literature or non-affine specifications without downward jumps.

New: A Tale of Two Option Markets: Pricing Kernels and Volatility Risk
Date Posted: May  31, 2017
Using prices of both S&P 500 options and recently introduced VIX options, we study asset pricing implications of volatility risk. While pointing out the joint pricing kernel is not identified nonparametrically, we propose model-free estimates of marginal pricing kernels of the market return and volatility conditional on the VIX. We find that the pricing kernel of market return exhibits a decreasing pattern given either a high or low VIX level, whereas the unconditional estimates present a U-shape. Hence, stochastic volatility is the key state variable responsible for the U-shape puzzle documented in the literature. Finally, our estimates of the volatility pricing kernel feature a U-shape, implying that investors have high marginal utility in both high and low volatility states.

REVISION: Inference on Risk Premia in the Presence of Omitted Factors
Date Posted: May  20, 2017
We propose a three-pass method to estimate the risk premia of observable factors in a linear asset pricing model, which is valid even when the observed factors are just a subset of the true factors that drive asset prices or they are measured with error. We show that the risk premium of a factor can be identified in a linear factor model regardless of the rotation of the other control factors as long as they together span the space of true factors. Motivated by this rotation invariance result, our approach uses principal components to recover the factor space and combines the estimated principal components with each observed factor to obtain a consistent estimate of its risk premium. Our methodology also accounts for potential measurement error in the observed factors and detects when such factors are spurious or even useless. The methodology exploits the blessings of dimensionality, and we therefore apply it to a large panel of equity portfolios to estimate risk premia for several ...
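
The three passes described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of latent factors `n_factors` is taken as given, the cross-sectional regression has no intercept (no zero-beta rate), and the function name and array layout are hypothetical.

```python
import numpy as np


def three_pass_risk_premium(returns, g, n_factors):
    """Sketch of a three-pass risk premium estimate for one observed factor.

    returns:   (N, T) array of asset excess returns (assumed layout).
    g:         (T,) array holding the observed factor.
    n_factors: number of latent factors, assumed known here.
    """
    returns = np.asarray(returns, dtype=float)
    g = np.asarray(g, dtype=float)
    n_periods = returns.shape[1]

    r_bar = returns.mean(axis=1)          # average return of each asset
    demeaned = returns - r_bar[:, None]   # time-demeaned returns
    g_centered = g - g.mean()

    # Pass 1: PCA on demeaned returns recovers the factor space
    # (identified only up to rotation).
    _, _, vt = np.linalg.svd(demeaned, full_matrices=False)
    factors = np.sqrt(n_periods) * vt[:n_factors]   # (p, T)
    loadings = demeaned @ factors.T / n_periods     # (N, p)

    # Pass 2: cross-sectional regression of average returns on the
    # estimated loadings gives risk premia of the latent factors.
    gamma, *_ = np.linalg.lstsq(loadings, r_bar, rcond=None)

    # Pass 3: time-series regression of the observed factor on the
    # estimated latent factors gives its exposure to them.
    eta, *_ = np.linalg.lstsq(factors.T, g_centered, rcond=None)

    # The product does not depend on the rotation chosen in pass 1.
    return float(eta @ gamma)
```

The rotation invariance mentioned in the abstract shows up in the last line: any rotation applied to the estimated factors cancels in the product of the two regression coefficients.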

REVISION: Efficient Estimation of Integrated Volatility Functionals via Multiscale Jackknife
Date Posted: Apr  05, 2017
We propose semi-parametrically efficient estimators for general integrated volatility functionals of multivariate semimartingale processes. It is known that a plug-in method that uses nonparametric estimates of spot volatilities induces high-order biases which need to be corrected to obey a central limit theorem. Such bias terms arise from boundary effects, the diffusive and jump movements of stochastic volatility, and the sampling error from the nonparametric spot volatility estimation. We propose a novel jackknife method for bias-correction. The jackknife estimator is simply formed as a linear combination of a few uncorrected estimators associated with different local window sizes used in the estimation of spot volatility. We show theoretically that our estimator is asymptotically mixed Gaussian, semi-parametrically efficient, and more robust to the choice of local windows. To facilitate the practical use, we introduce a simulation-based estimator of the asymptotic variance, so ...

REVISION: A Hausman Test for the Presence of Market Microstructure Noise in High Frequency Data
Date Posted: Feb  21, 2017
We develop tests that help assess whether a high frequency data sample can be treated as reasonably free of market microstructure noise at a given sampling frequency for the purpose of implementing high frequency volatility and other estimators. The tests are based on the Hausman principle of comparing two estimators, one that is efficient but not robust to the deviation being tested, and one that is robust but not as efficient. We investigate the asymptotic properties of the test statistic in a general nonparametric setting, and compare it with several alternatives that are also developed in the paper. Empirically, we find that improvements in stock market liquidity over the past decade have increased the frequency at which simple, uncorrected, volatility estimators can be safely employed.
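
The Hausman principle mentioned in the abstract above can be illustrated with the classical scalar form of the statistic. This is a textbook sketch, not the test statistic developed in the paper, which is derived in a nonparametric high-frequency setting. The function name and arguments are hypothetical, and the variance of the difference is taken to be the difference of the two variances, which holds under the null when one estimator is efficient.

```python
def hausman_statistic(theta_eff, var_eff, theta_rob, var_rob):
    """Classical scalar Hausman statistic comparing two estimators.

    theta_eff, var_eff: estimate and variance of the estimator that is
                        efficient under the null but not robust.
    theta_rob, var_rob: estimate and variance of the robust estimator.
    """
    diff_var = var_rob - var_eff
    if diff_var <= 0:
        # In finite samples the difference can be non-positive, in which
        # case the statistic is undefined in this simple form.
        raise ValueError("variance difference must be positive")
    return (theta_rob - theta_eff) ** 2 / diff_var


# Under the null the statistic is compared with a chi-square(1) critical
# value; 3.841 is the 5% critical value.
CHI2_1_CRIT_5PCT = 3.841
```

For example, `hausman_statistic(1.0, 0.01, 1.3, 0.05)` is about 2.25, which is below `CHI2_1_CRIT_5PCT`, so the null is not rejected at the 5% level in that hypothetical case.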

REVISION: Econometric Analysis of Multivariate Realised QML: Estimation of the Covariation of Equity Prices under Asynchronous Trading
Date Posted: Dec  05, 2016
Estimating the covariance between assets using high frequency data is challenging due to market microstructure effects and asynchronous trading. In this paper we develop a multivariate realised quasi maximum likelihood (QML) approach, carrying out inference as if the observations arise from an asynchronously observed vector scaled Brownian model observed with error. Under stochastic volatility the resulting realised QML estimator is positive definite, uses all available data, is consistent and asymptotically mixed normal. The quasi-likelihood is computed using a Kalman filter and optimised using a relatively simple EM algorithm. We also propose an alternative estimator using a factor model, which scales well with the number of assets. We derive the theoretical properties of these estimators and prove that they achieve the efficient rate of convergence. Our estimators are also analysed using Monte Carlo methods and applied to equity data with varying levels of liquidity.

REVISION: Using Principal Component Analysis to Estimate a High Dimensional Factor Model with High-Frequency Data
Date Posted: Oct  11, 2016
This paper constructs an estimator for the number of common factors in a setting where both the sampling frequency and the number of variables increase. Empirically, we document that the covariance matrix of a large portfolio of US equities is well represented by a low rank common structure with sparse residual matrix. When employed for out-of-sample portfolio allocation, the proposed estimator largely outperforms the sample covariance estimator.

REVISION: Nonparametric Estimation of the Leverage Effect: A Trade-off between Robustness and Efficiency
Date Posted: Nov  24, 2015
We consider two new approaches to nonparametric estimation of the leverage effect. The first approach uses stock prices alone. The second approach uses the data on stock prices as well as a certain volatility instrument, such as the CBOE volatility index (VIX) or the Black-Scholes implied volatility. The theoretical justification for the instrument-based estimator relies on a certain invariance property, which can be exploited when high frequency data is available. The price-only estimator is more robust since it is valid under weaker assumptions. However, in the presence of a valid volatility instrument, the price-only estimator is inefficient as the instrument-based estimator has a faster rate of convergence. We consider two empirical applications, in which we study the relationship between the leverage effect and the debt-to-equity ratio, credit risk, and illiquidity.

REVISION: Incorporating Global Industrial Classification Standard into Portfolio Allocation: A Simple Factor-Based Large Covariance Matrix Estimator with High Frequency Data
Date Posted: Apr  22, 2015
We document a striking block-diagonal pattern in the factor model residual covariances of the S&P 500 Equity Index constituents, after sorting the assets by their assigned Global Industry Classification Standard (GICS) codes. Cognizant of this structure, we propose combining a location-based thresholding approach based on sector inclusion with the Fama-French and SPDR sector Exchange Traded Funds (ETFs). We investigate the performance of our estimators in an out-of-sample portfolio allocation study. We find that our simple and positive-definite covariance matrix estimator yields strong empirical results under a variety of factor models and thresholding schemes. Conversely, we find that the Fama-French factor model is only suitable for covariance estimation when used in conjunction with our proposed thresholding technique. Theoretically, we provide justification for the empirical results by jointly analyzing the in-fill and diverging dimension asymptotics.

REVISION: A Tale of Two Option Markets: Pricing Kernels and Volatility Risk
Date Posted: Apr  15, 2015
Using both S&P 500 option and recently introduced VIX option prices, we study pricing kernels and their dependence on multiple volatility factors. We first propose nonparametric estimates of marginal pricing kernels, conditional on the VIX and the slope of the variance swap term structure. Our estimates highlight the state-dependent nature of the pricing kernels. In particular, conditioning on volatility factors, the pricing kernel of market returns exhibits a downward sloping shape up to the extreme end of the right tail. Moreover, the volatility pricing kernel features a striking U-shape, implying that investors have high marginal utility in both high and low volatility states. This finding on the volatility pricing kernel presents a new empirical challenge to both existing equilibrium and reduced-form asset pricing models of volatility risk. Finally, using a full-fledged parametric model, we recover the joint pricing kernel, which is not otherwise identifiable.

New: Generalized Method of Integrated Moments for High-Frequency Data
Date Posted: Feb  06, 2015
We propose a semiparametric two-step inference procedure for a finite-dimensional parameter based on moment conditions constructed from high-frequency data. The population moment conditions take the form of temporally integrated functionals of state-variable processes that include the latent stochastic volatility process of an asset. In the first step, we nonparametrically recover the volatility path from high-frequency asset returns. The nonparametric volatility estimator is then used to form sample moment functions in the second-step GMM estimation, which requires the correction of a high-order nonlinearity bias from the first step. We show that the proposed estimator is consistent and asymptotically mixed Gaussian and propose a consistent estimator for the conditional asymptotic variance. We also construct a Bierens-type consistent specification test. These infill asymptotic results are based on a novel empirical-process-type theory for general integrated functionals of noisy ...

New: Increased Correlation Among Asset Classes: Are Volatility or Jumps to Blame, or Both?
Date Posted: Apr  16, 2014
We develop estimators and asymptotic theory to decompose the quadratic covariation between two assets into its continuous and jump components, in a manner that is robust to the presence of market microstructure noise. Using high frequency data on different asset classes, we find that the recent financial crisis led to an increase in both the quadratic variations of the assets and their correlations. However, we find little evidence to suggest a change in the relative contributions of the Brownian and jump components, as both comove. Co-jumps stem from surprising news announcements that occur primarily before the opening of the U.S. market, and are also accompanied by an increase in Brownian-driven correlations.

REVISION: Hermite Polynomial Based Expansion of European Option Prices
Date Posted: Dec  20, 2013
We seek a closed-form series approximation of European option prices under a variety of diffusion models. The proposed convergent series are derived using the Hermite polynomial approach. Departing from the usual option pricing routine in the literature, our model assumptions have no requirements for affine dynamics or explicit characteristic functions. Moreover, convergent expansions provide a distinct insight into how and on which order the model parameters affect option prices, in contrast with small-time asymptotic expansions in the literature. With closed-form expansions, we explicitly translate model features into option prices, such as mean-reverting drift and self-exciting or skewed jumps. Numerical examples illustrate the accuracy of this approach and its advantage over alternative expansion methods.

New: Spot Variance Regressions
Date Posted: Feb  12, 2013
We study a nonlinear vector regression model for discretely sampled high-frequency data with the latent spot variance of an asset as a covariate. We propose a two-stage inference procedure by first nonparametrically recovering the volatility path from asset returns and then conducting inference based on the generalized method of moments (GMM). The GMM estimator is nonstandard in that the second-order asymptotics is dominated by a bias term, rendering the standard inference implausible. We ...

REVISION: Quasi Maximum Likelihood Estimation of GARCH Models with Heavy-Tailed Likelihoods
Date Posted: Aug  06, 2012
The non-Gaussian maximum likelihood estimator is frequently used in GARCH models with the intention of capturing the heavy-tailed returns. However, unless the parametric likelihood family contains the true likelihood, the estimator is inconsistent due to density misspecification. To correct this bias, we identify an unknown scale parameter that is critical to the identification, and propose a two-step quasi maximum likelihood procedure with non-Gaussian likelihood functions. This novel ...

New: High Frequency Covariance Estimates with Noisy and Asynchronous Financial Data
Date Posted: Jun  28, 2010
This paper proposes a consistent and efficient estimator of the high frequency covariance (quadratic covariation) of two arbitrary assets, observed asynchronously with market microstructure noise. This estimator is built upon the marriage of the quasi-maximum likelihood estimator of the quadratic variation and the proposed Generalized Synchronization scheme. It is therefore not influenced by the Epps effect. Moreover, the estimation procedure is free of tuning parameters or bandwidths and ...

REVISION: Quasi-Maximum Likelihood Estimation of Volatility with High Frequency Data
Date Posted: Jun  27, 2010
This paper investigates the properties of the well-known maximum likelihood estimator in the presence of stochastic volatility and market microstructure noise, by extending the classic asymptotic results of quasi-maximum likelihood estimation. When trying to estimate the integrated volatility and the variance of noise, this parametric approach remains consistent, efficient and robust as a quasi-estimator under misspecified assumptions. Moreover, it shares the model-free feature with ...