Volatility Forecasting: Intraday Data Significantly Improves Forecast Accuracy

Quantitative Investment and Machine Learning (量化投资与机器学习) WeChat official account

Published 2023-04-20 09:53:39


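Key Takeaways

For the volatility forecasting problem, this article presents a simplified modeling approach as an alternative to complex time-series models.
More complex models, such as autoregressive conditional heteroskedasticity (ARCH) and generalized ARCH (GARCH), do not improve forecast accuracy for the US equity market when benchmarked against simple estimators. Using option-implied volatility data improves forecasts only marginally.
Models based on negative returns and on intraday data improve forecast accuracy significantly relative to the benchmark models.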

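Main Text

There are many methods for forecasting and measuring volatility. Some models use pre-specified estimators, for example GARCH models. Others originate from financial engineering, such as stochastic volatility models. Most well-known volatility models are applied in different settings, which makes it hard to understand how simple lagged volatility, GARCH models, implied volatility models, and exponential variance models differ.

The paper aims to improve the understanding of the volatility forecasting problem. It unifies the notation of financial and econometric time-series models and clarifies the links between them. It shows that the constant variance model is a white-noise model, the realized variance model is a restricted AR(p) model, the exponential variance model is a restricted ARMA(p,q) model, the ARCH model is an AR(p) model, and the GARCH model is an ARMA(p,q) model. The empirical part compares the forecast accuracy of 10 models on a common dataset and benchmarks them against each other. To control complexity risk and overfitting risk, every model is limited to three or fewer estimated parameters. Notably, the study finds that complexity does not improve forecasts. Prominent academic models such as ARCH perform worse than a simple lagged variance model, and GARCH offers only a negligible improvement despite its larger number of tuning parameters. Other models, such as the implied volatility, asymmetric, and seasonal models, also show almost no relative improvement over the simple exponential variance or historical variance models. Only the models based on negative returns and on intraday data improve forecast accuracy significantly relative to the benchmark models.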

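First, we define the time-varying volatility of asset returns over periods t=1, ..., T as:

\sigma(t)=\sqrt{\operatorname{VAR}\left[r_t\right]}

where VAR denotes the variance, that is:

\operatorname{VAR}\left[r_t\right]=E\left[r_t^2\right]-E\left[r_t\right]^2

When the expected asset return is zero, this simplifies to:

\operatorname{VAR}\left[r_t\right]=E\left[r_t^2\right]=E\left[y_t\right]

where y_t=r_t^2. Forecasting volatility then becomes an ordinary time-series forecasting problem. We can use a simple regression model such as ordinary least squares, or more complex machine learning models, for example the following time-series models: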

White noise:

y_t=a+\varepsilon_t

Auto-regressive AR(p) model:

y_t=a+\sum_{i=1}^p b_i y_{t-i}+\varepsilon_t

Moving-average MA(q) model:

y_t=a+\sum_{i=1}^q c_i \varepsilon_{t-i}+\varepsilon_t

Auto-regressive moving average ARMA(p, q) model:

y_t=a+\sum_{i=1}^p b_i y_{t-i}+\sum_{i=1}^q c_i \varepsilon_{t-i}+\varepsilon_t

ARMAX(p, q, m) model to include exogenous predictor variables X_{i,t}

Any other function:

y_t=f(x, y)+\varepsilon_t

With errors

\varepsilon_t=y_t-E\left[y_t\right], E\left[\varepsilon_t\right]=0
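As a minimal sketch of the simplest non-trivial case in this list, the snippet below fits an AR(1) model to squared returns by ordinary least squares and produces a one-step-ahead variance forecast. The simulated return series, its parameters, and the helper name `fit_ar1` are illustrative assumptions, not data or code from the paper.

```python
import random

def fit_ar1(y):
    """OLS fit of y_t = a + b * y_{t-1} + eps_t; returns (a, b)."""
    x, z = y[:-1], y[1:]                      # lagged values and targets
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    return mz - b * mx, b

# Synthetic returns with volatility clustering (assumed parameters, not market data).
rng = random.Random(0)
sigma2, squared_returns = 1.0, []
for _ in range(5000):
    r = rng.gauss(0.0, sigma2 ** 0.5)
    squared_returns.append(r * r)
    sigma2 = 0.1 + 0.1 * r * r + 0.8 * sigma2

a_hat, b_hat = fit_ar1(squared_returns)
forecast = a_hat + b_hat * squared_returns[-1]   # one-step-ahead variance forecast
print(f"a={a_hat:.4f}, b={b_hat:.4f}, next-day variance forecast={forecast:.4f}")
```

The closed-form slope and intercept only cover a single lag. A general AR(p) fit would need a linear solver for the normal equations.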

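This is enough to forecast volatility empirically in a simple way, avoiding the overhead of full-scale financial modeling. Forecast quality will depend mainly on the predictive power of the predictor (independent) variables rather than on model complexity. When we recast the common volatility forecasting models as time-series forecasting problems, each one maps to a corresponding time-series model: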

Constant variance

\sigma_t^2=\sigma^2

is a white-noise model with

\sigma^2=a

Realized variance

\sigma_t^2=\frac{1}{N} \sum_{i=1}^N r_{t-i}^2

with N being the number of trailing observations, is a restricted form of an AR(p) model with

p=N, a=0, b_i=\frac{1}{p}, i=1, \ldots, p

Exponential variance

\sigma_t^2=\lambda \sigma_{t-1}^2+(1-\lambda) r_{t-1}^2

with weighting factor 0 \leq \lambda \leq 1 is an ARMA(p, q) case with

a=0, p=1, b=1, q=1, c=-\lambda

ARCH(m) model

\sigma_t^2=\omega+\sum_{i=1}^m \alpha_i r_{t-i}^2

is an AR(p) model with

m=p, \omega=a, \alpha_i=b_i

GARCH(m, n) model

\sigma_t^2=\omega+\sum_{i=1}^m \alpha_i r_{t-i}^2+\sum_{i=1}^n \beta_i \sigma_{t-i}^2

is an ARMA(p, q) model with

m=p, n=q, \omega=a, \alpha_i=b_i, \beta_i=c_i
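Two of these correspondences are easy to check numerically. The sketch below assumes the forecast error is defined as the squared return minus its forecast, and uses a short made-up series of squared returns. It confirms that the exponential variance recursion gives the same forecasts as the restricted ARMA(1,1) form with b=1 and c=-\lambda, and that a trailing average equals an AR(p) forecast with a=0 and b_i=1/p. All function names are hypothetical.

```python
def ewma_forecasts(y, lam, f0):
    """Exponential variance: f_t = lam * f_{t-1} + (1 - lam) * y_{t-1}."""
    f = [f0]
    for t in range(1, len(y)):
        f.append(lam * f[-1] + (1 - lam) * y[t - 1])
    return f

def arma11_forecasts(y, lam, f0):
    """Restricted ARMA(1,1): f_t = y_{t-1} + c * eps_{t-1} with c = -lam."""
    c = -lam
    f = [f0]
    for t in range(1, len(y)):
        eps_prev = y[t - 1] - f[-1]           # previous forecast error
        f.append(y[t - 1] + c * eps_prev)
    return f

def trailing_mean_forecast(y, p):
    """Realized variance: average of the last p squared returns."""
    return sum(y[-p:]) / p

def ar_forecast(y, a, b):
    """AR(p) forecast a + sum_i b_i * y_{t-i}, with b[0] applied to the latest value."""
    return a + sum(bi * yi for bi, yi in zip(b, reversed(y)))

y = [0.04, 0.01, 0.09, 0.0, 0.16, 0.25]      # made-up squared returns
lam, f0, p = 0.94, 0.05, 3

gap = max(abs(u - v) for u, v in zip(ewma_forecasts(y, lam, f0),
                                     arma11_forecasts(y, lam, f0)))
print("max gap between EWMA and ARMA(1,1) forecasts:", gap)
print("trailing mean:", trailing_mean_forecast(y, p),
      "AR(p) with b_i = 1/p:", ar_forecast(y, 0.0, [1.0 / p] * p))
```

The first equivalence follows from substituting the forecast error into the recursion: y_{t-1} - \lambda (y_{t-1} - f_{t-1}) = \lambda f_{t-1} + (1-\lambda) y_{t-1}.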

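Next, the paper evaluates these models empirically on daily returns of the S&P 500 index from 2000 to 2020. Data from 2000-2016 serve as the in-sample set used to compute the optimal parameters (by minimizing MSE), and data from 2016-2020 serve as the out-of-sample set used to validate the models. Ten models are tested in total: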

White noise/Constant variance:

y_t=a+\varepsilon_t

Restricted AR(p)/Realized variance:

y_t=a+\frac{1}{p} \sum_{i=1}^p b_i y_{t-i}+\varepsilon_t

Restricted ARMA(1,1)/Exponential variance:

y_t=y_{t-1}+c \varepsilon_{t-1}+\varepsilon_t

AR(p)/ARCH:

y_t=a+\sum_{i=1}^p b_i y_{t-i}+\varepsilon_t

ARMA(p, q)/GARCH:

y_t=a+\sum_{i=1}^p b_i y_{t-i}+\sum_{i=1}^q c_i \varepsilon_{t-i}+\varepsilon_t

Furthermore, we add five models to have parsimonious cases of the general model class

y_t=f(x, y)+\varepsilon_t

with x being exogenous predictive variables

Asymmetric exponential variance:

y_t=\begin{cases} y_{t-1}+c^u \varepsilon_{t-1}+\varepsilon_t, & \text{if } r_t \geq 0 \\ y_{t-1}+c^d \varepsilon_{t-1}+\varepsilon_t, & \text{if } r_t<0 \end{cases}

Seasonal: [...] current week number of year minus 26.

Implied volatility:

y_t=a+d \times I V_{t-1}^2+\varepsilon_t

with

I V_t^2=\lambda^{I V} I V_{t-1}^2+\left(1-\lambda^{I V}\right) \times i v_t^2

with iv_t being the at-the-money implied volatility of S&P 500 options at close of day t and IV_t^2 the exponentially weighted average of single-day implied variances. The daily time series of implied volatilities is sourced from Bloomberg. Using option implied volatilities for forecasting has been studied before with mixed results. For instance, Jorion (1995) tested currency option volatilities without finding a marginal added value while Blair, Poon, and Taylor (2001) found that option implied data provides the best forecast accuracy among their set of tested alternatives. Christensen and Prabhala (1998) showed for monthly equity market data that option implied information improves volatility forecasts relative to pure historical ones. Poon and Granger's (2005) meta-analysis found that most empirical studies indicate that option implied data models dominate pure time-series models.

Negative momentum:

y_t=a+d \times \operatorname{mom}_{t-1}^2+\varepsilon_t

with

\operatorname{mom}_t=\lambda^{\text{mom}} \operatorname{mom}_{t-1}+\left(1-\lambda^{\text{mom}}\right) r_t \times I_{\left\{r_t<0\right\}}

denoting the exponentially weighted past average negative market return.

Intraday:

y_t=a+d \times R V_{t-1}+\varepsilon_t

with

R V_t=\lambda^{R V} R V_{t-1}+\left(1-\lambda^{R V}\right) r v_t

with rv_t being the realized variance of 10-minute returns on day t and RV_t the exponentially weighted average of single-day variances. The 10-minute-wise S&P 500 Index time series is provided by Refinitiv. Previously, Blair, Poon, and Taylor (2001) tested the value of intraday returns for forecasting S&P 100 volatility and found it to be insignificant.
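The negative momentum and intraday predictors share the same exponentially weighted recursion, so both can be built from one helper. The sketch below is a hedged illustration: it assumes realized variance is the sum of squared log returns between consecutive intraday prices, starts the momentum recursion at zero, and seeds the smoothed realized variance with the first day's value. The function names, starting values, and sample numbers are all made up for the example.

```python
import math

def ew_average(values, lam, init):
    """s_t = lam * s_{t-1} + (1 - lam) * v_t, starting from s_0 = init."""
    s, out = init, []
    for v in values:
        s = lam * s + (1 - lam) * v
        out.append(s)
    return out

def negative_momentum(returns, lam):
    """Exponentially weighted average of r_t * 1{r_t < 0} (assumed start at 0)."""
    return ew_average([r if r < 0 else 0.0 for r in returns], lam, init=0.0)

def realized_variance(prices):
    """Sum of squared log returns between consecutive intraday prices."""
    return sum(math.log(b / a) ** 2 for a, b in zip(prices, prices[1:]))

# Made-up daily returns and intraday price paths (one inner list per day).
daily_returns = [0.010, -0.020, 0.030, -0.010]
intraday_prices = [
    [100.0, 100.2, 99.9, 100.1],
    [100.1, 99.5, 99.0, 98.8],
    [98.8, 99.6, 100.4, 101.0],
]

mom = negative_momentum(daily_returns, lam=0.9)
rv = [realized_variance(day) for day in intraday_prices]
rv_smooth = ew_average(rv, lam=0.9, init=rv[0])

# The regressions use the lagged predictors mom_{t-1}^2 and RV_{t-1}.
print("mom^2 predictor:", [m * m for m in mom])
print("smoothed RV predictor:", rv_smooth)
```

Squaring the momentum term before using it as a regressor matches the model equation, which forecasts variance from mom_{t-1}^2 rather than from mom_{t-1} itself.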

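The parameters obtained by calibrating each model on the in-sample data are shown in the table below:

Figure 2 shows each model's out-of-sample forecasts. Apart from the constant model and the purely seasonal model, the models produce broadly similar overall volatility forecast paths. The ARCH model and the intraday model give noisier forecasts because they use only a very limited window of trailing data.

Figure 3 below reports the forecast accuracy results. Overall, most models improve forecast accuracy by about 25% relative to the constant variance estimate. Realized Variance, Asymmetric Exp Variance, ARCH(2), GARCH(1,1), and implied volatility all have nearly identical predictive power. Negative Momentum and the intraday-data model improve forecast accuracy by 32% and 34% respectively, making them the two best-performing models.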
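The calibrate-then-validate workflow can be sketched end to end on synthetic data. The snippet below picks the exponential variance weight by grid search on in-sample MSE, then compares out-of-sample MSE against a constant variance benchmark. Treating the accuracy improvement as the relative reduction in MSE is an assumption, since the article does not spell out the metric. The simulated series, the grid, the 80/20 split, and the function names are likewise illustrative, and the printed numbers say nothing about the paper's results.

```python
import random

def simulate_squared_returns(n, seed=0):
    """Synthetic squared returns with volatility clustering (assumed parameters)."""
    rng = random.Random(seed)
    sigma2, y = 1.0, []
    for _ in range(n):
        r = rng.gauss(0.0, sigma2 ** 0.5)
        y.append(r * r)
        sigma2 = 0.1 + 0.1 * r * r + 0.8 * sigma2
    return y

def ewma_forecasts(y, lam):
    """One-step-ahead forecasts for y[1:], seeded with the first observation."""
    f, out = y[0], []
    for t in range(1, len(y)):
        f = lam * f + (1 - lam) * y[t - 1]
        out.append(f)
    return out

def mse(actual, forecast):
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

y = simulate_squared_returns(5000)
train, test = y[:4000], y[4000:]              # in-sample / out-of-sample split

# Calibrate lambda by minimizing in-sample MSE over a coarse grid.
grid = [i / 100 for i in range(50, 100)]
best_lam = min(grid, key=lambda lam: mse(train[1:], ewma_forecasts(train, lam)))

# Constant variance benchmark: the in-sample mean of squared returns.
a_hat = sum(train) / len(train)
mse_const = mse(test[1:], [a_hat] * (len(test) - 1))
mse_ewma = mse(test[1:], ewma_forecasts(test, best_lam))

improvement = 1.0 - mse_ewma / mse_const      # assumed accuracy metric
print(f"lambda={best_lam:.2f}, relative MSE improvement={improvement:.1%}")
```

Note that the out-of-sample recursion is re-seeded with the first test observation. Carrying the last in-sample forecast forward would be an equally reasonable choice.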