ADF Test for Time Series Data


@Created: 2021-03-18 @Modified: 2021-03-18

Contents
1. Background
2. Unit Root Test Theory
3. Python Interface
3.1 adfuller API Overview
3.2 Parameters
3.3 Return Values
4. Worked Example
4.1 Implementation
4.2 Results
5. References

1. Background

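Stationarity matters when working with time series models such as Holt, Holt-Winters (ExponentialSmoothing), ARMA, and ARIMA; the ARMA family in particular assumes that the series being modelled is stationary. The data, or its n-th order difference, therefore needs to be tested for stationarity. A common way to do this is the ADF (Augmented Dickey-Fuller) test, which is a unit root test.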

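In mathematics, a stationary random process usually means a strictly stationary random process (strict-sense stationary random process), also called a narrow-sense stationary process.

A stationary random process is one whose probability distribution at any fixed time or position is the same as at every other time or position. In other words, its statistical properties do not change as time passes, so parameters such as the mean and the variance are constant across time and position.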

Ref: Baidu Baike, 平稳随机过程 (stationary random process)
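The definition above can be illustrated with a small simulation. The following is a minimal sketch, assuming NumPy is available; the two processes (Gaussian white noise and a random walk), the number of paths, and the helper name `ensemble_variance` are choices made for this illustration and do not come from the cited source. It compares the variance across many simulated paths at an early and a late time index.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 200

# White noise: independent N(0, 1) draws, a textbook stationary process.
white_noise = rng.standard_normal((n_paths, n_steps))

# Random walk: cumulative sum of N(0, 1) shocks, a non-stationary process.
random_walk = np.cumsum(rng.standard_normal((n_paths, n_steps)), axis=1)


def ensemble_variance(paths, t):
    """Variance across the simulated paths at time index t (illustrative helper)."""
    return float(np.var(paths[:, t]))


for name, paths in (("white noise", white_noise), ("random walk", random_walk)):
    early = ensemble_variance(paths, 49)
    late = ensemble_variance(paths, 199)
    print(f"{name}: variance at t=50 is {early:.2f}, at t=200 is {late:.2f}")
```

For white noise the two variances should be close to each other, while for the random walk the variance grows with time, which is exactly the kind of time dependence that stationarity rules out.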

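2. Unit Root Test Theory

The unit root test is a specialised stationarity test that was developed to check whether macroeconomic and monetary or financial data series have certain statistical properties. There are many unit root tests, including the ADF test, the PP test, and the NP test. Ref: MBA智库百科 (MBAlib)

The null hypothesis of a unit root test is that the original series has a unit root, that is, that it is non-stationary.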


Ref: 单位根检验详解 (Unit Root Tests Explained)
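To make the null hypothesis concrete, the regression behind the ADF test can be written out. This is the standard textbook form rather than something taken from the cited article, and the symbols (\alpha, \beta, \gamma, \delta_i, p) are notation chosen here:

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{p} \delta_i\, \Delta y_{t-i} + \varepsilon_t ,
\qquad
H_0:\ \gamma = 0 \quad \text{vs.} \quad H_1:\ \gamma < 0 .
```

Under H_0 the series has a unit root and is non-stationary. The test statistic is the t-ratio of the estimated \gamma, but it is compared with Dickey-Fuller critical values rather than the usual Student t table. The deterministic terms are optional: keeping only \alpha corresponds to a constant-only regression, adding \beta t to a constant plus trend, and p is the number of lagged differences included.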

3. Python Interface

3.1 adfuller API Overview

Ref: official documentation, statsmodels.tsa.stattools.adfuller

Install statsmodels:

    pip install statsmodels

Import the function:

    from statsmodels.tsa.stattools import adfuller

or refer to it by its full path, statsmodels.tsa.stattools.adfuller(). The signature is:

    adfuller(
        x,
        maxlag=None,
        regression="c",
        autolag="AIC",
        store=False,
        regresults=False,
    )

Ref: 如何查看adfuller()函数的模型拟合系数 (How to view the fitted model coefficients of adfuller())

3.2 Parameters

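x: array_like, 1-D. The data series to test.
maxlag: int. The maximum lag included in the test; the default is 12*(nobs/100)^{1/4}.
regression: {'c', 'ct', 'ctt', 'nc'}. The constant and trend order to include in the regression.
    'c': constant only (default).
    'ct': constant and trend.
    'ctt': constant, linear trend, and quadratic trend.
    'nc': no constant, no trend (newer statsmodels releases name this option 'n').
autolag: {'AIC', 'BIC', 't-stat', None}. The method used to choose the lag length automatically.
    'AIC' (default) or 'BIC': the number of lags is chosen to minimise the corresponding information criterion.
    't-stat': starts with maxlag and drops one lag at a time until the t-statistic on the last lag is significant in a 5%-sized test.
    None: the number of lags is set to maxlag.
store: bool. If True, a result instance is returned in addition to the ADF statistic. Default is False.
regresults: bool, optional. If True, the full regression results are returned. Default is False.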

The original docstring reads:

    Parameters
    ----------
    x : array_like, 1d
        The data series to test.
    maxlag : int
        Maximum lag which is included in test, default 12*(nobs/100)^{1/4}.
    regression : {"c","ct","ctt","nc"}
        Constant and trend order to include in regression.

        * "c" : constant only (default).
        * "ct" : constant and trend.
        * "ctt" : constant, and linear and quadratic trend.
        * "nc" : no constant, no trend.

    autolag : {"AIC", "BIC", "t-stat", None}
        Method to use when automatically determining the lag length among the
        values 0, 1, ..., maxlag.

        * If "AIC" (default) or "BIC", then the number of lags is chosen
          to minimize the corresponding information criterion.
        * "t-stat" based choice of maxlag. Starts with maxlag and drops a
          lag until the t-statistic on the last lag length is significant
          using a 5%-sized test.
        * If None, then the number of included lags is set to maxlag.
    store : bool
        If True, then a result instance is returned additionally to
        the adf statistic. Default is False.
    regresults : bool, optional
        If True, the full regression results are returned. Default is False.

3.3 Return Values

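adf: float. The test statistic.
pvalue: float. MacKinnon's approximate p-value, based on MacKinnon (1994, 2010).
usedlag: int. The number of lags used.
nobs: int. The number of observations used for the ADF regression and for calculating the critical values.
critical values: dict. Critical values for the test statistic at the 1%, 5%, and 10% levels, based on MacKinnon (2010).
icbest: float. The maximized information criterion, returned if autolag is not None.
resstore: ResultStore, optional. A dummy class with the results attached as attributes.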
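These return values are typically read by comparing the statistic with the critical values, or the p-value with a chosen significance level. The helper below, `interpret_adf`, is a hypothetical function written for this illustration (it is not part of statsmodels); it assumes the default `store=False` tuple layout and is applied to the result tuple reported in section 4.2.

```python
def interpret_adf(result, alpha=0.05):
    """Summarise an adfuller() result tuple (default store=False layout).

    Hypothetical helper for illustration; not part of statsmodels.
    """
    stat, pvalue, usedlag, nobs, crit = result[:5]
    # The test is left-tailed: reject the unit-root null when the statistic
    # is below (more negative than) the critical value.
    rejected_at = [level for level, value in crit.items() if stat < value]
    return {
        "statistic": stat,
        "pvalue": pvalue,
        "reject_null": pvalue < alpha,
        "rejected_at_levels": rejected_at,
        "usedlag": usedlag,
        "nobs": nobs,
    }


# Result tuple copied from the run in section 4.2 of this article.
article_result = (
    -0.012544765454616165, 0.9574822652420663, 5, 9,
    {"1%": -4.473135048010974, "5%": -3.28988060356653, "10%": -2.7723823456790124},
    84.80988245795896,
)
summary = interpret_adf(article_result)
print(summary)
```

For the article's series the statistic (-0.0125) lies above all three critical values and the p-value is about 0.957, so the null hypothesis of a unit root cannot be rejected: the series is treated as non-stationary.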

The original docstring reads:

    Returns
    -------
    adf : float
        The test statistic.
    pvalue : float
        MacKinnon's approximate p-value based on MacKinnon (1994, 2010).
    usedlag : int
        The number of lags used.
    nobs : int
        The number of observations used for the ADF regression and calculation
        of the critical values.
    critical values : dict
        Critical values for the test statistic at the 1 %, 5 %, and 10 %
        levels. Based on MacKinnon (2010).
    icbest : float
        The maximized information criterion if autolag is not None.
    resstore : ResultStore, optional
        A dummy class with results attached as attributes.

4. Worked Example

4.1 Implementation

    # -*- coding:UTF-8 -*-
    from statsmodels.tsa.stattools import adfuller
    import numpy as np
    import pandas as pd

    seq = np.array([1, 2, 3, 4, 5, 7, 5, 1, 54, 3, 6, 87, 45, 14, 24])

    # Default regression ("c"); the lag length is chosen by AIC.
    result = adfuller(seq, autolag='AIC')
    print("\nresult is\n{}".format(result))

    # Label the first four return values, then append the critical values and icbest.
    result_format = pd.Series(result[0:4], index=['Test Statistic', 'p-value', 'Lags Used', 'Number of Observations Used'])
    for k, v in result[4].items():
        result_format['Critical Value (%s)' % k] = v
    result_format['The maximized information criterion if autolag is not None.'] = result[5]
    print("\nresult_format is\n{}".format(result_format))

    print("\n\n===== Regression coefficients of the adfuller() model =====")
    # With regresults=True the call returns (statistic, p-value, critical values, result store).
    [t, p, c, r] = adfuller(x=seq, regression='ctt', regresults=True)
    print("r.resols.summary() is")
    print(r.resols.summary())
    print("\nr.resols.params are")
    print(r.resols.params)

4.2 Results

    result is
    (-0.012544765454616165, 0.9574822652420663, 5, 9, {'1%': -4.473135048010974, '5%': -3.28988060356653, '10%': -2.7723823456790124}, 84.80988245795896)

    result_format is
    Test Statistic                                                 -0.012545
    p-value                                                         0.957482
    Lags Used                                                       5.000000
    Number of Observations Used                                     9.000000
    Critical Value (1%)                                            -4.473135
    Critical Value (5%)                                            -3.289881
    Critical Value (10%)                                           -2.772382
    The maximized information criterion if autolag is not None.    84.809882
    dtype: float64


    ===== Regression coefficients of the adfuller() model =====
    C:\ProgramData\Anaconda3\envs\tsp\lib\site-packages\scipy\stats\stats.py:1603: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=11
      warnings.warn("kurtosistest only valid for n>=20 ... continuing "
    r.resols.summary() is
                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.935
    Model:                            OLS   Adj. R-squared:                  0.838
    Method:                 Least Squares   F-statistic:                     9.598
    Date:                Thu, 18 Mar 2021   Prob (F-statistic):             0.0232
    Time:                        13:02:48   Log-Likelihood:                -40.193
    No. Observations:                  11   AIC:                             94.39
    Df Residuals:                       4   BIC:                             97.17
    Df Model:                           6
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    x1           -13.5130      3.840     -3.519      0.024     -24.176      -2.850
    x2            10.1172      3.266      3.098      0.036       1.049      19.185
    x3             6.1048      2.185      2.794      0.049       0.039      12.170
    x4             2.1014      0.991      2.120      0.101      -0.650       4.853
    const         45.0063     25.406      1.771      0.151     -25.532     115.545
    x5           -13.2599     10.376     -1.278      0.270     -42.068      15.549
    x6             5.8638      2.057      2.850      0.046       0.152      11.575
    ==============================================================================
    Omnibus:                        0.004   Durbin-Watson:                   1.371
    Prob(Omnibus):                  0.998   Jarque-Bera (JB):                0.214
    Skew:                           0.004   Prob(JB):                        0.899
    Kurtosis:                       2.317   Cond. No.                         397.
    ==============================================================================

    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

    r.resols.params are
    [-13.51297728  10.11721623   6.10477404   2.10143413  45.00633172 -13.25988671   5.86376013]

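The results above were produced with statsmodels 0.12.2 and differ somewhat from those in the original linked articles.
Ref: Python ADF 单位根检验如何查看结果的实现 (How to read the results of the ADF unit root test in Python)
Ref: 如何查看adfuller()函数的模型拟合系数 (How to view the fitted model coefficients of adfuller())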

5. References

Ref: Baidu Baike, 平稳随机过程 (stationary random process)
Ref: 单位根检验详解 (Unit Root Tests Explained)
Ref: official documentation, statsmodels.tsa.stattools.adfuller
Ref: 如何查看adfuller()函数的模型拟合系数 (How to view the fitted model coefficients of adfuller())
Ref: Python ADF 单位根检验如何查看结果的实现 (How to read the results of the ADF unit root test in Python)

References from the adfuller docstring:

[1] Greene, W. "Econometric Analysis," 5th ed., Pearson, 2003.
[2] Hamilton, J.D. "Time Series Analysis". Princeton, 1994.
[3] MacKinnon, J.G. 1994. "Approximate asymptotic distribution functions for unit-root and cointegration tests." Journal of Business and Economic Statistics 12, 167-76.
[4] MacKinnon, J.G. 2010. "Critical Values for Cointegration Tests." Queen's University, Dept of Economics, Working Papers. Available at http://ideas.repec.org/p/qed/wpaper/1227.html
