Automated ARIMA Time Series Modeling and Its Python Implementation (Auto

2024-07-14 14:43 | Source: compiled from the web | Views: 265

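Contents
1. Introduction to the pmdarima package (Installation, Quickstart Examples)
2. Python implementation
3. Exporting the model
4. Pros and cons
5. pmdarima download link and the (p, q) grid-search code
References

While preparing for the MathorCup competition, I noticed that the hourly uplink and downlink traffic in the problem rose and fell in periodic peaks and troughs, and that most of the series also changed smoothly over time, so I decided to model and forecast them with ARIMA. The catch is that the ARIMA order (p, d, q) differs from one dataset to the next, so every dataset needs the whole manual routine: autocorrelation plot -> stationarity test -> white-noise test -> order selection, and so on, just to pin down ARIMA(p, d, q) by hand.

That was far too painful, so my first idea was to loop over candidate values of p and q and keep the model with the lowest AIC (someone online has already written this code; I include it at the end of the article). But it was too slow... possibly because my own code had poor algorithmic complexity (too many nested loops; I really regret not practicing time and space optimization in contests such as the Lanqiao Cup). Then again, Python can do nearly everything short of having babies, so I turned to the all-knowing third-party libraries and found a black-box modeling package: pmdarima. Its built-in auto_arima function searches over (p, d, q) automatically and returns the model with the lowest AIC among the candidates it tries (by default it uses a stepwise search rather than fitting every combination), and that settles it.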
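To make the manual routine above a little more concrete, here is a minimal pure-Python sketch of its first two ingredients: the sample autocorrelation function and first-order differencing. The function names `difference` and `sample_acf` and the toy series are illustrative assumptions, not part of pmdarima or of the author's code; in practice one would use a library routine together with a formal stationarity test.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    c0 = sum(d * d for d in deviations)
    if c0 == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [
        sum(deviations[t] * deviations[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


# Toy series (made up for illustration): a linear trend plus a small alternating term.
toy = [2 * t + (1 if t % 2 else -1) for t in range(20)]

acf_raw = sample_acf(toy, max_lag=3)
acf_diff = sample_acf(difference(toy), max_lag=3)
print("ACF of raw series:        ", [round(r, 3) for r in acf_raw])
print("ACF of differenced series:", [round(r, 3) for r in acf_diff])
```

A slowly decaying ACF on the raw series hints at a trend that differencing should remove (the d in ARIMA(p, d, q)); the ACF of the differenced series is then one of the plots inspected when choosing the remaining orders.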

1. Introduction to the pmdarima package

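I did not bother paraphrasing this part. The basics (installing and importing) are all covered in the text below, taken from the original project page, so read it directly.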

Pmdarima (originally pyramid-arima, for the anagram of ‘py’ + ‘arima’) is a statistical library designed to fill the void in Python’s time series analysis capabilities. This includes:

- The equivalent of R’s auto.arima functionality
- A collection of statistical tests of stationarity and seasonality
- Time series utilities, such as differencing and inverse differencing
- Numerous endogenous and exogenous transformers and featurizers, including Box-Cox and Fourier transformations
- Seasonal time series decompositions
- Cross-validation utilities
- A rich collection of built-in time series datasets for prototyping and examples
- Scikit-learn-esque pipelines to consolidate your estimators and promote productionization

Pmdarima wraps statsmodels under the hood, but is designed with an interface that’s familiar to users coming from a scikit-learn background.

Installation

Pmdarima has binary and source distributions for Windows, Mac and Linux (manylinux) on pypi under the package name pmdarima and can be downloaded via pip:

$ pip install pmdarima

Quickstart Examples

Fitting a simple auto-ARIMA on the wineind dataset:

import pmdarima as pm
from pmdarima.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Load/split your data
y = pm.datasets.load_wineind()
train, test = train_test_split(y, train_size=150)

# Fit your model
model = pm.auto_arima(train, seasonal=True, m=12)

# make your forecasts
forecasts = model.predict(test.shape[0])  # predict N steps into the future

# Visualize the forecasts (blue=train, green=forecasts)
x = np.arange(y.shape[0])
plt.plot(x[:150], train, c='blue')
plt.plot(x[150:], forecasts, c='green')
plt.show()
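The quickstart above plots the forecasts but does not score them against the held-out `test` array. As a minimal sketch (the helper names `rmse` and `mae` and the sample numbers are assumptions for illustration, not pmdarima functions), two common error metrics can be computed like this:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Made-up numbers standing in for `test` and `forecasts` from the quickstart.
actual_values = [100.0, 110.0, 120.0, 130.0]
forecast_values = [98.0, 113.0, 118.0, 135.0]

print("RMSE:", rmse(actual_values, forecast_values))
print("MAE: ", mae(actual_values, forecast_values))
```

With the quickstart's own variables the call would be `rmse(test, forecasts)`, since both helpers only need the length and the elements of each sequence.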

Fitting a more complex pipeline on the sunspots dataset, serializing it, and then loading it from disk to make predictions:

import pmdarima as pm
from pmdarima.model_selection import train_test_split
from pmdarima.pipeline import Pipeline
from pmdarima.preprocessing import BoxCoxEndogTransformer
import pickle

# Load/split your data
y = pm.datasets.load_sunspots()
train, test = train_test_split(y, train_size=2700)

# Define and fit your pipeline
pipeline = Pipeline([
    ('boxcox', BoxCoxEndogTransformer(lmbda2=1e-6)),  # lmbda2 avoids negative values
    ('arima', pm.AutoARIMA(seasonal=True, m=12,
                           suppress_warnings=True,
                           trace=True))
])

pipeline.fit(train)

# Serialize your model just like you would in scikit:
with open('model.pkl', 'wb') as pkl:
    pickle.dump(pipeline, pkl)

# Load it and make predictions seamlessly:
with open('model.pkl', 'rb') as pkl:
    mod = pickle.load(pkl)
    print(mod.predict(15))

# [25.20580375 25.05573898 24.4263037 23.56766793 22.67463049 21.82231043
# 21.04061069 20.33693017 19.70906027 19.1509862 18.6555793 18.21577243
# 17.8250318 17.47750614 17.16803394]

2. Python implementation

The examples above already cover the basics, but here is a quick breakdown of how to tune the model's parameters and how to call the function.

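from pmdarima.arima import auto_arima

model1 = auto_arima(data_low,
                    start_p=1, start_q=1,
                    max_p=3, max_q=3,
                    m=12, start_P=0, seasonal=True,
                    d=1, D=1,
                    trace=True,
                    error_action='ignore',
                    suppress_warnings=True,
                    stepwise=True)
# Optional: auto_arima already returns a fitted model; this refits the selected order on data_low
model1.fit(data_low)

- data_low: my training set
- start_p: the starting value of p in the search
- max_p: the maximum value of p in the search
- seasonal: whether to fit a seasonal ARIMA
- trace: whether to print the progress of the search while it runs
- stepwise: whether to use the stepwise search instead of fitting every combination of orders

3. Exporting the model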

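A quick word on how to save a model: here I use the model persistence functions of the joblib package. (Note: joblib used to be accessible through sklearn, as sklearn.externals.joblib. After I updated Anaconda, which apparently also updated sklearn, it had to be imported on its own.)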
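The save-and-reload pattern is the same dump/load round trip that the quickstart performs with pickle. Below is a minimal sketch using the standard-library pickle module, with plain dictionaries as stand-ins for fitted models; the contents of `fitted_models`, the helpers `save_models` and `load_model`, and the temporary directory are all assumptions for illustration. It also creates the output directory first, since writing into a missing directory fails.

```python
import os
import pickle
import tempfile


def save_models(models, directory):
    """Write one .pkl file per model, named after its dataset index."""
    os.makedirs(directory, exist_ok=True)  # create the folder if it is missing
    paths = {}
    for index, model in models.items():
        path = os.path.join(directory, str(index) + ".pkl")
        with open(path, "wb") as handle:
            pickle.dump(model, handle)
        paths[index] = path
    return paths


def load_model(path):
    """Read a single model back from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-ins for fitted models, keyed by dataset index (hypothetical values).
fitted_models = {
    0: {"order": (1, 1, 1), "seasonal_order": (0, 1, 1, 12)},
    1: {"order": (2, 1, 0), "seasonal_order": (1, 1, 0, 12)},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    saved_paths = save_models(fitted_models, os.path.join(tmp_dir, "model_save"))
    restored = {index: load_model(path) for index, path in saved_paths.items()}

print("round trip preserved the models:", restored == fitted_models)
```

joblib.dump and joblib.load can be swapped in for the pickle calls; both take the object (for dump) and a file path.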

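import joblib

# model2 is a fitted model and i is presumably the index of the current dataset
# (both come from elsewhere in my code); the model_save/ directory must already exist
joblib.dump(model2, 'model_save/' + str(i) + '.pkl')

4. Pros and cons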

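The main advantages:

- Compared with writing my own loops without optimizing the algorithm, this approach really is much faster.
- It is fully automatic, no explanation needed: it truly frees your hands and lets the computer grind through it.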
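Part of the speed difference comes from the search strategy: with stepwise=True, auto_arima does not fit every (p, q) combination but moves from a starting order to neighboring orders for as long as the score improves. The sketch below contrasts the two strategies on a made-up score function. The names `toy_score`, `grid_search`, and `stepwise_search` are hypothetical, the neighbor rule is simplified, and this is not pmdarima's actual implementation.

```python
def toy_score(p, q):
    """Stand-in for AIC (made up); a real search would fit a model here."""
    return (p - 4) ** 2 + (q - 3) ** 2


def grid_search(score, max_p, max_q):
    """Evaluate every (p, q) pair and return the best one plus the evaluation count."""
    best, best_score, evaluations = None, float("inf"), 0
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            evaluations += 1
            current = score(p, q)
            if current < best_score:
                best, best_score = (p, q), current
    return best, evaluations


def stepwise_search(score, start, max_p, max_q):
    """Move to better neighboring orders until no neighbor improves the score."""
    evaluated = {}

    def evaluate(order):
        if order not in evaluated:  # each order is scored at most once
            evaluated[order] = score(*order)
        return evaluated[order]

    best, best_score = start, evaluate(start)
    improved = True
    while improved:
        improved = False
        p, q = best
        neighbors = [(p + dp, q + dq)
                     for dp in (-1, 0, 1) for dq in (-1, 0, 1)
                     if (dp, dq) != (0, 0)]
        for cand_p, cand_q in neighbors:
            if not (0 <= cand_p <= max_p and 0 <= cand_q <= max_q):
                continue  # stay inside the allowed range of orders
            current = evaluate((cand_p, cand_q))
            if current < best_score:
                best, best_score = (cand_p, cand_q), current
                improved = True
    return best, len(evaluated)


grid_best, grid_evals = grid_search(toy_score, max_p=5, max_q=5)
step_best, step_evals = stepwise_search(toy_score, start=(1, 1), max_p=5, max_q=5)
print("grid search:    ", grid_best, "after", grid_evals, "evaluations")
print("stepwise search:", step_best, "after", step_evals, "evaluations")
```

On a well-behaved score surface like this one both strategies reach the same order while the stepwise search evaluates fewer candidates. The trade-off is that a stepwise search can stop at a local optimum that a full grid would avoid.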

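There are also quite a few drawbacks:

- Training a single model takes about 1 minute on average.
- It cannot recognize white-noise points and pick a lower-order model accordingly (it is still not quite that smart?).

5. pmdarima download link and the (p, q) grid-search code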

pmdarima package: https://download.csdn.net/download/weixin_45839604/14075267

Structure of the grid-search code:

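import pandas as pd
# statsmodels.tsa.arima_model was removed in statsmodels 0.13; the current class lives here
from statsmodels.tsa.arima.model import ARIMA

pmax = int(len(data['low_GB']) / 10)  # as a rule of thumb the order does not exceed length / 10
qmax = int(len(data['on_GB']) / 10)

bic_matrix = []
for p in range(pmax + 1):
    temp = []
    for q in range(qmax + 1):
        try:
            temp.append(ARIMA(data_on, order=(p, 1, q)).fit().bic)
        except Exception:
            temp.append(None)  # this order could not be fitted
    bic_matrix.append(temp)

bic_matrix = pd.DataFrame(bic_matrix)  # convert to a DataFrame
# flatten with stack, then use idxmin to locate the smallest value
p, q = bic_matrix.stack().astype('float64').idxmin()
print('p and q with the lowest BIC: %s, %s' % (p, q))
# p and q with the lowest BIC: 0, 1
# so the model to build is ARIMA(0, 1, 1)
model = ARIMA(data_on, order=(p, 1, q)).fit()
# model.summary()  # produce a model report
# model.forecast(5)  # forecast the next 5 periods; model.get_forecast(5) also gives standard errors and confidence intervals

References: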

[1] Pmdarima wraps statsmodels
