[Ready-to-Use Python Code] Dimensionality Reduction with PCA, Followed by Classification with SVR, GP (Gaussian Process Regression), and LR (Logistic Regression)
1. PCA Dimensionality Reduction

Dimensionality reduction is a preprocessing technique for high-dimensional feature data: it keeps the most informative features while discarding noise and unimportant features, which speeds up subsequent processing. In practice, as long as the information loss stays within an acceptable range, dimensionality reduction can save a great deal of time and cost, which is why it has become such a widely used preprocessing step.

Its main benefits:

- Makes the dataset easier to work with.
- Reduces the computational cost of downstream algorithms.
- Removes noise.
- Makes results easier to interpret.

There are many dimensionality-reduction algorithms, e.g. Singular Value Decomposition (SVD), Principal Component Analysis (PCA), Factor Analysis (FA), and Independent Component Analysis (ICA). For details, see: 主成分分析(PCA)原理详解 - 知乎 (zhihu.com)

Here is a PCA implementation in Python. Note that scikit-learn caps the number of components at min(n_samples, n_features), so the actual output dimension may be lower than the 30 requested; a single `fit_transform` already returns the reduced data, so fitting a second PCA (as an earlier version of this snippet did) is redundant:

```python
from sklearn.decomposition import PCA

def PCA_com(data):
    pca = PCA(n_components=30)  # reduce the dimensionality to (at most) 30
    data_pca = pca.fit_transform(data)
    return data_pca
```

For full details on the PCA class, see the official documentation: sklearn.decomposition.PCA — scikit-learn 1.2.2 documentation

2. Regression Analysis with SVR, GP, and LR

Regression is a form of fitting: it connects isolated observations and finds the underlying pattern. Introductions to SVR, GP, and LR are not repeated here; below is a brief overview of the important parameters of the functions used (reference: sklearn.svm.SVR — scikit-learn 1.2.2 documentation).

SVR:

- kernel: {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'} or callable, default='rbf'. Different kernels fit differently; if you don't know which kernel works best, you can simply try them all in a loop. The default is the Gaussian (RBF) kernel.
- gamma: {'scale', 'auto'} or float, default='scale'. Kernel coefficient for 'rbf', 'poly', and 'sigmoid'. With 'scale' (the default) it uses 1 / (n_features * X.var()); with 'auto' it uses 1 / n_features; a float value must be non-negative.
- C: float, default=1.0. Regularization parameter. The strength of the regularization is inversely proportional to C. It must be strictly positive, and the penalty is a squared L2 penalty.
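As suggested above, if you are unsure which kernel fits best, you can simply loop over the candidates. A minimal sketch on synthetic data (the dataset and the in-sample R² scoring here are placeholders for illustration, not part of the original post's pipeline):

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data (placeholder, not the post's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)

# Fit an SVR with each kernel and report the in-sample R^2 score.
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    model = SVR(kernel=kernel, C=1.0)
    model.fit(X, y)
    print(kernel, round(model.score(X, y), 3))
```

In practice you would compare scores on a held-out set rather than in-sample, but the loop structure is the same.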
C is a regularization coefficient that is inversely proportional to the regularization strength: the larger C is, the more closely the model fits the training data, but the worse its generalization tends to be.

The code below implements the three models. One important fix: sklearn's train_test_split returns the splits in the order (X_train, X_test, y_train, y_test); the original snippet unpacked them as (X_train, y_train, X_test, y_test), which silently mixes features and targets.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

def SVR_rbf(data1, score1):
    # data1 is the data to fit (intuitively, the independent variable x);
    # score1 is the fitting target (intuitively, the dependent variable y).
    CC = [0.01, 0.1, 1]  # candidate values of the hyperparameter C; test the accuracy of each
    data1_train, data1_test, score_train, score_test = train_test_split(
        data1, score1, test_size=0.2)  # split into training and test sets
    for C in CC:
        aver_acc = 0
        pre_list = []
        svr_rbf = SVR(kernel='rbf', gamma=1 / 30, C=C)
        svr_rbf.fit(data1_train, score_train)  # fit on the training data; no return value needed,
                                               # the fitted state lives inside the object
        test_predict = svr_rbf.predict(data1_test)
        pre_list.append(test_predict)
        # At this point we have the predictions; for a regression analysis you could
        # compute the MSE, correlation coefficient, etc.
        # The code below turns the regression output into class labels via a manually
        # chosen threshold (edit as needed; the default is 1.5).
        labs_gt = []
        labs_pre = []
        cnt = 0
        for score in test_predict:
            labs_pre.append(1 if score > 1.50 else 0)
        for score in score_test:
            labs_gt.append(1 if score > 1.50 else 0)
        for i in range(len(test_predict)):
            if labs_gt[i] == labs_pre[i]:
                cnt += 1
        aver_acc += cnt / len(labs_pre)
        print('C = ', C)
        print('acc = ', aver_acc)

def GP(data1, score1):
    kernel = DotProduct() + WhiteKernel()
    data1_train, data1_test, score_train, score_test = train_test_split(
        data1, score1, test_size=0.2)
    aver_acc = 0
    pre_list = []
    gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(data1_train, score_train)
    test_predict = gpr.predict(data1_test)
    pre_list.append(test_predict)
    labs_pre = []
    labs_gt = []
    cnt = 0
    for score in test_predict:
        labs_pre.append(1 if score > 1.50 else 0)
    for score in score_test:
        labs_gt.append(1 if score > 1.50 else 0)
    for i in range(len(test_predict)):
        if labs_gt[i] == labs_pre[i]:
            cnt += 1
    aver_acc += cnt / len(labs_pre)
    print('acc = ', aver_acc)

def LR(data1, score1):
    data1_train, data1_test, score_train, score_test = train_test_split(
        data1, score1, test_size=0.2)
    aver_acc = 0
    pre_list = []
    score_train = [int(i) for i in score_train]  # LogisticRegression needs discrete class labels
    clf = LogisticRegression(penalty="l2", C=1.0, random_state=None, solver="lbfgs",
                             max_iter=3000, multi_class='ovr', verbose=0)
    clf.fit(data1_train, score_train)
    test_predict = clf.predict(data1_test)
    pre_list.append(test_predict)
    labs_pre = []
    labs_gt = []
    cnt = 0
    for score in test_predict:
        labs_pre.append(1 if score > 1.50 else 0)
    for score in score_test:
        labs_gt.append(1 if score > 1.50 else 0)
    for i in range(len(test_predict)):
        if labs_gt[i] == labs_pre[i]:
            cnt += 1
    aver_acc += cnt / len(labs_pre)
    print('acc = ', aver_acc)
```

3. Full Code

Here is the complete program (the three model functions are exactly as defined in Section 2 and are not repeated):

```python
import numpy as np
import pickle
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# SVR_rbf, GP, and LR: identical to the definitions in Section 2 above.

def Anylise(data, target):
    print("SVR_rbf")
    SVR_rbf(data, target)
    print("GP")
    GP(data, target)
    print("LR")
    LR(data, target)

def PCA_com(data):
    pca = PCA(n_components=30)  # reduce the dimensionality to (at most) 30
    return pca.fit_transform(data)

if __name__ == '__main__':
    # Load the dataset.
    with open('data1.txt', 'rb') as file_1:
        data = pickle.load(file_1)
    with open('score1.txt', 'rb') as file_1:
        scores = pickle.load(file_1)
    # PCA dimensionality reduction.
    data = PCA_com(data)
    scores = np.array(scores)
    # Run the analysis.
    Anylise(data, scores)
```
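As the comments in the functions above note, you can also evaluate the raw regression output directly instead of (or alongside) the thresholded accuracy. A small sketch using sklearn's mean_squared_error, numpy's correlation coefficient, and a vectorized replacement for the manual thresholding loops (the prediction and ground-truth arrays here are illustrative placeholders, not real model output):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, accuracy_score

# Illustrative predictions and ground truth (placeholders).
test_predict = np.array([1.2, 1.8, 0.9, 2.1])
score_test = np.array([1.0, 2.0, 1.1, 1.9])

# Regression metrics on the raw outputs.
mse = mean_squared_error(score_test, test_predict)
corr = np.corrcoef(score_test, test_predict)[0, 1]

# The manual thresholding loops can be replaced by vectorized comparisons
# plus sklearn's accuracy_score (same 1.5 threshold as in the functions above).
labs_pre = (test_predict > 1.50).astype(int)
labs_gt = (score_test > 1.50).astype(int)
acc = accuracy_score(labs_gt, labs_pre)
print(mse, corr, acc)  # → 0.04 0.907... 1.0
```

This drops straight into any of the three functions in place of the three for-loops.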