Python Data Analysis

2023-02-17 02:35 | Source: compiled from the web

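It has been a while again. I have been preparing for the IELTS exam on and off lately, and it finally wrapped up yesterday, though probably only for now.

Whatever the score turns out to be, it is time to get back to my own field. An opportunity came up recently: I joined a mathematical modeling team that is preparing for a modeling competition a few days from now. As the team's programmer, I ought to review the data analysis tasks that come up most often in mathematical modeling. This post covers a few common data analysis scenarios using third-party Python libraries, with one worked example per scenario as a starting point.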

Bar Charts and Line Charts

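e.g. Xiao Ming runs a supermarket and is doing his annual review of product sales. The number of units sold for each product and the number of people who like that product are listed in the table below. Draw a chart (any chart type) that uses how well each product is liked as the metric, and use color to show the degree.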

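Product      Units sold   People who like the product
Product A    100          50
Product B    200          70
Product C    150          100
Product D    23           10
Product E    30           10
Product F    50           34
Product G    156          56
Product H    32           18
Product I    67           35
Product J    89           45

Table: Supermarket product sales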

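My approach: compute the number of people who do not like each product (units sold minus likes), use the ratio of likes to dislikes as the color scale, and draw a scatter plot.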

# Python program illustrating the pyplot.colorbar() method
import numpy as np
import matplotlib.pyplot as plt

# Dataset
# Number of units purchased for each of the 10 products
purchaseCount = np.array([100, 200, 150, 23, 30, 50, 156, 32, 67, 89])

# Number of likes for each of the 10 products
likes = np.array([50, 70, 100, 10, 10, 34, 56, 18, 35, 45])

# Like/dislike ratio of the 10 products, where dislikes = purchases - likes
ratio = likes / (purchaseCount - likes)

# Scatter plot colored by the ratio
plt.scatter(x=purchaseCount, y=likes, c=ratio, cmap="summer")
plt.colorbar(label="Like/Dislike Ratio", orientation="horizontal")
plt.show()
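Since the task leaves the chart type open, the same data could also be drawn as a bar chart, which is closer to this section's heading. The following is only a minimal sketch, not the approach used above: the helper names `like_dislike_ratios` and `plot_ratio_bars` are made up for illustration, and the ratio is assumed to be likes divided by (units sold minus likes).

```python
def like_dislike_ratios(sold, likes):
    """Return likes / dislikes per product, where dislikes = sold - likes."""
    ratios = []
    for units, liked in zip(sold, likes):
        dislikes = units - liked
        if dislikes <= 0:
            raise ValueError("every product needs at least one dislike")
        ratios.append(liked / dislikes)
    return ratios


def plot_ratio_bars(names, likes, ratios):
    """Draw one bar per product, colored by its like/dislike ratio."""
    # Imported here so the helper above also works without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.colors import Normalize

    norm = Normalize(vmin=min(ratios), vmax=max(ratios))
    cmap = plt.cm.summer
    fig, ax = plt.subplots()
    ax.bar(names, likes, color=[cmap(norm(r)) for r in ratios])
    ax.set_ylabel("People who like the product")
    # Bars carry no color scale of their own, so build one for the colorbar.
    fig.colorbar(plt.cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax,
                 label="Like/Dislike Ratio")
    plt.show()


# Data from the table above
names = list("ABCDEFGHIJ")
sold = [100, 200, 150, 23, 30, 50, 156, 32, 67, 89]
likes = [50, 70, 100, 10, 10, 34, 56, 18, 35, 45]
ratios = like_dislike_ratios(sold, likes)
```

Calling `plot_ratio_bars(names, likes, ratios)` would then draw the chart, with greener or yellower bars depending on where each ratio falls in the `summer` colormap.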

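e.g. Given income and corruption cost data for a number of countries, plot the two quantities and use color to show corruption cost as a share of total income.

Country income and corruption cost data (only part of it is shown to save space)

The approach is essentially the same as in the previous example. The only addition is that each column of data is stored in a list.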

from matplotlib import pyplot as plt
import csv

# Display the minus sign correctly
plt.rcParams['axes.unicode_minus'] = False

# Two empty lists to hold the x-axis and y-axis data points
x = []
y = []
with open("../corruption (1).csv", 'r', newline='') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    for row in plots:
        # Values read from a CSV file are strings, so convert them to int
        x.append(int(row[1]))  # income
        y.append(int(row[2]))  # corruption cost

# Corruption cost as a share of income, as the task asks
ratio = [b / a for a, b in zip(x, y)]

# Draw a scatter plot
plt.scatter(x, y, c=ratio)
plt.colorbar(label="corruption/income ratio", orientation="horizontal")
plt.xlabel('income')
plt.ylabel('corruption')
plt.title('income and corruption')
plt.show()
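The loop above calls `int()` on every row, which would fail if the file starts with a header line. The sketch below shows one way to skip non-numeric rows. The layout of `corruption (1).csv` is not shown in this post, so the column order (name, income, corruption cost) and the sample values here are purely illustrative, and `read_columns` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample standing in for the real file; the values are made up.
SAMPLE = """country,income,corruption
A,1000,50
B,2000,300
C,1500,150
"""


def read_columns(csvfile):
    """Return (income, corruption) lists, skipping rows that are not numeric."""
    income, corruption = [], []
    for row in csv.reader(csvfile, delimiter=","):
        try:
            a, b = int(row[1]), int(row[2])
        except (ValueError, IndexError):
            continue  # header line, blank line, or malformed row
        income.append(a)
        corruption.append(b)
    return income, corruption


income, corruption = read_columns(io.StringIO(SAMPLE))
# Corruption cost as a share of income, one value per country
share = [c / i for i, c in zip(income, corruption)]
print(income, corruption, share)
```

With the real data, the file object opened in the `with` block would be passed to `read_columns` in place of the `io.StringIO` wrapper.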

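Regression Lines for Functions

e.g. Given randomly generated x values and an equal number of corresponding y values, fit a function curve to them.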

import numpy as np
import matplotlib.pyplot as plt

# Random x coordinates and an equal number of random y values
x = np.random.randint(0, 200, 50)
y = np.random.randint(0, 300, 50)

# Sort both sequences so the points form a non-decreasing curve
x1 = sorted(x)
print(x1)
y1 = sorted(y)
print(y1)

# Fit a degree-3 polynomial to the points
z1 = np.polyfit(x1, y1, 3)
p1 = np.poly1d(z1)
print(z1)
print(p1)

# Plot the points and the fitted curve
plt.scatter(x1, y1)
plt.plot(x1, p1(x1))
plt.xlabel('x')
plt.ylabel('y(x)')

# Show the figure
plt.show()

Similarly, the fitted polynomial can be used to predict the value of y for a given x.

import random
import numpy as np
import matplotlib.pyplot as plt

# Randomly assign the coordinates x, y
random_list1 = list(range(1, 300))
random_list2 = list(range(1, 300))
x = random.sample(random_list1, 7)
y = random.sample(random_list2, 7)

model = np.poly1d(np.polyfit(x, y, 2))
plt.scatter(x, y)

# Draw the polynomial regression line
myline = np.linspace(1, 300, 150)
plt.plot(myline, model(myline))
plt.show()

# Predict a future value when x=255
predict = str(model(255))
print("when x is 255, value is expected to be " + predict)
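`np.polyfit` chooses the coefficients that minimize the sum of squared errors between the polynomial and the data. For a straight line (degree 1) that minimum has a closed form, which the sketch below writes out in plain Python to make the idea concrete. The function name `fit_line` and the sample points are illustrative and not part of the code above.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative points lying exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
prediction = slope * 255 + intercept  # same kind of prediction as model(255)
print(slope, intercept, prediction)
```

For the same points, `np.polyfit(xs, ys, 1)` should return the same two coefficients, slope first.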

Scraping, Organizing, and Analyzing Information

e.g. Scrape movie rating data from a rating website and display part of it as a scatter plot.

import requests
import numpy as np
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['Arial Unicode MS']

# Fetch the page content
page = requests.get('https://www.imdb.com/chart/top/?ref_=nv_mv_250')
soup = BeautifulSoup(page.content, 'html.parser')

# Movie names (these selectors depend on the page's table layout)
links = soup.select("table tbody tr td.titleColumn a")
# Movie ratings
links1 = soup.select("table tbody tr td.ratingColumn strong")

fig = plt.figure(figsize=(25, 10))
plt.xticks(fontsize=12)
plt.yticks(fontsize=15)

# Keep every 28th movie so that only part of the data is shown
name = [anchor.text for anchor in links[::28]]
# Convert the rating text to float so the y axis is numeric
score = [float(anchor.text) for anchor in links1[::28]]

# For better viewing, plot both lists in reverse order
names = name[::-1]
scores = score[::-1]
print(names)
print(scores)

# One color value per plotted point
colors = np.array([10, 20, 30, 40, 50, 60, 70, 80, 100])
plt.scatter(names, scores, c=colors, cmap='viridis')
plt.colorbar()
plt.show()
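The slice `[::28]` keeps every 28th entry, so assuming the chart has 250 rows it yields 9 points, which is why `colors` has exactly 9 values. If the page returned a different number of rows, the hard-coded array would no longer match the data. Below is a small sketch of deriving both from the data instead; it uses a made-up list in place of the scraped results, and `sample_reversed` is a hypothetical helper name.

```python
def sample_reversed(items, step):
    """Keep every `step`-th item, then reverse the order for plotting."""
    return items[::step][::-1]


# Placeholder data standing in for the scraped titles and ratings
titles = [f"Movie {i}" for i in range(1, 251)]
ratings = [round(9.3 - 0.005 * i, 2) for i in range(250)]

names = sample_reversed(titles, 28)
scores = sample_reversed(ratings, 28)

# One color value per point, derived from the data instead of hard-coded
colors = list(range(10, 10 * len(names) + 1, 10))
print(len(names), names[0], names[-1], colors)
```

Because `colors` is built from `len(names)`, the scatter call keeps working even if the number of sampled movies changes.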

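All of the examples above are basic applications of third-party Python libraries. Becoming fluent with them and applying them in more depth still takes systematic study.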
