Changing Data Types in Python


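Contents

Specifying data types explicitly
    Inspecting types with the dtypes attribute
    Specifying the data type when creating a pandas object
Converting data types
    Forcing a conversion with the astype() method
    Converting with the to_numeric() function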

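Specifying data types explicitly

Inspecting types with the dtypes attribute

    import pandas as pd

    # Every value is a string, so each column is stored with dtype object.
    df = pd.DataFrame({'A': ['1', '2', '4'],
                       'B': ['9', '-80', '5.3'],
                       'C': ['x', '5.9', '0']})
    print("df.dtypes:\n", df.dtypes)
    print("df:\n", df)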

Output:

    df.dtypes:
     A    object
    B    object
    C    object
    dtype: object
    df:
        A    B    C
    0  1    9    x
    1  2  -80  5.9
    2  4  5.3    0

Specifying the data type when creating a pandas object

    # dtype='int' asks pandas to convert every column while building the frame.
    data = pd.DataFrame({'A': ['1', '2', '4'], 'B': ['9', '80', '5']}, dtype='int')
    print("data:\n", data)
    print("data.dtypes:\n", data.dtypes)

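Output (the width of the default integer type, int32 here, depends on the platform and NumPy version):

    data:
        A   B
    0  1   9
    1  2  80
    2  4   5
    data.dtypes:
     A    int32
    B    int32
    dtype: object

Converting data types

Forcing a conversion with the astype() method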

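    astype(dtype, copy=True, errors='raise', **kwargs)

Some of the parameters of this method are:

dtype: the target data type.

copy: whether to return a copy; defaults to True.

errors: how conversion errors are handled; either raise or ignore, defaulting to raise. With raise the exception is allowed to propagate; with ignore it is suppressed.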
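As a small illustration of the dtype parameter, the sketch below passes a dict that maps column names to target types, a form that DataFrame.astype() accepts in addition to a single type. The frame and variable names are made up for this example, and dtype=object is set only to mirror the string columns used in this article.

```python
import pandas as pd

# Hypothetical frame holding numeric text, stored as object columns.
frame = pd.DataFrame({'A': ['1', '2', '4'], 'B': ['9', '-80', '5.3']}, dtype=object)

# Convert each column to its own target type in one call.
converted = frame.astype({'A': 'int64', 'B': 'float64'})

print(converted.dtypes)
# astype() returns a new object; the original frame still holds strings.
print(frame['A'].tolist())
```

Because astype() returns a new object rather than modifying the original, the result has to be assigned back if the converted types should persist.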

Use astype() to convert column B of the DataFrame object df to float (the column contains '5.3', so it cannot be converted straight to int):

    print("df['B']:\n", df['B'])
    print("df['B'].astype:\n", df['B'].astype(dtype='float'))

Output:

    df['B']:
     0      9
    1    -80
    2    5.3
    Name: B, dtype: object
    df['B'].astype:
     0     9.0
    1   -80.0
    2     5.3
    Name: B, dtype: float64

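Not every column was converted because column C contains a non-numeric character ('x') that cannot be converted to a numeric type; forcing the conversion raises a ValueError. (Setting errors to ignore suppresses the exception, but the result is still the original, unconverted object: no type conversion takes place, the call simply does not fail.)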
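To see the ValueError described above, here is a minimal sketch that rebuilds column C as a standalone Series (the name col_c and the failed flag are illustrative only) and attempts the conversion with the default errors='raise'.

```python
import pandas as pd

# Same values as column C in this article, kept as an object Series.
col_c = pd.Series(['x', '5.9', '0'], name='C', dtype=object)

try:
    col_c.astype('float')      # default errors='raise'
    failed = False
except ValueError as exc:
    failed = True
    print("astype raised ValueError:", exc)
```

Catching the exception explicitly makes it obvious that nothing was converted, which is easy to overlook when the error is merely suppressed.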

With errors='ignore', converting column C leaves it unchanged:

    print("df['C']:\n", df['C'])
    print("df['C'].astype(errors='ignore'):\n", df['C'].astype(dtype='float', errors='ignore'))

Output:

    df['C']:
     0      x
    1    5.9
    2      0
    Name: C, dtype: object
    df['C'].astype(errors='ignore'):
     0      x
    1    5.9
    2      0
    Name: C, dtype: object

Converting with the to_numeric() function

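The to_numeric() function cannot operate on a DataFrame object directly.

    pandas.to_numeric(arg, errors='raise', downcast=None)

The commonly used parameters of this function are:

arg: the data to convert; it can be a list, tuple, or Series.

errors: how conversion errors are handled; besides raise and ignore it also accepts coerce, and it defaults to raise. With raise the exception is allowed to propagate, with ignore it is suppressed and the input is returned unchanged, and with coerce values that cannot be parsed are replaced with NaN. (Recent pandas releases deprecate errors='ignore' for to_numeric(); check the documentation for the version you use.)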
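The signature above also lists downcast, which the parameter list does not describe. As a rough sketch (the variable names are made up for this example), passing downcast='integer' asks pandas to pick the smallest integer type that can hold the converted values:

```python
import pandas as pd

values = pd.Series(['1', '2', '4'])

# Without downcast, integers parsed from strings come back as int64.
default_result = pd.to_numeric(values)

# downcast='integer' shrinks the result to the smallest signed integer type that fits.
small_result = pd.to_numeric(values, downcast='integer')

print(default_result.dtype)
print(small_result.dtype)
```

This mainly matters for memory use on large columns; the converted values themselves are the same.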

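The advantage of to_numeric() over astype() is that it works around the latter's limitation: whenever the data to convert contains anything other than numbers, astype() fails. to_numeric() can handle this case because its errors parameter accepts coerce: values that cannot be parsed as numbers are replaced with missing values (NaN), and the remaining data is then converted.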

    se = pd.Series(df['A'])
    se1 = pd.Series(df['B'])
    se2 = pd.Series(df['C'])
    print("df['A']:\n", df['A'])
    print("to_numeric(df['A']):\n", pd.to_numeric(se))
    print("df['B']:\n", df['B'])
    print("to_numeric(df['B']):\n", pd.to_numeric(se1))
    print("df['C']:\n", df['C'])
    print("to_numeric(df['C'], errors='ignore'):\n", pd.to_numeric(se2, errors='ignore'))
    print("to_numeric(df['C'], errors='coerce'):\n", pd.to_numeric(se2, errors='coerce'))

Output:

    df['A']:
     0    1
    1    2
    2    4
    Name: A, dtype: object
    to_numeric(df['A']):
     0    1
    1    2
    2    4
    Name: A, dtype: int64
    df['B']:
     0      9
    1    -80
    2    5.3
    Name: B, dtype: object
    to_numeric(df['B']):
     0     9.0
    1   -80.0
    2     5.3
    Name: B, dtype: float64
    df['C']:
     0      x
    1    5.9
    2      0
    Name: C, dtype: object
    to_numeric(df['C'], errors='ignore'):
     0      x
    1    5.9
    2      0
    Name: C, dtype: object
    to_numeric(df['C'], errors='coerce'):
     0    NaN
    1    5.9
    2    0.0
    Name: C, dtype: float64
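Since to_numeric() does not accept a DataFrame directly, one common workaround is to apply it column by column with DataFrame.apply(). The sketch below rebuilds the article's df and converts all three columns in one call; the name numeric_df is chosen just for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '4'],
                   'B': ['9', '-80', '5.3'],
                   'C': ['x', '5.9', '0']})

# apply() calls pd.to_numeric once per column and forwards errors='coerce' to it.
numeric_df = df.apply(pd.to_numeric, errors='coerce')

print(numeric_df.dtypes)
print(numeric_df)
```

Column A becomes an integer column, B and C become float columns, and the 'x' in column C turns into NaN, matching the per-column results shown above.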
