
Split a Single Column into Multiple Columns in a Pandas DataFrame

2024-07-09 20:45 | Source: compiled from the web

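Pandas has a well-known method for splitting a string (text) column on a dash, a space, or another separator; by default it returns a column (Series) of lists. In pandas terminology, a DataFrame column is called a Series.

We can use the pandas Series.str.split() function to split strings into multiple columns around a given separator or delimiter. It is similar to Python's string split() method, but it applies to an entire DataFrame column. The simplest way to separate a column is shown below.

The method splits each string in the Series starting from the beginning of the string.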

Series.str.split(pat=None, n=-1, expand=False)
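To make the effect of the expand parameter concrete, here is a minimal sketch. The sample values are hypothetical and chosen only for illustration. With the default expand=False the result is a Series of lists; with expand=True it is a DataFrame with one column per piece.

```python
import pandas as pd

# Hypothetical sample data, not taken from the article.
s = pd.Series(["a-b-c", "d-e-f"])

# expand=False (the default): each element becomes a list of pieces.
as_lists = s.str.split("-")

# expand=True: the pieces are spread over separate columns (0, 1, 2).
as_columns = s.str.split("-", expand=True)

print(as_lists.tolist())
print(as_columns)
```

The rest of the article uses expand=True, because separate columns are what we want to assign back to the DataFrame.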

Let's try to understand how this method works.

# Import pandas as pd
import pandas as pd

# Initialize the DataFrame
df = pd.DataFrame(
    {
        "Email": [
            "Alex.jhon@gmail.com",
            "Hamza.Azeez@gmail.com",
            "Harry.barton@hotmail.com",
        ],
        "Number": ["+44-3844556210", "+44-2245551219", "+44-1049956215"],
        "Location": ["Alameda,California", "Sanford,Florida", "Columbus,Georgia"],
    }
)
print("Dataframe series:\n", df)

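We created a DataFrame df with three columns: Email, Number, and Location. Note that the strings in each column follow a fixed pattern (for example, every Email value contains exactly one @), so each column can be split cleanly into two columns. The examples below do exactly that.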

Output:

Dataframe series:
                      Email          Number            Location
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia

We will use the Series.str.split() function to separate the Number column, passing - as the separator. Make sure to pass True for the expand keyword.

Example 1:

print(
    "\n\nSplit 'Number' column by '-' into two individual columns :\n",
    df.Number.str.split(pat="-", expand=True),
)

This example splits every value of the Number column on -.

Output:

Split 'Number' column by '-' into two individual columns :
     0           1
0  +44  3844556210
1  +44  2245551219
2  +44  1049956215

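If we call Series.str.split(expand=True) with only the expand argument, the strings are split on whitespace. To split on -, on a comma, or on a regular expression, you must pass the separator through the pat parameter.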
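The difference between omitting and passing pat can be sketched as follows. The names and numbers here are hypothetical sample data. Without pat, only whitespace acts as a separator, so a value such as +44-3844556210 stays in one piece.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
names = pd.Series(["Alex jhon", "Hamza  Azeez"])  # note the double space
numbers = pd.Series(["+44-3844556210", "+44-2245551219"])

# No pat: split on runs of whitespace.
by_whitespace = names.str.split(expand=True)

# No pat and no whitespace in the values: nothing is split, one column results.
no_separator = numbers.str.split(expand=True)

# Explicit pat: split on the dash.
by_dash = numbers.str.split(pat="-", expand=True)

print(by_whitespace)
print(no_separator.shape)
print(by_dash.shape)
```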

Let's give these split columns proper names.

df[["Dialling Code", "Cell-Number"]] = df.Number.str.split("-", expand=True)
print(df)

We created two new columns, Dialling Code and Cell-Number, and filled them with the pieces of the Number column.

Output:

                      Email          Number            Location  Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California            +44
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida            +44
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia            +44

   Cell-Number
0   3844556210
1   2245551219
2   1049956215

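Example 2:

In this example, we split the Location column on a comma.

df[["City", "State"]] = df.Location.str.split(",", expand=True)
print(df)

This splits the Location column and stores the pieces in the separate columns City and State. For brevity, the output below omits the Dialling Code and Cell-Number columns added earlier; the complete output appears at the end.

Output:

                      Email          Number            Location      City  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California   Alameda
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida   Sanford
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia  Columbus

        State
0  California
1     Florida
2     Georgia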
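Real data is not always as regular as this sample. As a minimal sketch with hypothetical data (the second value has no comma), expand=True pads the missing piece with a missing value instead of raising an error:

```python
import pandas as pd

# Hypothetical data: the second entry has no ",State" part.
locations = pd.Series(["Alameda,California", "Sanford"])

parts = locations.str.split(",", expand=True)
parts.columns = ["City", "State"]

# Rows with fewer pieces are padded with a missing value.
print(parts)
print(parts["State"].isna().tolist())
```

Note that if no row at all contained the separator, the result would have a single column, and assigning two column names to it would fail. Checking the shape of the split result first is a reasonable precaution.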

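Let's look at one last example. We will separate the full name from the rest of the address in the Email column.

full_name = df.Email.str.split(pat="@", expand=True)
print(full_name)

Output:

              0            1
0     Alex.jhon    gmail.com
1   Hamza.Azeez    gmail.com
2  Harry.barton  hotmail.com

Now we split the first name from the last name on the dot. Because the pattern is a single character, pandas treats it as a literal dot rather than as a regular expression.

df[["First Name", "Last Name"]] = full_name[0].str.split(".", expand=True)
print(df)

Output (again omitting the columns added in the earlier examples):

                      Email          Number            Location  First Name  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California        Alex
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida       Hamza
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia       Harry

   Last Name
0       jhon
1      Azeez
2     barton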
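The two-step split can also be done in one call with a regular expression. This is an alternative sketch, not the article's method; it assumes pandas 1.4 or later, where str.split accepts the regex keyword. The column names, including Domain, are chosen here for illustration.

```python
import pandas as pd

emails = pd.Series(["Alex.jhon@gmail.com", "Harry.barton@hotmail.com"])

# Split on either "." or "@", but at most twice (n=2),
# so the dot inside the domain name is left alone.
parts = emails.str.split(r"[.@]", n=2, regex=True, expand=True)
parts.columns = ["First Name", "Last Name", "Domain"]

print(parts)
```

Limiting the number of splits with n=2 is what keeps gmail.com intact; without it the domain would be split on its dot as well.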

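The value n=-1 is the default and means that the number of splits is not limited. Passing it explicitly, with or without expand=True, therefore changes nothing: the call below returns the same two columns as the earlier split on @.

print(df["Email"].str.split("@", n=-1, expand=True))

Output:

              0            1
0     Alex.jhon    gmail.com
1   Hamza.Azeez    gmail.com
2  Harry.barton  hotmail.com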
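To see when n does make a difference, consider a small sketch with hypothetical values that contain the separator twice. A positive n caps the number of splits, and everything after the last allowed split stays in the final column.

```python
import pandas as pd

# Hypothetical data with two commas per value.
places = pd.Series(["Sanford,Florida,USA", "Columbus,Georgia,USA"])

# n=-1 (the default): split at every comma, giving three columns.
all_splits = places.str.split(",", n=-1, expand=True)

# n=1: split only at the first comma, giving two columns.
first_only = places.str.split(",", n=1, expand=True)

print(all_splits)
print(first_only)
```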

The complete example code is as follows.

# Import pandas as pd
import pandas as pd

# Create a new DataFrame
df = pd.DataFrame(
    {
        "Email": [
            "Alex.jhon@gmail.com",
            "Hamza.Azeez@gmail.com",
            "Harry.barton@hotmail.com",
        ],
        "Number": ["+44-3844556210", "+44-2245551219", "+44-1049956215"],
        "Location": ["Alameda,California", "Sanford,Florida", "Columbus,Georgia"],
    }
)
print("Dataframe series :\n", df)

# Split Number on "-" and show the result without modifying df
print(
    "\n\nSplit 'Number' column by '-' into two individual columns :\n",
    df.Number.str.split(pat="-", expand=True),
)

# Store the pieces of Number in two named columns
df[["Dialling Code", "Cell-Number"]] = df.Number.str.split("-", expand=True)
print(df)

# Split Location on "," into City and State
df[["City", "State"]] = df.Location.str.split(",", expand=True)
print(df)

# Split Email on "@", then split the name part on "."
full_name = df.Email.str.split(pat="@", expand=True)
print(full_name)
df[["First Name", "Last Name"]] = full_name[0].str.split(".", expand=True)
print(df)

Output:

Dataframe series :
                      Email          Number            Location
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia


Split 'Number' column by '-' into two individual columns :
     0           1
0  +44  3844556210
1  +44  2245551219
2  +44  1049956215

                      Email          Number            Location  Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California            +44
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida            +44
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia            +44

   Cell-Number
0   3844556210
1   2245551219
2   1049956215

                      Email          Number            Location  Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California            +44
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida            +44
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia            +44

   Cell-Number      City       State
0   3844556210   Alameda  California
1   2245551219   Sanford     Florida
2   1049956215  Columbus     Georgia

              0            1
0     Alex.jhon    gmail.com
1   Hamza.Azeez    gmail.com
2  Harry.barton  hotmail.com

                      Email          Number            Location  Dialling Code  \
0       Alex.jhon@gmail.com  +44-3844556210  Alameda,California            +44
1     Hamza.Azeez@gmail.com  +44-2245551219     Sanford,Florida            +44
2  Harry.barton@hotmail.com  +44-1049956215    Columbus,Georgia            +44

   Cell-Number      City       State  First Name  Last Name
0   3844556210   Alameda  California        Alex       jhon
1   2245551219   Sanford     Florida       Hamza      Azeez
2   1049956215  Columbus     Georgia       Harry     barton

