How to convert a column with dtype object to string in a Pandas DataFrame

Time: 2022-06-13 22:59:27

When I read a CSV file into a pandas DataFrame, each column is cast to its own data type. One of my columns was converted to object. I want to perform string operations on this column, such as splitting the values and creating a list, but none of these operations work because its dtype is object. Can anyone tell me how to convert all the items of a column to strings instead of objects?

I tried several ways, but nothing worked. I used astype, str(), to_string, etc.

a=lambda x: str(x).split(',')
df['column'].apply(a)

or

df['column'].astype(str)
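A minimal sketch, using hypothetical sample data, of why both snippets above can appear to do nothing: apply() and astype() return a new Series and leave df unchanged unless the result is assigned back (the point raised in answer #2). The helper name split_on_comma is illustrative only.

```python
import pandas as pd

# Hypothetical sample data standing in for the column read from the CSV
df = pd.DataFrame({"column": ["a,b", "c,d,e"]})


def split_on_comma(x):
    """Same logic as the lambda in the question."""
    return str(x).split(",")


# apply() returns a NEW Series; df itself is left untouched
result = df["column"].apply(split_on_comma)
unchanged = df["column"].tolist()
print(unchanged)        # ['a,b', 'c,d,e']
print(result.tolist())  # [['a', 'b'], ['c', 'd', 'e']]

# Assigning the result back is what actually changes the column
df["column"] = result
print(df["column"].tolist())  # [['a', 'b'], ['c', 'd', 'e']]
```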

5 solutions

#1


15  

Since strings have variable length, they are stored as object dtype by default. If you want to store them as a fixed-width string type, you can do something like this:

df['column'] = df['column'].astype('|S80')  # max length is fixed at 80 bytes

or alternatively

df['column'] = df['column'].astype('|S')  # by default, the length is set to the longest value encountered
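One caveat worth checking before using this answer: '|S' is NumPy's fixed-width bytes dtype, so under Python 3 the values are encoded to bytes rather than kept as str. A small sketch, assuming ASCII-only sample values (non-ASCII text would fail to encode), showing the bytes result and how Series.str.decode turns it back into str:

```python
import pandas as pd

s = pd.Series(["abc", "de"])  # hypothetical ASCII-only sample values

# '|S' is NumPy's fixed-width *bytes* dtype, so each value is encoded to bytes
as_bytes = s.astype("|S")
print(isinstance(as_bytes.iloc[0], bytes))  # True

# Decode back to str before relying on str methods such as split
as_text = as_bytes.str.decode("ascii")
print(as_text.tolist())  # ['abc', 'de']
```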

#2


9  

Did you try assigning it back to the column?

df['column'] = df['column'].astype('str') 

According to a related question, a pandas DataFrame stores pointers to the strings, which is why the column has dtype 'object'. As per the docs, you could try:

df['column_new'] = df['column'].str.split(',') 
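A short runnable sketch of the suggestion above, with hypothetical sample data: .str.split works directly on an object column of strings, and the expand=True option of the same method spreads the pieces over separate columns instead of storing lists.

```python
import pandas as pd

df = pd.DataFrame({"column": ["a,b", "c,d,e"]})  # hypothetical sample data

# .str.split works directly on a column of strings; each cell becomes a list
df["column_new"] = df["column"].str.split(",")
print(df["column_new"].tolist())  # [['a', 'b'], ['c', 'd', 'e']]

# expand=True spreads the pieces over separate columns instead;
# shorter rows are padded with missing values
parts = df["column"].str.split(",", expand=True)
print(parts.shape)  # (2, 3)
```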

#3


2  

Not answering the question directly, but it might help someone else.

I have a column called Volume that contains both '-' (invalid/NaN) and numbers formatted with ',' as a thousands separator:

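import pandas as pd

df['Volume'] = df['Volume'].astype('str')  # make every value a string
df['Volume'] = df['Volume'].str.replace(',', '')  # drop thousands separators
df['Volume'] = pd.to_numeric(df['Volume'], errors='coerce')  # unparseable '-' becomes NaN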

Casting to string is required so that str.replace can be applied.
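A sketch of why the cast matters, assuming two hypothetical inputs: a Volume column that pandas parsed as float (where the .str accessor raises AttributeError), and a raw column mixing '-' with thousands separators that the three-step cleanup turns into numbers. regex=False is passed only to make the literal replacement explicit.

```python
import pandas as pd

# Hypothetical Volume column that pandas parsed as float (no '-' or ',')
numeric_volume = pd.Series([1234.0, 56.0])

try:
    numeric_volume.str.replace(",", "", regex=False)
    accessor_failed = False
except AttributeError:
    # .str is only available when the values are strings
    accessor_failed = True
print(accessor_failed)  # True

# Hypothetical raw values mixing '-' and thousands separators
raw_volume = pd.Series(["1,234", "-", "56"])
cleaned = pd.to_numeric(
    raw_volume.astype(str).str.replace(",", "", regex=False),
    errors="coerce",  # '-' cannot be parsed, so it becomes NaN
)
print(cleaned.tolist())  # [1234.0, nan, 56.0]
```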

pandas.Series.str.replace
pandas.to_numeric

#4


1  

You could use the df['column'].str accessor and then call any string method on it. The pandas documentation lists these methods, including split.
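A small illustration of the accessor idea, using hypothetical sample data: each .str method is applied element-wise to an object column of strings and returns a new Series, so methods can be chained.

```python
import pandas as pd

s = pd.Series(["  Apple,Red ", "Banana,Yellow"])  # hypothetical sample data

# Each .str method is applied element-wise and returns a new Series
stripped = s.str.strip()
print(stripped.tolist())                 # ['Apple,Red', 'Banana,Yellow']
print(stripped.str.split(",").tolist())  # [['Apple', 'Red'], ['Banana', 'Yellow']]
print(s.str.lower().str.contains("banana").tolist())  # [False, True]
print(s.str.len().tolist())              # [12, 13]
```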

#5


-3  

Please use df.to_string()
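For context on this answer: DataFrame.to_string renders the whole DataFrame as a single text table, so it returns one str rather than converting each value in a column. A sketch with hypothetical sample data contrasting it with the per-element astype(str) conversion:

```python
import pandas as pd

df = pd.DataFrame({"column": ["a,b", "c,d"]})  # hypothetical sample data

# to_string() renders the WHOLE DataFrame as a single text table
table = df.to_string()
print(type(table))  # <class 'str'>

# Converting the values of one column is a per-element astype(str) instead
df["column"] = df["column"].astype(str)
print(df["column"].str.split(",").tolist())  # [['a', 'b'], ['c', 'd']]
```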

Reference link

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.to_string.html