我有一个字符串列,有时在字符串中有回车符:
import pandas as pd
from io import StringIO
datastring = StringIO("""\
country metric 2011 2012
USA GDP 7 4
USA Pop. 2 3
GB GDP 8 7
""")
df = pd.read_table(datastring,sep='\s\s+')
df.metric = df.metric + '\r' # append carriage return
print(df)
country metric 2011 2012
0 USA GDP\r 7 4
1 USA Pop.\r 2 3
2 GB GDP\r 8 7
在写入和读取csv时,数据框被破坏:
df.to_csv('data.csv',index=None)
print(pd.read_csv('data.csv'))
country metric 2011 2012
0 USA GDP NaN NaN
1 NaN 7 4 NaN
2 USA Pop. NaN NaN
3 NaN 2 3 NaN
4 GB GDP NaN NaN
5 NaN 8 7 NaN
题
解决这个问题的最佳方法是什么?一个显而易见的方法是首先清理数据
df.metric = df.metric.str.replace('\r','')
解决方法
指定lineterminator:
print(pd.read_csv('data.csv',lineterminator='\n'))
country metric 2011 2012
0 USA GDP\r 7 4
1 USA Pop.\r 2 3
2 GB GDP\r 8 7