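I. Data types
The astype method is a general-purpose function that can convert any column of a DataFrame to another dtype.
1. Converting to string objects
To convert column values to string objects, use the astype method. It takes a dtype parameter that specifies the target type. The dtypes of the original tips dataset from seaborn (sns) are: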
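import pandas as pd
import seaborn as sns

# Load the example tips dataset bundled with seaborn and inspect its dtypes
tips = sns.load_dataset("tips")
print(tips.dtypes)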
tips['sex_str'] = tips['sex'].astype(str)
print(tips.dtypes)
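The same conversion can be tried without downloading the dataset. This is a minimal sketch, assuming a small hand-built frame `demo` (a hypothetical stand-in for a few rows of tips, not the real data):

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the tips dataset
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "sex": pd.Categorical(["Female", "Male", "Male"]),
})

# astype returns a new Series; the original 'sex' column is left unchanged
demo["sex_str"] = demo["sex"].astype(str)

# Every element of the new column is a plain Python str
all_str = all(isinstance(v, str) for v in demo["sex_str"])
print(demo.dtypes)
print(all_str)
```

Because astype returns a new object, the result has to be assigned back to a column for the change to persist.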
2. Converting to numeric types
Pass any built-in type or numpy type to the astype method to convert a column's dtype.
tips['total_bill'] = tips['total_bill'].astype(str)
print(tips.dtypes)
tips['total_bill'] = tips['total_bill'].astype(float)
print(tips.dtypes)
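The round trip from float to string and back can be sketched on a standalone Series. The values and the name `bills` below are assumptions chosen for illustration; the float32 line shows that a numpy type name is accepted as well as the built-in float:

```python
import pandas as pd

# Hypothetical bill amounts, not taken from the real dataset
bills = pd.Series([16.99, 10.34, 21.01], name="total_bill")

as_text = bills.astype(str)         # each element becomes a string
as_float = as_text.astype(float)    # built-in float maps to float64
as_f32 = as_text.astype("float32")  # a numpy type name also works

print(as_float.dtype, as_f32.dtype)
```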
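Now add some placeholder strings that stand in for missing values. The astype method cannot convert a column that contains values it cannot parse as numbers, so the conversion below fails with an error.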
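# Take a subset of tips; copy() avoids modifying a view of the original frame
tips_sub_miss = tips.head(10).copy()
# Cast to object first so the float column can hold a placeholder string
tips_sub_miss['total_bill'] = tips_sub_miss['total_bill'].astype(object)
tips_sub_miss.loc[[1,3,5,7], 'total_bill'] = 'missing'
print(tips_sub_miss)
print(tips_sub_miss.dtypes)
# Raises ValueError: the placeholder string cannot be converted to float
tips_sub_miss['total_bill'].astype(float)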
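It may help to separate two cases: a string that cannot be parsed, and a genuine NaN. The sketch below assumes two hypothetical object columns; only the one holding an unparseable string makes astype fail:

```python
import pandas as pd

# A column with a placeholder string that is not a number
bad = pd.Series(["16.99", "missing", "21.01"], dtype=object)
try:
    bad.astype(float)
    failed = False
except ValueError as exc:
    failed = True
    print(f"astype failed: {exc}")

# A column with a real NaN converts without error
ok = pd.Series(["16.99", float("nan"), "21.01"], dtype=object)
converted = ok.astype(float)
print(converted)
```

So it is the unparseable string, not the presence of NaN itself, that stops astype.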
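2.1 The to_numeric function
The to_numeric function has an errors parameter that determines what happens when the function encounters a value it cannot convert to a number. The default is raise: if to_numeric encounters such a value, it raises an error.
According to the to_numeric documentation, the errors parameter accepts three possible values:
(1) raise: the default. When to_numeric encounters a value it cannot convert to a number, it raises an error.
(2) coerce: when to_numeric encounters a value it cannot convert to a number, it returns NaN for that value.
(3) ignore: when to_numeric encounters a value it cannot convert to a number, it gives up and returns the whole column unchanged (that is, it does nothing). This option is deprecated in pandas 2.2 and later.
(1) errors set to raise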
pd.to_numeric(tips_sub_miss['total_bill'])
(2) errors set to coerce
tips_sub_miss['total_bill'] = pd.to_numeric(tips_sub_miss['total_bill'], errors='coerce')
print(tips_sub_miss)
print(tips_sub_miss.dtypes)
(3) errors set to ignore
tips_sub_miss['total_bill'] = pd.to_numeric(tips_sub_miss['total_bill'], errors='ignore')
print(tips_sub_miss)
print(tips_sub_miss.dtypes)
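The raise and coerce cases can also be reproduced without the dataset. This is a minimal sketch, assuming a hypothetical three-element column `col`; the ignore case is left out because its availability depends on the pandas version:

```python
import pandas as pd

# Hypothetical column with one value that is not a number
col = pd.Series(["16.99", "missing", "21.01"], dtype=object)

# errors="raise" is the default: the bad value triggers a ValueError
try:
    pd.to_numeric(col)
    raised = False
except ValueError:
    raised = True

# errors="coerce": the bad value becomes NaN, the rest are parsed
coerced = pd.to_numeric(col, errors="coerce")

print(raised)
print(coerced)
```

After coercion the column is float64, and `coerced.isna()` marks the positions that could not be parsed.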
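2.2 Downcasting with to_numeric
The to_numeric function also has a downcast parameter. After a column (or vector) has been converted to a numeric vector, downcast changes the numeric type to the smallest type that can hold the values. The default is None; the other possible values are 'integer', 'signed', 'unsigned', and 'float'.
With downcast set to 'float', the dtype of total_bill changes from float64 to float32.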
tips_sub_miss['total_bill'] = pd.to_numeric(tips_sub_miss['total_bill'],
                                            errors='coerce', downcast='float')
print(tips_sub_miss)
print(tips_sub_miss.dtypes)
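To see the effect of downcast in isolation, the sketch below compares the resulting dtypes on two hypothetical columns (`col` and `counts` are illustrative names, not part of the dataset):

```python
import pandas as pd

col = pd.Series(["16.99", "missing", "21.01"], dtype=object)

# Without downcast the parsed column is float64
as_f64 = pd.to_numeric(col, errors="coerce")
# With downcast="float" the smallest float type considered is float32
as_f32 = pd.to_numeric(col, errors="coerce", downcast="float")

# Small whole numbers can shrink to int8 with downcast="integer"
counts = pd.Series(["1", "2", "3"], dtype=object)
as_int = pd.to_numeric(counts, downcast="integer")

print(as_f64.dtype, as_f32.dtype, as_int.dtype)
```

Downcasting trades precision or range for memory, so it is worth checking that the smaller type still represents the data adequately.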
3. Categorical data
Converting to the category type
tips['sex'] = tips['sex'].astype('str')
print(tips.info())
tips['sex'] = tips['sex'].astype('category')
print(tips.info())
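One reason to compare the info() output before and after the conversion is memory usage. The sketch below assumes a hypothetical column of 10,000 repeated labels and measures both representations directly:

```python
import pandas as pd

# Hypothetical column with only two distinct labels, repeated many times
sex_obj = pd.Series(["Female", "Male"] * 5000, dtype=object)
sex_cat = sex_obj.astype("category")

# deep=True counts the memory held by the Python string objects as well
obj_bytes = sex_obj.memory_usage(deep=True)
cat_bytes = sex_cat.memory_usage(deep=True)

print(list(sex_cat.cat.categories))
print(obj_bytes, cat_bytes)
```

A category column stores each distinct label once plus a small integer code per row, which is why it tends to be smaller when a column has few distinct values.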