
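After some time spent collecting and organizing them, this post shares 100 practical functions that I consider fairly routine. They fall roughly into six categories: statistical summary functions, data cleaning functions, data filtering, plotting and element-wise functions, time series functions, and other functions.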

1. Statistical summary functions

Data analysis inevitably involves some statistical summarization. Which functions are available to help with these computations? See the following tables for details.

import numpy as np
import pandas as pd

x = pd.Series(np.random.normal(2, 3, 1000))
y = 3 * x + 10 + pd.Series(np.random.normal(1, 2, 1000))

# Correlation coefficient between x and y
print(x.corr(y))
# Skewness of y
print(y.skew())
# Summary statistics of x
print(x.describe())

z = pd.Series(['A', 'B', 'C']).sample(n=1000, replace=True)
# Reset the row index of z so that it aligns with y
z.index = range(1000)
# Group by z and compute the mean of y within each group
print(y.groupby(by=z).aggregate('mean'))

# Frequency of each element in z
print(z.value_counts())

a = pd.Series([1, 5, 10, 15, 25, 30])
# Cumulative percentage of each element in a
print(a.cumsum() / a.sum())
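As a further sketch of the summary functions in this section, the block below computes several statistics per group in a single `aggregate` call and then the quartiles of a series. The data and the names `scores` and `group` are made up for illustration and do not come from the article.

```python
import pandas as pd

# Hypothetical data: six scores split into two groups
scores = pd.Series([60, 70, 80, 90, 100, 50])
group = pd.Series(['A', 'A', 'A', 'B', 'B', 'B'])

# Several summary statistics per group in one call
summary = scores.groupby(by=group).aggregate(['count', 'mean', 'max'])
print(summary)

# Quartiles (25%, 50%, 75%) of the whole series
quartiles = scores.quantile([0.25, 0.5, 0.75])
print(quartiles)
```

Passing a list of function names to `aggregate` returns a DataFrame with one column per statistic, which is often easier to read than separate calls.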

2. Data cleaning functions

Likewise, data cleaning is an indispensable part of the work. The table below lists commonly used data cleaning functions.

x = pd.Series([10, 13, np.nan, 17, 28, 19, 33, np.nan, 27])
# Check whether the series contains missing values
print(x.hasnans)
# Fill missing values with the mean
print(x.fillna(value=x.mean()))
# Forward-fill missing values
print(x.ffill())

income = pd.Series(['12500元', '8000元', '8500元', '15000元', '9000元'])
# Strip the trailing currency unit (元, yuan) and convert income to integers
print(income.str[:-1].astype(int))

gender = pd.Series(['男', '女', '女', '女', '男', '女'])
# Encode gender (男 = male, 女 = female) as integer codes
print(gender.factorize())

house = pd.Series(['大寧金茂府 | 3室2廳 | 158.32平米 | 南 | 精裝',
                   '昌里花園 | 2室2廳 | 104.73平米 | 南 | 精裝',
                   '紡大小區 | 3室1廳 | 68.38平米 | 南 | 簡裝'])
# Extract the floor area of each second-hand home (third field, in 平米 = square meters)
# and convert it to float
print(house.str.split('|').str[2].str.strip().str[:-2].astype(float))
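The cleaning steps in this section can also be chained. The following is a minimal sketch, assuming a hypothetical column `raw` that contains a missing value, stray whitespace, thousands separators, and a duplicate; none of this data comes from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical raw column with a duplicate, stray whitespace, and a missing value
raw = pd.Series([' 12,500 ', '8,000', '8,000', np.nan, '9,000'])

cleaned = (
    raw.dropna()                           # drop missing entries
       .str.strip()                        # trim surrounding whitespace
       .str.replace(',', '', regex=False)  # remove thousands separators
       .astype(int)                        # convert to integers
       .drop_duplicates()                  # keep the first of each repeated value
)
print(cleaned.tolist())
```

Dropping missing values before `astype(int)` matters here, because a NaN cannot be converted to a plain integer.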

3. Data filtering

When you need to select a subset of the values in a variable, the functions in the table below come in handy. Some of them work on Series and, for the most part, on DataFrame objects as well.

np.random.seed(1234)
x = pd.Series(np.random.randint(10, 20, 10))

# Select the elements greater than 16
print(x.loc[x > 16])
# Equivalent boolean indexing (Series.compress was removed in pandas 1.0)
print(x[x > 16])
# Select the elements between 13 and 16 (inclusive)
print(x[x.between(13, 16)])
# The three largest elements
print(x.nlargest(3))

y = pd.Series(['ID:1 name:張三 age:24 income:13500',
               'ID:2 name:李四 age:27 income:25000',
               'ID:3 name:王二 age:21 income:8000'])
# Extract the age and convert it to an integer
print(y.str.findall(r'age:(\d+)').str[0].astype(int))
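Since this section notes that several filtering functions also apply to data frames, here is a minimal sketch of that usage. The frame `df` and its columns `name` and `score` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data frame
df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e'],
    'score': [12, 17, 15, 19, 13],
})

# Rows whose score lies in the closed interval [13, 16]
mid = df[df['score'].between(13, 16)]
print(mid)

# The two rows with the highest score
top2 = df.nlargest(2, 'score')
print(top2)

# Rows whose name is in a given set
picked = df[df['name'].isin(['a', 'e'])]
print(picked)
```

On a DataFrame, `nlargest` takes the column to rank by as its second argument, whereas on a Series only the count is needed.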

4. Plotting and element-wise functions

import matplotlib.pyplot as plt

np.random.seed(123)
x = pd.Series(np.random.normal(10, 3, 1000))
# Histogram of x
x.hist()
# Show the figure
plt.show()
# Box plot of x
x.plot(kind='box')
plt.show()

installs = pd.Series(['1280萬', '6.7億', '2488萬', '1892萬', '9877', '9877萬', '1.2億'])

# Convert install counts to a common unit of 萬 (10,000); 億 is 100,000,000
def transform(value):
    if value.find('億') != -1:
        res = float(value[:-1]) * 10000
    elif value.find('萬') != -1:
        res = float(value[:-1])
    else:
        res = float(value) / 10000
    return res

print(installs.apply(transform))
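The unit conversion above can also be written without a Python-level function, using the `.str` accessor together with `map`. This is only a sketch of one possible alternative, not the article's method; the names `factor`, `number`, and `result` are introduced here for illustration, and the input is a shortened copy of the install counts.

```python
import pandas as pd

# Shortened copy of the install counts: 萬 = 10,000, 億 = 100,000,000
installs = pd.Series(['1280萬', '6.7億', '9877'])

# Multiplier that converts each unit suffix into units of 萬;
# values without a suffix are plain counts, so they are divided by 10,000
factor = installs.str[-1].map({'萬': 1.0, '億': 10000.0}).fillna(1 / 10000)

# Numeric part with any trailing unit character removed
number = installs.str.rstrip('萬億').astype(float)

result = number * factor
print(result)
```

`map` with a dictionary returns NaN for keys it does not find, which is why `fillna` supplies the multiplier for values that carry no unit.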

5. Time series functions
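As a minimal sketch of the kind of time series functions this section refers to, the block below parses dates, shifts values, and computes rolling and weekly aggregates. The particular calls and the data (`sales`, ten hypothetical daily values starting on 2021-01-01) are my own selection and are not taken from the article.

```python
import pandas as pd

# Hypothetical daily sales for ten consecutive days
dates = pd.date_range('2021-01-01', periods=10, freq='D')
sales = pd.Series(range(1, 11), index=dates)

# Parse date strings into timestamps and read a component through .dt
parsed = pd.to_datetime(pd.Series(['2021-01-05', '2021-02-10']))
print(parsed.dt.month.tolist())

# Day-over-day change, using shift to align each value with the previous day
change = sales - sales.shift(1)
print(change)

# 3-day rolling mean
rolling = sales.rolling(window=3).mean()
print(rolling)

# Weekly totals (weeks ending on Sunday)
weekly = sales.resample('W').sum()
print(weekly)
```

Both `shift` and `rolling` leave NaN in the leading positions where no earlier observation is available.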

6. Other functions

import numpy as np
import pandas as pd

np.random.seed(112)
x = pd.Series(np.random.randint(8, 18, 6))
print(x)
# First-order difference of x
print(x.diff())
# Sort x in descending order
print(x.sort_values(ascending=False))

y = pd.Series(np.random.randint(8, 16, 100))
# Deduplicate y and convert the result to a list
print(y.unique().tolist())

Original title: Summary of 100 pandas Data Analysis Functions

Source: WeChat official account 數據分析與開發. Please credit the source when reposting.

Editor: haq
