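Author: Sugandha Lahoti; translator: Li Jie; reposted from: Datapi (ID: datapi)

Note: This article is excerpted from Ankit Dixit's book Ensemble Machine Learning. The book combines powerful machine learning algorithms to build optimized models and can serve as a guide for beginners.

In this article we look at different ways to select features from a dataset, and discuss the following types of feature selection algorithms together with their implementation in Python's Scikit-learn (sklearn) library:

Univariate selection
Recursive feature elimination (RFE)
Principal component analysis (PCA)
Selecting important features (feature importance)

We briefly introduce the first three algorithms and their implementation. We then discuss the fourth, feature importance, in detail, since it is widely used in the data science community.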

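Univariate selection

Statistical tests can be used to select the features that have the strongest relationship with the output variable.

The scikit-learn library provides the SelectKBest class, which can be used with a suite of different statistical tests to select a specific number of features.

The example below uses the chi-squared (chi2) statistical test for non-negative features to select the four best features from the Pima Indians onset of diabetes dataset: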
# Feature extraction with univariate statistical tests (chi-squared for classification)
# Import pandas to read the CSV file
import pandas
# Import numpy for array-related operations
import numpy
# Import sklearn's feature selection algorithm
from sklearn.feature_selection import SelectKBest
# Import chi2 for performing the chi-squared test
from sklearn.feature_selection import chi2
# URL for loading the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
# Define the attribute names
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
# Create a pandas data frame by loading the data from the URL
dataframe = pandas.read_csv(url, names=names)
# Create an array from the data values
array = dataframe.values
# Split the data into input and target
X = array[:, 0:8]
Y = array[:, 8]
# Select the four best features using the chi-squared test
test = SelectKBest(score_func=chi2, k=4)
# Fit the selector so that every feature receives a score
fit = test.fit(X, Y)
# Summarize scores
numpy.set_printoptions(precision=3)
print(fit.scores_)
# Apply the transformation to the dataset
features = fit.transform(X)
# Summarize selected features
print(features[0:5, :])

You can see the score for each attribute and the four attributes chosen (those with the highest scores): plas, test, mass, and age.

The score for each feature is:
[ 111.52 1411.887 17.605 53.108 2175.565 127.669 5.393 181.304]

The selected features are:
[[148. 0. 33.6 50. ]
 [ 85. 0. 26.6 31. ]
 [183. 0. 23.3 32. ]
 [ 89. 94. 28.1 21. ]
 [137. 168. 43.1 33. ]]
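To make the score concrete, here is a minimal sketch of how a chi-squared score for one non-negative feature can be computed by hand, and how the four best attributes follow from the scores printed above. It assumes the score compares the per-class sums of the feature against the sums expected from class frequencies alone. The helper names chi2_score and top_k and the toy data are hypothetical, not part of scikit-learn.

```python
from collections import defaultdict


def chi2_score(values, labels):
    """Chi-squared score of one non-negative feature against class labels.

    Observed: the sum of the feature's values within each class.
    Expected: the feature's overall total, split by class frequency.
    """
    observed = defaultdict(float)
    class_counts = defaultdict(int)
    for value, label in zip(values, labels):
        observed[label] += value
        class_counts[label] += 1

    total = sum(values)
    n = len(values)
    score = 0.0
    for label, count in class_counts.items():
        expected = total * count / n
        if expected > 0:  # skip classes with nothing expected
            score += (observed[label] - expected) ** 2 / expected
    return score


def top_k(scores, names, k):
    """Names of the k highest-scoring features, best first."""
    ranked = sorted(zip(scores, names), reverse=True)
    return [name for _, name in ranked[:k]]


# Toy feature (hypothetical data): class 1 carries most of the counts.
toy_score = chi2_score([1, 0, 3, 0], [0, 0, 1, 1])

# Scores and attribute names as printed in the article.
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age']
scores = [111.52, 1411.887, 17.605, 53.108, 2175.565, 127.669, 5.393, 181.304]
best_four = top_k(scores, names, 4)
print(toy_score, best_four)
```

Sorting the printed scores gives test, plas, age, and mass, the same four attributes that SelectKBest kept.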

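Recursive feature elimination (RFE)

RFE works by recursively removing attributes and building a model on those that remain. It uses the fitted model to judge which attributes (and combinations of attributes) contribute most to predicting the target attribute; in scikit-learn the ranking comes from the model's coefficients or feature importances. You can learn more about the RFE class in the scikit-learn documentation.

The example below uses RFE with the logistic regression algorithm to select the top three features. The choice of algorithm does not matter much, as long as it is skillful and consistent: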
# Import the required packages
# Import pandas to read the CSV file
import pandas
# Import numpy for array-related operations
import numpy
# Import sklearn's recursive feature elimination class
from sklearn.feature_selection import RFE
# Import LogisticRegression as the model used to rank the features
from sklearn.linear_model import LogisticRegression
# URL for loading the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
# Define the attribute names
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
# Create a pandas data frame by loading the data from the URL
dataframe = pandas.read_csv(url, names=names)
# Create an array from the data values
array = dataframe.values
# Split the data into input and target
X = array[:, 0:8]
Y = array[:, 8]
# Feature extraction: keep the three best features
# (liblinear was scikit-learn's default solver when this example was written)
model = LogisticRegression(solver="liblinear")
rfe = RFE(model, n_features_to_select=3)
fit = rfe.fit(X, Y)
print("Num Features: %d" % fit.n_features_)
print("Selected Features: %s" % fit.support_)
print("Feature Ranking: %s" % fit.ranking_)

After running the code above, we get:
Num Features: 3
Selected Features: [ True False False False False True True False]
Feature Ranking: [1 2 3 5 6 1 1 4]

You can see that RFE chose preg, mass, and pedi as the top three features. They are marked True in the support_ array and given rank 1 in the ranking_ array.
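The elimination loop itself can be sketched in a few lines. This is a simplified illustration, not scikit-learn's implementation: weight_fn is a hypothetical stand-in for refitting a model on the remaining features and reading off its weights, and the fixed toy weights are invented for the example.

```python
def recursive_elimination(names, weight_fn, n_keep):
    """Drop the weakest feature each round until n_keep remain.

    weight_fn(remaining_names) must return {name: weight} for the
    features still in play (a stand-in for refitting a model).
    Returns (selected_names, ranking): rank 1 marks a kept feature,
    and features dropped earlier receive larger ranks.
    """
    remaining = list(names)
    ranking = {}
    while len(remaining) > n_keep:
        weights = weight_fn(remaining)
        weakest = min(remaining, key=lambda name: abs(weights[name]))
        # The first feature dropped gets the largest rank, the last gets 2.
        ranking[weakest] = len(remaining) - n_keep + 1
        remaining.remove(weakest)
    for name in remaining:
        ranking[name] = 1
    return remaining, ranking


# Hypothetical fixed weights; a real model would be refitted every round.
TOY_WEIGHTS = {'a': 0.5, 'b': -0.1, 'c': 0.3, 'd': 0.05}


def toy_weight_fn(remaining):
    return {name: TOY_WEIGHTS[name] for name in remaining}


selected, ranking = recursive_elimination(['a', 'b', 'c', 'd'], toy_weight_fn, 2)
print(selected, ranking)
```

The ranking follows the same convention as the ranking_ output above: kept features share rank 1, and the feature removed first has the highest number.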

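Principal component analysis

PCA uses linear algebra to transform the dataset into a compressed form. It is generally regarded as a data reduction technique. One property of PCA is that you can choose the number of dimensions, or principal components, in the transformed result.

In the following example we use PCA and select three principal components: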
# Import the required packages
# Import pandas to read the CSV file
import pandas
# Import numpy for array-related operations
import numpy
# Import sklearn's PCA algorithm
from sklearn.decomposition import PCA
# URL for loading the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
# Define the attribute names
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
# Create a pandas data frame by loading the data from the URL
dataframe = pandas.read_csv(url, names=names)
# Create an array from the data values
array = dataframe.values
# Split the data into input and target
X = array[:, 0:8]
Y = array[:, 8]
# Feature extraction: keep three principal components
pca = PCA(n_components=3)
fit = pca.fit(X)
# Summarize components
print("Explained Variance: %s" % fit.explained_variance_ratio_)
print(fit.components_)

You can see that the transformed dataset (three principal components) bears little resemblance to the source data:

Explained Variance: [ 0.88854663 0.06159078 0.02579012]
[[ -2.02176587e-03 9.78115765e-02 1.60930503e-02 6.07566861e-02
   9.93110844e-01 1.40108085e-02 5.37167919e-04 -3.56474430e-03]
 [ -2.26488861e-02 -9.72210040e-01 -1.41909330e-01 5.78614699e-02
   9.46266913e-02 -4.69729766e-02 -8.16804621e-04 -1.40168181e-01]
 [ -2.24649003e-02 1.43428710e-01 -9.22467192e-01 -3.07013055e-01
   2.09773019e-02 -1.32444542e-01 -6.39983017e-04 -1.25454310e-01]]
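The explained-variance figures can be illustrated on a much smaller case. The sketch below handles only two features, where the eigenvalues of the covariance matrix have a closed form; it is an illustration of the idea, not how scikit-learn computes PCA. The function name and the toy data are hypothetical.

```python
import math


def explained_variance_ratio_2d(xs, ys):
    """Explained-variance ratios of the two principal components of 2-D data.

    The ratios are the eigenvalues of the 2x2 sample covariance matrix,
    largest first, each divided by their sum.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

    # Closed-form eigenvalues of [[var_x, cov_xy], [cov_xy, var_y]].
    centre = (var_x + var_y) / 2
    radius = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    eigenvalues = [centre + radius, centre - radius]
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


# Hypothetical toy data: the second column is exactly twice the first,
# so a single component captures all of the variance.
ratios = explained_variance_ratio_2d([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(ratios)
```

In the article's output the first component plays a similar role, accounting for about 89% of the variance on its own.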

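Selecting important features (feature importance)

Feature importance is a technique for selecting features using a trained supervised classifier. When we train a classifier such as a decision tree, we evaluate each attribute in order to create splits; we can use this measure as a feature selector. Let's look at it in detail.

Random forests are among the most popular machine learning methods thanks to their relatively good accuracy, robustness, and ease of use. They also provide two straightforward methods for feature selection: mean decrease impurity and mean decrease accuracy.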
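The article names mean decrease accuracy without showing it. One common reading, assumed here rather than taken from the book, is permutation based: shuffle the values of one feature and measure how much the accuracy drops. The classifier, data, and function names below are hypothetical and kept deliberately tiny.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label equals the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(labels)


def mean_decrease_accuracy(predict, rows, labels, column, repeats=20, seed=0):
    """Average accuracy drop after shuffling the values of one column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = [
            row[:column] + [value] + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats


# Hypothetical classifier: only column 0 drives the prediction.
def predict(row):
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 7.0], [0.9, 3.0], [0.2, 5.0], [0.8, 1.0], [0.3, 9.0], [0.7, 2.0]]
labels = [0, 1, 0, 1, 0, 1]

drop_used = mean_decrease_accuracy(predict, rows, labels, column=0)
drop_unused = mean_decrease_accuracy(predict, rows, labels, column=1)
print(drop_used, drop_unused)
```

Shuffling the column the classifier ignores costs nothing, while shuffling the column it relies on lowers the accuracy; the size of that drop is the importance score.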

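A random forest consists of many decision trees. Every node in a decision tree is a condition on a single feature, designed to split the dataset into two so that similar response values end up in the same set. The measure used to choose the (locally) optimal condition is called impurity. For classification it is typically Gini impurity or information gain/entropy, and for regression trees it is variance. So, when a tree is trained, we can compute how much each feature decreases the weighted impurity in the tree. For a forest, the impurity decrease from each feature can be averaged across the trees, and the features ranked by this measure.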
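As a small worked example of the impurity decrease described above, the sketch below computes Gini impurity for a node and the weighted decrease produced by a split. The node contents are hypothetical, and a real tree would evaluate many candidate splits per node.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node with two classes, split in two different ways.
parent = ['a', 'a', 'b', 'b']
clean_split = impurity_decrease(parent, ['a', 'a'], ['b', 'b'])
useless_split = impurity_decrease(parent, ['a', 'b'], ['a', 'b'])
print(gini(parent), clean_split, useless_split)
```

A feature that produces splits like the first one accumulates a large impurity decrease and therefore a high importance; a feature that only produces splits like the second accumulates none.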

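Let's see how to use a random forest classifier for feature selection, and evaluate the classifier's accuracy before and after feature selection. We will use the Otto dataset, which is available for free from Kaggle (you need to sign up for Kaggle to download it). You can download the training set train.csv.zip from https://www.kaggle.com/c/otto-group-product-classification-challenge/data and then place the unzipped train.csv file in your working directory.

This dataset describes 93 obfuscated details of more than 61,000 products, grouped into nine product categories (for example, fashion and electronics). The input attributes are counts of different events of some kind.

The goal is to predict, for each new product, an array of probabilities with one entry per category; models are evaluated using multiclass logarithmic loss (also called cross entropy).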
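For reference, multiclass logarithmic loss can be sketched as the average negative log of the probability assigned to the true class. The clipping value eps and the sample predictions below are assumptions made for illustration.

```python
import math


def multiclass_log_loss(probabilities, true_indices, eps=1e-15):
    """Mean negative log of the probability given to each true class."""
    total = 0.0
    for row, true_index in zip(probabilities, true_indices):
        # Clip to avoid log(0) when a prediction is exactly 0 or 1.
        p = min(max(row[true_index], eps), 1 - eps)
        total -= math.log(p)
    return total / len(true_indices)


# Hypothetical predictions over three classes for two products.
predicted = [
    [0.7, 0.2, 0.1],    # true class is index 0
    [0.25, 0.5, 0.25],  # true class is index 1
]
loss = multiclass_log_loss(predicted, [0, 1])
print(loss)
```

Confident correct predictions push the loss toward zero, while a confident wrong prediction is penalized heavily, which is why the metric is computed on probabilities rather than on hard labels.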

We start by importing all the libraries:

# Import the supporting libraries
# Import pandas to load the dataset from a CSV file
from pandas import read_csv
# Import numpy for array-based operations and calculations
import numpy as np
# Import the random forest classifier class from sklearn
from sklearn.ensemble import RandomForestClassifier
# Import the SelectFromModel feature selector class from sklearn
from sklearn.feature_selection import SelectFromModel
np.random.seed(1)

Define a function that splits our dataset into training and test data; we train the classifier on the training part and use the test part to evaluate the trained model:

# Function to create train and test sets from the original dataset
def getTrainTestData(dataset, split):
    np.random.seed(0)
    training = []
    testing = []
    np.random.shuffle(dataset)
    shape = np.shape(dataset)
    # A plain int is used here; np.uint16 would overflow above 65,535 rows
    trainlength = int(np.floor(split * shape[0]))
    for i in range(trainlength):
        training.append(dataset[i])
    for i in range(trainlength, shape[0]):
        testing.append(dataset[i])
    training = np.array(training)
    testing = np.array(testing)
    return training, testing
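The same shuffle-then-cut idea can be written more compactly with slicing. This is an alternative sketch using only the standard library, not the book's function; split_rows and the numbered stand-in rows are hypothetical.

```python
import random


def split_rows(rows, train_fraction, seed=0):
    """Shuffle a copy of rows, then cut it into (training, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the 50,000 instances used in the article.
rows = list(range(50000))
training, testing = split_rows(rows, 0.7)
print(len(training), len(testing))
```

With 50,000 rows and a 0.7 split this yields the 35,000 training and 15,000 test instances used in the rest of the example, and every row lands in exactly one of the two sets.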

We also need a function to evaluate the accuracy of the model; it takes the predicted and the actual outputs and returns the fraction of correct predictions, which we multiply by 100 when printing:

# Function to evaluate model performance
def getAccuracy(pre, ytest):
    count = 0
    for i in range(len(ytest)):
        if ytest[i] == pre[i]:
            count += 1
    acc = float(count) / len(ytest)
    return acc

Now we import the dataset. We load the train.csv file, which contains more than 61,000 training instances. Our example uses 50,000 of them: 35,000 to train the classifier and 15,000 to test its performance:

# Load the dataset as a pandas data frame
data = read_csv('train.csv')
# Extract attribute names from the data frame
feat = data.keys()
# Index.get_values() was removed from newer pandas releases; .values is equivalent
feat_labels = feat.values
# Extract data values from the data frame
dataset = data.values
# Shuffle the dataset
np.random.shuffle(dataset)
# We will select 50000 instances to train the classifier
inst = 50000
# Extract 50000 instances from the dataset
dataset = dataset[0:inst, :]
# Create training and testing data for performance evaluation
train, test = getTrainTestData(dataset, 0.7)
# Split data into input and output variables
# (columns 0 to 93 hold id and feat_1 to feat_93; column 94 is the target)
Xtrain = train[:, 0:94]
ytrain = train[:, 94]
shape = np.shape(Xtrain)
print("Shape of the dataset ", shape)
# Print the size of the data in MB
print("Size of Data set before feature selection: %.2f MB" % (Xtrain.nbytes / 1e6))

Note the size of the data here: with about 35,000 training instances and 94 attributes, our dataset is quite large. Let's take a look:

Shape of the dataset (35000, 94)

Size of Data set before feature selection: 26.32 MB

As you can see, our dataset has 35,000 rows and 94 columns, and takes up more than 26 MB.
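The reported size follows directly from the shape. A quick check, assuming each value occupies 8 bytes (the function name and that default are assumptions for illustration):

```python
def array_size_mb(rows, columns, bytes_per_value=8):
    """Size of a dense rows x columns array in megabytes (1 MB = 1e6 bytes)."""
    return rows * columns * bytes_per_value / 1e6


size_before = array_size_mb(35000, 94)
print("%.2f MB" % size_before)
```

Since the size is proportional to the number of columns, dropping features shrinks the array by the same proportion.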

In the next code block we configure our random forest classifier; we use 250 trees with a maximum depth of 30 and 7 random features considered at each split. The other hyperparameters keep sklearn's default values:

# Select the test data for model evaluation
Xtest = test[:, 0:94]
ytest = test[:, 94]
# Create a random forest classifier with the following parameters
trees = 250
max_feat = 7
max_depth = 30
min_sample = 2
clf = RandomForestClassifier(n_estimators=trees,
                             max_features=max_feat,
                             max_depth=max_depth,
                             min_samples_split=min_sample,
                             random_state=0,
                             n_jobs=-1)
# Train the classifier and measure the training time
import time
start = time.time()
clf.fit(Xtrain, ytrain)
end = time.time()
# Note down the model training time
print("Execution time for building the Tree is: %f" % (float(end) - float(start)))
pre = clf.predict(Xtest)

Let's see how much time is required to train the model on the training dataset:

Execution time for building the Tree is: 2.913641

#Evaluate the model performance for the test data

acc = getAccuracy(pre, ytest)

print("Accuracy of model before feature selection is %.2f"%(100*acc))

The accuracy of the model is:

Accuracy of model before feature selection is 98.82

As you can see, we get very good accuracy: almost 99% of the test data is classified into the correct categories. That means we correctly classified about 14,823 of the 15,000 instances.

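So the question now is: should we try to improve further? Well, why not? If improvement is possible, we should go for it; here we use feature importance to select features. As you know, while building a tree we use an impurity measure for node selection: the attribute value that gives the lowest impurity is chosen as the node. We can use a similar criterion for feature selection and give more importance to the features that produce less impurity. These scores are available through the feature_importances_ attribute of the fitted sklearn classifier. Let's look at the importance of each feature:

# Pair each column name with its importance score
for feature in zip(feat_labels, clf.feature_importances_):
    print(feature)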

('id', 0.33346650420175183)

('feat_1', 0.0036186958628801214)

('feat_2', 0.0037243050888530957)

('feat_3', 0.011579217472062748)

('feat_4', 0.010297382675187445)

('feat_5', 0.0010359139416194116)

('feat_6', 0.00038171336038056165)

('feat_7', 0.0024867672489765021)

('feat_8', 0.0096689721610546085)

('feat_9', 0.007906150362995093)

('feat_10', 0.0022342480802130366)

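As you can see, each feature has a different importance, depending on how much it contributes to the final prediction. Note that the id column is among the inputs and has by far the largest score of those shown.

We use these importance scores to rank our features; in the next part we keep the features whose importance is at least 0.01 for model training:

# Select the features that contribute most to the final prediction
sfm = SelectFromModel(clf, threshold=0.01)
sfm.fit(Xtrain, ytrain)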
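The thresholding step can be sketched with the scores printed earlier. This only covers the first 11 of the 94 columns, since those are the values shown in the article, and select_by_threshold is a hypothetical helper rather than part of sklearn.

```python
# Importance scores as printed in the article (first 11 of the 94 input columns).
importances = {
    'id': 0.33346650420175183,
    'feat_1': 0.0036186958628801214,
    'feat_2': 0.0037243050888530957,
    'feat_3': 0.011579217472062748,
    'feat_4': 0.010297382675187445,
    'feat_5': 0.0010359139416194116,
    'feat_6': 0.00038171336038056165,
    'feat_7': 0.0024867672489765021,
    'feat_8': 0.0096689721610546085,
    'feat_9': 0.007906150362995093,
    'feat_10': 0.0022342480802130366,
}


def select_by_threshold(scores, threshold):
    """Names whose score is at least threshold, in their original order."""
    return [name for name, score in scores.items() if score >= threshold]


selected = select_by_threshold(importances, 0.01)
print(selected)
```

Among these columns only id, feat_3, and feat_4 pass the 0.01 cutoff; feat_8 narrowly misses it.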

Here we transform the input datasets according to the selected features, and then check the size and shape of the new training set:

# Transform input dataset
Xtrain_1 = sfm.transform(Xtrain)
Xtest_1 = sfm.transform(Xtest)
# Let's see the size and shape of the new dataset
print("Size of Data set after feature selection: %.2f MB" % (Xtrain_1.nbytes / 1e6))
shape = np.shape(Xtrain_1)
print("Shape of the dataset ", shape)

Size of Data set after feature selection: 5.60 MB
Shape of the dataset (35000, 20)

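See the shape of the dataset? After feature selection we are left with just 20 features, which shrinks the dataset from 26.32 MB to 5.60 MB, about 79% smaller than the original.

In the next code block we train a new random forest classifier with the same hyperparameters as before and test it on the test set. Let's see what accuracy we get after modifying the training set: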

# Model training time
start = time.time()
clf.fit(Xtrain_1, ytrain)
end = time.time()
print("Execution time for building the Tree is: %f" % (float(end) - float(start)))
# Let's evaluate the model on test data
pre = clf.predict(Xtest_1)
acc2 = getAccuracy(pre, ytest)
print("Accuracy after feature selection %.2f" % (100 * acc2))

Execution time for building the Tree is: 1.711518
Accuracy after feature selection 99.97

See that! With the reduced dataset we reach 99.97% accuracy, which means 14,996 instances were classified correctly, compared with 14,823 before.

This is a big improvement from the feature selection step; we can summarize all the results in the following table:

Criterion	Before feature selection	After feature selection
Number of features	94	20
Dataset size	26.32 MB	5.60 MB
Training time	2.91 s	1.71 s
Accuracy	98.82%	99.97%

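The table shows the practical advantages of feature selection. We significantly reduced the number of features, which lowers the complexity of the model and the dimensionality of the dataset. With fewer dimensions, training takes less time, and in the end we overcame the overfitting problem and obtained higher accuracy than before.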
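The relative changes behind the table can be worked out with a few lines of arithmetic. The figures are taken from the table; percent_reduction is a hypothetical helper name.

```python
def percent_reduction(before, after):
    """Reduction from before to after, as a percentage of before."""
    return 100.0 * (before - after) / before


# Figures taken from the results table in the article.
features_cut = percent_reduction(94, 20)
size_cut = percent_reduction(26.32, 5.60)
time_cut = percent_reduction(2.91, 1.71)
accuracy_gain = 99.97 - 98.82  # in percentage points

print("features: %.1f%% fewer" % features_cut)
print("size: %.1f%% smaller" % size_cut)
print("training time: %.1f%% shorter" % time_cut)
print("accuracy: +%.2f points" % accuracy_gain)
```

The feature count and the dataset size fall by the same proportion, as expected for a dense array, while the training time falls by a smaller share.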

In this article we explored four methods of feature selection in machine learning.

Editor: hfy
