Python Code
#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import linear_model
import numpy as np
#Load Train and Test datasets
#Identify feature and response variable(s) and values must be numeric and numpy arrays
# x_train=input_variables_values_training_datasets
x_train=np.random.rand(4,4)
print(x_train)
# y_train=target_variables_values_training_datasets
y_train=np.random.rand(4,4)
print(y_train)
# x_test=input_variables_values_test_datasets
x_test=np.random.rand(4,4)
print(x_test)
# Create linear regression object
linear = linear_model.LinearRegression()
# Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)  # R^2 of the fit on the training data
#Equation coefficient and Intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
#Predict Output
predicted = linear.predict(x_test)
print('predicted: \n', predicted)
Output:
[[ 0.98267731 0.23364069 0.35133775 0.92826309]
[ 0.80538991 0.05637806 0.87662175 0.3960776 ]
[ 0.54686738 0.6816495 0.99747716 0.32531085]
[ 0.19189509 0.87105462 0.88158122 0.25056621]]
[[ 0.55541608 0.56859636 0.40616234 0.14683524]
[ 0.09937835 0.63874553 0.92062536 0.32798326]
[ 0.87174236 0.779044 0.79119392 0.06912842]
[ 0.87907434 0.53175367 0.01371655 0.11414196]]
[[ 0.37568516 0.17267374 0.51647046 0.04774661]
[ 0.38573914 0.85335136 0.11647555 0.0758696 ]
[ 0.67559384 0.57535368 0.88579261 0.26278658]
[ 0.13829782 0.28328756 0.51170484 0.04260013]]
Coefficient:
[[ 0.55158868 1.45901817 0.31224322 0.49538173]
[ 0.6995448 0.40804135 0.59938423 0.09084578]
[ 1.79010371 0.21674532 1.60972012 -0.046387 ]
[-0.31562917 -0.53767439 -0.16141312 -0.2154683 ]]
Intercept:
[-0.89705102 -0.50908061 -1.9260686 0.83934127]
predicted:
[[-0.25297601 0.13808785 -0.38696891 0.53426883]
[ 0.63472658 0.18566989 -0.86662193 0.22361739]
[ 0.72181277 0.75309881 0.82170796 0.11715048]
[-0.22656611 0.01383581 -0.79537442 0.55159912]]
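For intuition, the fitted coefficients and intercept above define the regression equation directly. A minimal sketch, reusing the linear, x_test, and predicted names from the script above, that reconstructs the predictions by hand:
# Reconstruct predictions manually: y_hat = x_test @ coef.T + intercept
manual = x_test @ linear.coef_.T + linear.intercept_
print(np.allclose(manual, predicted))  # True: predict() applies this same linear equation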
R Code
#Load Train and Test datasets
#Identify feature and response variable(s); values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
# Combine predictors and response into one data frame for lm()
x <- as.data.frame(cbind(x_train, y_train))
# Train the model using the training sets and check score
linear <- lm(y_train ~ ., data = x)
summary(linear)
#Predict Output
predicted <- predict(linear, x_test)
2. Logistic Regression
Don't get confused by its name! Logistic regression is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values such as 0/1, yes/no, true/false) from a given set of independent variables. In simple terms, it predicts the probability that an event occurs by fitting the data to a logit function, which is why it is also known as logit regression. Since it predicts probabilities, its output values lie between 0 and 1 (as expected).
Again, let us try to understand this through a simple example.
Suppose a friend gives you a puzzle to solve. There are only two outcome scenarios: either you solve it or you don't. Now imagine that you are being given a wide range of puzzles and quizzes in an attempt to understand which subjects you are good at. The outcome of this study would be something like this: if you are given a tenth-grade trigonometry problem, you are 70% likely to solve it, whereas for a fifth-grade history question the probability of getting the answer is only 30%. This is what logistic regression provides you.
Coming to the math, the log odds of the outcome are modeled as a linear combination of the predictor variables:
odds = p / (1 - p) = probability of event occurrence / probability of event not occurring
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bk*Xk
Above, p is the probability of the presence of the characteristic of interest. Logistic regression chooses the parameters that maximize the likelihood of observing the sample values, rather than minimizing the sum of squared errors (as in ordinary regression).
Now, you may ask, why take a log? For the sake of simplicity, let's just say that this is one of the best mathematical ways to replicate a step function. I could go into more detail, but that would defeat the purpose of this article.
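To make the log-odds relationship concrete, here is a small numeric sketch (the variable names are illustrative) showing that the logit maps a probability to a linear score and the sigmoid maps it back:
import numpy as np
p = 0.7                               # probability of solving the puzzle
odds = p / (1 - p)                    # 2.33...: the event is ~2.3x more likely than not
log_odds = np.log(odds)               # logit(p) = ln(p/(1-p)) ~= 0.847
p_back = 1 / (1 + np.exp(-log_odds))  # the sigmoid inverts the logit: 0.7
print(odds, log_odds, p_back)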
Python Code
#Import Library
from sklearn.linear_model import LogisticRegression
# Assumes you have X (predictors) and y (target) for the training dataset, and x_test (predictors) for the test dataset
# Create logistic regression object
model = LogisticRegression()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)  # mean accuracy on the training data
#Equation coefficient and Intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
#Predict Output
predicted = model.predict(x_test)
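Since X, y, and x_test are assumed above, here is one possible self-contained variant. It uses scikit-learn's make_classification to generate a synthetic binary dataset (purely for illustration) and shows that the model returns discrete class labels via predict and probabilities between 0 and 1 via predict_proba:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
# Synthetic binary classification data (illustrative only)
X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=0)
clf = LogisticRegression()
clf.fit(X_demo, y_demo)
print(clf.predict(X_demo[:5]))        # discrete class labels (0/1)
print(clf.predict_proba(X_demo[:5]))  # probabilities; each row sums to 1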