Below is an example that applies several cross-validation techniques and evaluates each one with the same metric (accuracy):
import numpy as np
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold, cross_val_score
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Simple hold-out validation with train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000)  # raise max_iter so the lbfgs solver converges
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print("Hold-out (simple cross-validation) accuracy:", accuracy)
# k-fold cross-validation with KFold
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
accuracies = []
for train_index, test_index in kfold.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    accuracies.append(accuracy)
print("k-fold cross-validation accuracy:", np.mean(accuracies))
# Stratified k-fold cross-validation with StratifiedKFold
stratified_kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
accuracies = []
for train_index, test_index in stratified_kfold.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    accuracies.append(accuracy)
print("Stratified k-fold cross-validation accuracy:", np.mean(accuracies))
# Cross-validation with the cross_val_score convenience function
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)
print("cross_val_score accuracy:", np.mean(scores))
This example uses the sklearn library. It first performs a simple hold-out split with train_test_split, then k-fold cross-validation with KFold, then stratified k-fold cross-validation with StratifiedKFold, and finally uses the cross_val_score convenience function, which for a classifier and an integer cv applies stratified folds by default. Each technique trains and evaluates a logistic regression model and reports the mean accuracy. All four approaches compute the same evaluation metric, accuracy, although the individual values can differ slightly because each technique partitions the data differently.
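As a side note, cross_val_score also accepts a splitter object instead of an integer, which lets you reproduce the stratified loop above in one call. A minimal sketch, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

iris = load_iris()
X, y = iris.data, iris.target

# Passing a StratifiedKFold instance fixes the shuffling and the random
# seed, so the per-fold scores match the manual stratified loop exactly.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("Per-fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Because the splitter carries its own random_state, rerunning this snippet always yields the same folds and therefore the same scores.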