- Data preparation: split the dataset into training and test sets, for example:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
- Define the batch gradient descent function, for example:
import numpy as np  # required for the array operations below

def batch_gradient_descent(X, y, learning_rate=0.01, num_iterations=1000):
    m = len(y)  # number of training samples
    theta = np.zeros((X.shape[1], 1))  # initialize parameters to zero
    for i in range(num_iterations):
        y_pred = X.dot(theta)           # current predictions
        loss = y_pred - y               # residuals
        gradient = X.T.dot(loss) / m    # gradient of the mean squared error
        theta -= learning_rate * gradient
    return theta
- Train the model on the training set, predict on both sets, and compute the errors, for example:
from sklearn.metrics import mean_squared_error
train_theta = batch_gradient_descent(X_train, y_train)
train_error = mean_squared_error(y_train, X_train.dot(train_theta))
test_error = mean_squared_error(y_test, X_test.dot(train_theta))
- Compare the trends of the training error and the test error, and observe how they change as parameters such as the learning rate and the number of iterations are adjusted, in order to optimize the model.
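The steps above can be combined into one runnable sketch. The synthetic data (a bias column plus two uniform features with Gaussian noise) and the learning-rate values swept over are assumptions added for illustration; the function and variable names follow the steps above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def batch_gradient_descent(X, y, learning_rate=0.01, num_iterations=1000):
    # Batch gradient descent for linear regression with MSE loss.
    m = len(y)
    theta = np.zeros((X.shape[1], 1))
    for _ in range(num_iterations):
        gradient = X.T.dot(X.dot(theta) - y) / m
        theta -= learning_rate * gradient
    return theta

# Synthetic data (assumption): 200 samples, bias column + 2 features,
# generated from a known true_theta with small Gaussian noise.
rng = np.random.default_rng(42)
X = np.hstack([np.ones((200, 1)), rng.uniform(-1, 1, (200, 2))])
true_theta = np.array([[1.0], [2.0], [-3.0]])
y = X.dot(true_theta) + rng.normal(0, 0.1, (200, 1))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Sweep the learning rate (values are illustrative) and compare
# training error against test error at each setting.
for lr in (0.01, 0.1, 0.5):
    theta = batch_gradient_descent(
        X_train, y_train, learning_rate=lr, num_iterations=2000)
    train_err = mean_squared_error(y_train, X_train.dot(theta))
    test_err = mean_squared_error(y_test, X_test.dot(theta))
    print(f"lr={lr}: train MSE={train_err:.4f}, test MSE={test_err:.4f}")
```

With a well-chosen learning rate the two errors should both settle near the noise variance; a training error that keeps dropping while the test error rises would indicate overfitting, and errors that barely move indicate the learning rate or iteration count is too small.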