Stratified sampling at different sample sizes can be implemented as in the following code example:
import pandas as pd
# Create a small example dataset
data = pd.DataFrame({'feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                     'feature2': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
                     'target': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]})
# Stratify on the feature2 column and draw samples of two different sizes
# (70% and 50% of the rows in each group)
stratified_sample1 = data.groupby('feature2', group_keys=False).apply(lambda x: x.sample(frac=0.7))
stratified_sample2 = data.groupby('feature2', group_keys=False).apply(lambda x: x.sample(frac=0.5))
# Print the sampled rows
print("Stratified Sample 1:\n", stratified_sample1)
print("\nStratified Sample 2:\n", stratified_sample2)
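Example output (the rows drawn vary from run to run unless random_state is passed to sample(); each group contributes round(frac × group size) rows, which is 4 of 5 at frac=0.7 and 2 of 5 at frac=0.5):
Stratified Sample 1:
   feature1 feature2  target
0         1        A       0
2         3        A       0
4         5        A       0
6         7        A       0
1         2        B       1
3         4        B       1
5         6        B       1
7         8        B       1

Stratified Sample 2:
   feature1 feature2  target
0         1        A       0
2         3        A       0
1         2        B       1
3         4        B       1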
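The same idea can be written more compactly with DataFrameGroupBy.sample (available in pandas 1.1 and later), which samples inside each group directly and avoids groupby.apply, a pattern that recent pandas releases warn about when the function touches the grouping column. The following is a minimal sketch under those assumptions, not the only way to do it; the helper name stratified_sample and its seed parameter are hypothetical and were chosen for this illustration so that the draws are reproducible.

```python
import pandas as pd

# Same toy dataset as in the example above.
data = pd.DataFrame({
    "feature1": list(range(1, 11)),
    "feature2": ["A", "B"] * 5,
    "target": [0, 1] * 5,
})


def stratified_sample(df, by, frac, seed=0):
    """Draw the same fraction of rows from every group of column `by`.

    Hypothetical helper for this sketch; `seed` fixes the random draw.
    """
    # DataFrameGroupBy.sample picks rows within each group separately.
    return df.groupby(by).sample(frac=frac, random_state=seed)


# Draw samples of two different sizes and count the rows taken per stratum.
group_counts = {}
for frac in (0.7, 0.5):
    sample = stratified_sample(data, "feature2", frac)
    group_counts[frac] = sample["feature2"].value_counts().to_dict()
    print(f"frac={frac}: rows per group = {group_counts[frac]}")
```

Because each group here has 5 rows, the per-group sample size is round(frac × 5): 4 rows at frac=0.7 and 2 rows at frac=0.5 (Python rounds 2.5 to the nearest even integer). Both strata always contribute the same number of rows, so the 50/50 balance of feature2 in the full dataset is preserved at either sample size.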