In PySpark, you can use the filter() function to split a DataFrame into multiple DataFrames based on a condition. Here is an example:
from pyspark.sql import SparkSession
# Create a SparkSession
spark = SparkSession.builder.getOrCreate()
# Create an example DataFrame
data = [("Alice", 25, "Female"),
        ("Bob", 30, "Male"),
        ("Charlie", 35, "Male"),
        ("David", 40, "Male"),
        ("Eve", 45, "Female")]
df = spark.createDataFrame(data, ["Name", "Age", "Gender"])
# Split the DataFrame into multiple DataFrames by condition
male_df = df.filter(df.Gender == "Male")
female_df = df.filter(df.Gender == "Female")
# Print the results
print("Male DataFrame:")
male_df.show()
print("Female DataFrame:")
female_df.show()
Output:
Male DataFrame:
+-------+---+------+
| Name|Age|Gender|
+-------+---+------+
| Bob| 30| Male|
|Charlie| 35| Male|
| David| 40| Male|
+-------+---+------+
Female DataFrame:
+-----+---+------+
| Name|Age|Gender|
+-----+---+------+
|Alice| 25|Female|
| Eve| 45|Female|
+-----+---+------+
In the code above, we first create an example DataFrame and then use filter() to split it into two DataFrames, male_df and female_df. The split is driven by the conditions df.Gender == "Male" and df.Gender == "Female". Note that each filter() call produces a new, lazily evaluated DataFrame; the original df is unchanged, and Spark scans the data only when an action runs. Finally, we call show() to print the contents of both DataFrames.