- Make sure the numeric columns in the source data have the correct data types. If a numeric column is typed as a string, the Glue Job may fail to import the data correctly. In that case, cast the column to a numeric type before transforming or writing the data. The following example casts the "number_col" column to an integer (IntegerType):
import pyspark.sql.functions as f
from pyspark.sql.types import IntegerType

# Read the source CSV, then cast the string column to an integer type
df = spark.read.format("csv").option("header", "true").load("s3://path/to/input")
df = df.withColumn("number_col", f.col("number_col").cast(IntegerType()))

# Write to RDS over JDBC (fill in the connection options;
# note that "overwrite" mode replaces the target table)
df.write.format("jdbc").option("url", "jdbc:").option("dbtable", "").option("user", "") \
    .option("password", "").option("driver", "org.postgresql.Driver") \
    .mode("overwrite").save()
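One caveat worth knowing: by default (with ANSI mode off), Spark's `cast` does not fail on values it cannot parse; it silently produces NULL for them, so bad source rows can turn into NULLs in RDS. The following is a minimal pure-Python sketch of that "cast or null" behavior (it does not use Spark; `cast_or_null` is a hypothetical helper for illustration only):

```python
def cast_or_null(value):
    """Mimic Spark's cast(IntegerType()) default behavior:
    return an int on success, None (NULL) on failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

rows = ["42", "7", "abc", None]
print([cast_or_null(v) for v in rows])  # [42, 7, None, None]
```

Because of this, it can be useful to count NULLs in the cast column (or filter them out) before writing, so silently dropped values do not go unnoticed.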
- Define the schema of the RDS table explicitly in the Glue Job. This ensures that every column you need is mapped correctly to the corresponding column in the RDS table. Example:
from awsglue.dynamicframe import DynamicFrame
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType
sc = SparkContext.getOrCreate()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
db_name = ""
tbl_name = ""
rds_url = ""
username = ""
password = ""
# Define schema for RDS table
schema = StructType([
StructField("id", IntegerType(), True),
StructField("name", StringType(), True),
StructField("number_col", IntegerType(), True),
StructField("date_col", LongType(), True),
StructField("timestamp_col", LongType(), True)
])
# Read data from source
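Continuing the snippet above, the explicit schema can be applied when reading the source, and the result written to the RDS table over JDBC. This is a sketch under stated assumptions: the S3 path, CSV input format, PostgreSQL driver, and "append" write mode are illustrative choices, not values from the original, and the connection variables must be filled in before the job will run.

```python
# Read the source data with the explicit schema instead of inferring types.
# (Assumption: CSV input; replace the path with your actual source location.)
df = spark.read.format("csv") \
    .option("header", "true") \
    .schema(schema) \
    .load("s3://path/to/input")

# Write to the RDS table over JDBC using the connection values defined above.
# "append" adds rows to the existing table; "overwrite" would replace it.
df.write.format("jdbc") \
    .option("url", rds_url) \
    .option("dbtable", tbl_name) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "org.postgresql.Driver") \
    .mode("append") \
    .save()
```

With the schema supplied at read time, a type mismatch surfaces as NULLs or read errors in the Glue Job itself rather than as a failed insert on the database side, which is usually easier to debug.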