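Confirm that the AWS WAF log partition exists in the S3 bucket. In the AWS console, open the corresponding S3 bucket and confirm that the AWS WAF log partition is actually present in it.
Update the AWS Glue metastore. Check whether the information recorded for the log partition is correct, then update the metastore through the AWS Glue console.
Update the AWS Glue crawler. If the AWS Glue metastore does not contain the AWS WAF log partition, use an AWS Glue crawler to update it. In the AWS Glue console, create a new crawler, configure it to scan the bucket that holds the log partition, and run the crawler to update the metastore.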
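The checks in the steps above can also be scripted rather than done in the console. The following is a minimal sketch, not a definitive implementation: the helper names (`s3_prefix_has_objects`, `glue_partition_exists`, `diagnose_partition`) and the returned status strings are hypothetical, and the clients are passed in so the logic can be exercised without AWS credentials. It assumes boto3's `list_objects_v2` and `get_partition` calls and that the partition values you pass match the table's partition keys.

```python
from typing import Any, List


def s3_prefix_has_objects(s3_client: Any, bucket: str, prefix: str) -> bool:
    """Return True if at least one object exists under the partition prefix."""
    # MaxKeys=1 keeps the call cheap: we only need to know the prefix is non-empty.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


def glue_partition_exists(
    glue_client: Any, database: str, table: str, values: List[str]
) -> bool:
    """Return True if the partition is registered in the Glue catalog."""
    try:
        glue_client.get_partition(
            DatabaseName=database, TableName=table, PartitionValues=values
        )
        return True
    except glue_client.exceptions.EntityNotFoundException:
        return False


def diagnose_partition(
    s3_client: Any,
    glue_client: Any,
    bucket: str,
    prefix: str,
    database: str,
    table: str,
    values: List[str],
) -> str:
    """Map the three manual steps to a single status string (hypothetical labels)."""
    if not s3_prefix_has_objects(s3_client, bucket, prefix):
        return "missing_in_s3"  # step 1 failed: nothing to catalog yet
    if glue_partition_exists(glue_client, database, table, values):
        return "registered"  # step 2: the catalog already knows the partition
    return "needs_catalog_update"  # step 3: run a crawler or add the partition


# Example wiring with real clients (placeholders are assumptions, not real names):
#
#   import boto3
#   s3 = boto3.client("s3", region_name="your-region")
#   glue = boto3.client("glue", region_name="your-region")
#   status = diagnose_partition(
#       s3, glue, "your-bucket-name", "your-partition-path/",
#       "your-database-name", "your-table-name", ["your-partition-value"],
#   )
#   if status == "needs_catalog_update":
#       glue.start_crawler(Name="your-crawler-name")  # crawler must already exist
```

Passing the clients in as arguments is a design choice for this sketch: it keeps the decision logic separate from AWS access, so the same functions work with stubbed clients in a test.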
The following is example Python code for updating the AWS Glue metadata:
import boto3
import json

# Glue client; replace the placeholder with your region
client = boto3.client('glue', region_name='your-region')

# Placeholders: replace with your own database, table, and S3 location
database_name = 'your-database-name'
table_name = 'your-table-name'
location = 's3://your-bucket-name/your-partition-path'

# Table definition in the shape expected by the Glue TableInput structure
table_input = {
    'Name': table_name,
    'TableType': 'EXTERNAL_TABLE',
    'Parameters': {
        'classification': 'aws-waf',
        'compressionType': 'gzip'
    },
    'StorageDescriptor': {
        # Placeholder columns; replace with the actual log schema
        'Columns': [
            {"Name": "col1", "Type": "string", "Comment": ""},
            {"Name": "col2", "Type": "string", "Comment": ""},
            {"Name": "col3", "Type": "string", "Comment": ""},
            {"Name": "col4", "Type": "string", "Comment": ""},
            {"Name": "col5", "Type": "string", "Comment": ""},
            {"Name": "col6", "Type": "string", "Comment": ""},
            {"Name": "col7", "Type": "string", "Comment": ""},
            {"Name": "col8", "Type": "string", "Comment": ""}
        ],
        'Location': location,
        'InputFormat': 'org.apache.hadoop.mapred.TextInputFormat',
        'OutputFormat': 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat',
        # Must be a boolean, not the string 'true'
        'Compressed': True,
        'NumberOfBuckets': -1,
        'SerdeInfo': {
            'SerializationLibrary': 'org.openx.data.jsonserde.JsonSerDe',
            'Parameters': {
                'serialization.format': '1'
            }
        },
        'BucketColumns': [],
        'SortColumns': [],
        'Parameters': {
            'classification': 'aws-waf',
            'compressionType': 'gzip'
        },
        'StoredAsSubDirectories': False
    }
}
partition_input = {
'Values': ['your-partition