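This error occurs because no service account is specified when the pipeline runs on DataflowRunner. To fix it, pass the Beam pipeline option `service_account_email` to BeamRunPythonPipelineOperator through its `pipeline_options` argument, and set it to a service account that is authorized to run the job. Example code: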
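```python
import pendulum
from airflow import DAG
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowConfiguration

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    # Yesterday at midnight UTC (replacement for the deprecated days_ago(1))
    'start_date': pendulum.today('UTC').subtract(days=1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
}

# Beam pipeline options, passed to the pipeline script as --key=value flags
pipeline_options = {
    'staging_location': 'gs://your/staging/location',
    'temp_location': 'gs://your/temp/location',
    # Service account the Dataflow job runs as; it must be authorized for the job
    'service_account_email': 'your-service-account-email',
}

with DAG(
    'your_dag_id',
    default_args=default_args,
    schedule=None,
) as dag:
    beam_run_python_pipeline = BeamRunPythonPipelineOperator(
        task_id='your_task_id',
        py_file='gs://your/path/to/pipeline.py',  # placeholder: your Beam pipeline script
        runner='DataflowRunner',
        pipeline_options=pipeline_options,
        # Project and region for the Dataflow job
        dataflow_config=DataflowConfiguration(
            project_id='your-project-id',
            location='your-region-1',
        ),
    )
```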
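When `runner='DataflowRunner'`, the operator forwards `pipeline_options` to the pipeline script as command-line flags, which Beam then parses into its Google Cloud options, including `--service_account_email`. The sketch below mimics that dict-to-flags conversion in plain Python so you can preview what the script receives and catch an unreplaced placeholder before deploying. `to_beam_args` and `looks_like_service_account` are hypothetical helpers written for illustration; they are a simplified approximation, not the provider's or Beam's actual code, and the example account name is made up.

```python
def to_beam_args(options: dict) -> list[str]:
    """Convert an options dict into Beam-style command-line flags (simplified sketch)."""
    args: list[str] = []
    for key, value in options.items():
        if value is None or value is True:
            # Bare switch, e.g. --streaming
            args.append(f"--{key}")
        elif value is False:
            # Switches set to False are left out entirely
            continue
        elif isinstance(value, list):
            # List values become one repeated flag per item
            args.extend(f"--{key}={item}" for item in value)
        else:
            args.append(f"--{key}={value}")
    return args


def looks_like_service_account(email: str) -> bool:
    """Rough shape check for a Google service-account address (name@...gserviceaccount.com)."""
    local, sep, domain = email.partition("@")
    return bool(local) and sep == "@" and domain.endswith(".gserviceaccount.com")


placeholder = "your-service-account-email"
example_sa = "dataflow-worker@my-project.iam.gserviceaccount.com"  # hypothetical account

print(looks_like_service_account(placeholder))  # False
print(looks_like_service_account(example_sa))   # True
print(to_beam_args({
    "temp_location": "gs://your/temp/location",
    "service_account_email": example_sa,
}))
```

The suffix check only tests the address shape; it cannot confirm that the account exists or holds the permissions the Dataflow job needs.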