Exclusive: Airflow XCom

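from airflow.decorators import task  # Airflow 2.x import path

@task(retries=0)
def fetch_transactions(**context):
    # query_db() is assumed to be defined elsewhere and to return a DataFrame
    df = query_db()
    # Push allowed only to key "raw_txns"
    context["ti"].xcom_push(key="raw_txns", value=df.to_json())
    return "done"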
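The "push allowed only to key" rule in the task above is a convention, not something Airflow enforces. One way to make it explicit is a small guard around the push. The sketch below is only an illustration: `ALLOWED_KEYS`, `guarded_xcom_push`, and `FakeTaskInstance` are hypothetical names, and the fake task instance exists solely so the example runs without Airflow installed.

```python
from typing import Any

# Hypothetical allowlist: the only XCom keys this task may write.
ALLOWED_KEYS = frozenset({"raw_txns"})


def guarded_xcom_push(ti: Any, key: str, value: Any) -> None:
    """Push to XCom only if the key is on the allowlist."""
    if key not in ALLOWED_KEYS:
        raise KeyError(f"XCom key {key!r} is not allowed here")
    # ti only needs an xcom_push(key=..., value=...) method,
    # which is what a task instance provides.
    ti.xcom_push(key=key, value=value)


class FakeTaskInstance:
    """Stand-in for a task instance so the sketch runs without Airflow."""

    def __init__(self) -> None:
        self.pushed = {}

    def xcom_push(self, key: str, value: Any) -> None:
        self.pushed[key] = value


ti = FakeTaskInstance()
guarded_xcom_push(ti, "raw_txns", '{"amount": [10, 20]}')
```

Inside a real task, the same guard could be called with `context["ti"]` in place of the fake object; any key outside the allowlist then fails fast with a `KeyError` instead of silently creating a new XCom.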

Because XComs live in your metadata database (such as Postgres), their size is bounded by what that database can store in a single value; for Postgres, the ceiling is typically cited as 1 GB.
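Staying far below the database ceiling is usually the safer choice, and one simple habit is to measure a value's serialized size before pushing it. The sketch below assumes JSON serialization and a hypothetical 1 MiB budget (`MAX_XCOM_BYTES`); neither the constant nor the helper functions are Airflow settings or APIs.

```python
import json

# Hypothetical per-value budget, chosen for illustration only.
MAX_XCOM_BYTES = 1024 * 1024


def serialized_size(value) -> int:
    """Return the size in bytes of the JSON-encoded value."""
    return len(json.dumps(value).encode("utf-8"))


def fits_in_xcom(value, limit: int = MAX_XCOM_BYTES) -> bool:
    """Return True if the encoded value is within the budget."""
    return serialized_size(value) <= limit


small = {"row_count": 1200, "status": "ok"}
large = {"rows": ["x" * 100] * 20000}  # about 2 MB once encoded

print(fits_in_xcom(small))
print(fits_in_xcom(large))
```

A value that fails the check is a candidate for external storage, with only a reference (a path or URI) passed through XCom.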

export AIRFLOW__COMMON_IO__XCOM_OBJECTSTORAGE_PATH='s3://aws_default@my-airflow-bucket/xcoms/'

my_data_pipeline()

But this flexibility comes at a cost. In large-scale data pipelines, the default XCom behavior can lead to bloated metadata databases, security vulnerabilities, race conditions, and debugging nightmares.

To handle data more strictly or exclusively beyond the default local database, Airflow provides several advanced mechanisms:

1. Custom XCom Backends

By default, XCom data is serialized and stored directly in the xcom table of your Airflow metadata database via the BaseXCom class. While this works well for development, it comes with significant limitations in production. The most frequently cited constraint is a size limit of about 48 KB for stored values, although the effective ceiling depends on the Airflow version and the database in use. Furthermore, the default backend can become a bottleneck when dealing with a large number of XComs, slowing down the database and, by extension, the entire scheduler. These limitations are the primary reason for seeking a more "exclusive" and robust solution.
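The idea behind a custom backend can be sketched without Airflow: small values stay inline, while large ones are written elsewhere and only a reference is stored. In Airflow itself this is usually done by subclassing BaseXCom and overriding its `serialize_value` and `deserialize_value` methods, whose exact signatures vary between versions. The standalone functions below only borrow those names; the in-memory `object_store` dict, the `objstore://` prefix, and the 1 KB threshold are assumptions made for illustration.

```python
import json
import uuid

# Hypothetical threshold in bytes; larger payloads go to external storage.
THRESHOLD_BYTES = 1024
# Hypothetical marker identifying a stored reference.
REF_PREFIX = "objstore://"

# In-memory dict standing in for an object store such as S3.
object_store = {}


def serialize_value(value) -> str:
    """Return the string that would be written to the metadata database."""
    payload = json.dumps(value)
    if len(payload.encode("utf-8")) <= THRESHOLD_BYTES:
        return payload  # small enough: keep the value inline
    key = f"{REF_PREFIX}{uuid.uuid4()}"
    object_store[key] = payload  # large: write the payload externally
    return json.dumps(key)  # and keep only the reference


def deserialize_value(stored: str):
    """Resolve a stored entry back to the original value."""
    data = json.loads(stored)
    # Caveat of this sketch: a genuine string value that starts with
    # REF_PREFIX would be mistaken for a reference.
    if isinstance(data, str) and data.startswith(REF_PREFIX):
        return json.loads(object_store[data])
    return data


small_entry = serialize_value({"status": "ok"})
big_value = {"rows": list(range(1000))}
big_entry = serialize_value(big_value)
```

Here `small_entry` holds the JSON itself, while `big_entry` holds only a short reference, so the metadata database row stays small regardless of payload size.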

Use these strategies depending on your requirement:

xcom_objectstorage_threshold: The size threshold, in bytes, above which an XCom value is written to object storage instead of the metadata database.

5. Troubleshooting XComs in the UI
