Creating a custom XCom backend in Airflow

How to store larger Airflow XCom payloads in S3 while keeping DAG task interfaces small and explicit.

By Pilotcore | Reviewed May 19, 2026 | 3 min read

Reviewed May 20, 2026. Custom XCom backends are still supported in Apache Airflow, but method signatures and serialization behavior are version-sensitive. Treat the code as illustrative and test it against the Airflow version you run.

An Airflow custom XCom backend is a Python class that changes how task-to-task messages are serialized, stored, and loaded. It is useful when default metadata-database XCom storage is too small or too operationally risky for selected payload types.

Main takeaways

  • Keep default XCom values small; use object storage for selected larger payloads.
  • Store references in XCom rather than full DataFrames, files, or bulky serialized objects.
  • Version-test custom XCom code against your Airflow runtime before production use because signatures can change.

In Airflow, XComs let tasks exchange small pieces of data with other tasks, typically within the same DAG run. The default backend stores those values in the metadata database, which is a poor fit for larger payloads: size limits vary by database backend, and pushing bigger objects directly into XCom storage can bloat the metadata database and create operational problems.

Use this pattern when payload size is too large for default XCom storage but still small enough to serialize and retrieve safely per task run.
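
One way to make "too large for default XCom storage" concrete is a size check applied before a value is pushed. The sketch below is only an illustration: the 48 KiB cutoff and the names OFFLOAD_THRESHOLD_BYTES, serialized_size, and should_offload are assumptions, not Airflow settings, and it measures the JSON encoding of the value rather than whatever your Airflow version actually writes to the database.

```python
import json
from typing import Any

# Hypothetical cutoff; tune it for your metadata database and workload.
OFFLOAD_THRESHOLD_BYTES = 48 * 1024


def serialized_size(value: Any) -> int:
    """Return the size in bytes of the value's JSON encoding."""
    return len(json.dumps(value).encode("utf-8"))


def should_offload(value: Any, threshold: int = OFFLOAD_THRESHOLD_BYTES) -> bool:
    """Return True when the JSON-encoded value is larger than the threshold."""
    return serialized_size(value) > threshold
```

A check like this could decide whether a value stays in the default XCom table or goes to object storage; the right threshold depends on your database and should be measured rather than guessed.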

Version note

The concept is still valid, but treat the code below as illustrative. Airflow supports custom XCom backends by subclassing BaseXCom, but the import path for BaseXCom and the signatures of serialize_value and deserialize_value can differ across Airflow releases. Test this pattern against the exact Airflow version you run, for example a current 3.x runtime, before using it in production.

Why this pattern helps

Instead of storing full payloads in XCom tables, you can:

  • Serialize task output to object storage (for example, S3)
  • Store only an object reference in XCom
  • Deserialize downstream when needed

This keeps task boundaries explicit and avoids bloating Airflow metadata storage.
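
The three steps above can be sketched without Airflow or AWS at all. In this minimal illustration a plain dictionary (FAKE_STORE) stands in for the object store, and push_large and pull_large are hypothetical helper names; the point is only that the value passed between tasks is a short key, not the payload itself.

```python
import json
import uuid
from typing import Any, Dict

# In-memory stand-in for an object store such as S3 (illustration only).
FAKE_STORE: Dict[str, bytes] = {}


def push_large(payload: Any) -> str:
    """Write the payload to the store and return a small reference string."""
    key = f"payloads/{uuid.uuid4()}.json"
    FAKE_STORE[key] = json.dumps(payload).encode("utf-8")
    return key


def pull_large(reference: str) -> Any:
    """Load the payload that a reference points to."""
    return json.loads(FAKE_STORE[reference].decode("utf-8"))


# The upstream task would push the rows; only `ref` travels through XCom.
rows = [{"id": i, "value": i * i} for i in range(1000)]
ref = push_large(rows)
restored = pull_large(ref)
```

Here ref is about 50 characters long regardless of how many rows are stored, which is the property that keeps the XCom table small.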

Custom backend approach

Airflow supports custom XCom backends through BaseXCom subclassing. The idea is:

  1. Override serialize_value to intercept selected object types.
  2. Upload serialized data to S3.
  3. Return a custom URI in XCom (for example, xcom_s3://bucket/key).
  4. Override deserialize_value to fetch from S3 when that URI is encountered.
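
Steps 3 and 4 depend on building and recognizing the custom URI reliably. The helpers below are a sketch of that part only, with no S3 calls; build_uri and parse_uri are hypothetical names, and the xcom_s3 scheme is the one used in this article. Matching on the full "xcom_s3://" scheme, rather than the bare prefix, avoids treating an ordinary string that happens to start with "xcom_s3" as a reference.

```python
from typing import Optional, Tuple

PREFIX = "xcom_s3"  # scheme name used for offloaded values in this article


def build_uri(bucket: str, key: str) -> str:
    """Build the reference string that is stored in XCom."""
    return f"{PREFIX}://{bucket}/{key}"


def parse_uri(value: object) -> Optional[Tuple[str, str]]:
    """Return (bucket, key) for an offloaded value, or None for anything else."""
    scheme = f"{PREFIX}://"
    if not isinstance(value, str) or not value.startswith(scheme):
        return None
    bucket, sep, key = value[len(scheme):].partition("/")
    if not bucket or not sep or not key:
        return None
    return bucket, key
```

Reading the bucket from the URI itself means a value can still be resolved if the configured bucket changes between the write and the read.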

Example implementation

import io
import os
import uuid
from typing import Any

import pandas as pd
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

try:
    # Assumed Airflow 3.x location; verify against your installed version.
    from airflow.sdk.bases.xcom import BaseXCom
except ImportError:
    # Airflow 2.x location.
    from airflow.models.xcom import BaseXCom


class S3XComBackend(BaseXCom):
    PREFIX = "xcom_s3"
    # Read once at import time, so the variable must be set before Airflow starts.
    BUCKET_NAME = os.environ.get("S3_XCOM_BUCKET_NAME")

    @staticmethod
    def _assert_s3_backend():
        if S3XComBackend.BUCKET_NAME is None:
            raise ValueError("S3_XCOM_BUCKET_NAME is not set for the S3 XCom backend.")

    @staticmethod
    def serialize_value(value: Any, **kwargs):
        # Newer Airflow releases pass extra keyword arguments (key, task_id,
        # dag_id, run_id, ...); accept them and forward them to the base class.
        if isinstance(value, pd.DataFrame):
            S3XComBackend._assert_s3_backend()
            key = f"data_{uuid.uuid4()}.csv"
            # Upload straight from memory so no local file is left behind.
            S3Hook().load_string(
                string_data=value.to_csv(index=False),
                key=key,
                bucket_name=S3XComBackend.BUCKET_NAME,
                replace=True,
            )
            value = f"{S3XComBackend.PREFIX}://{S3XComBackend.BUCKET_NAME}/{key}"

        return BaseXCom.serialize_value(value, **kwargs)

    @staticmethod
    def deserialize_value(result) -> Any:
        result = BaseXCom.deserialize_value(result)
        scheme = f"{S3XComBackend.PREFIX}://"

        # Match the full scheme so ordinary strings are never mistaken for references.
        if isinstance(result, str) and result.startswith(scheme):
            # Take bucket and key from the stored URI, not the current environment.
            bucket, _, key = result[len(scheme):].partition("/")
            csv_text = S3Hook().read_key(key=key, bucket_name=bucket)
            result = pd.read_csv(io.StringIO(csv_text))

        return result

Airflow configuration

Set AIRFLOW__CORE__XCOM_BACKEND to your backend class path, for example:

  • xcom_s3_backend.S3XComBackend

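Ensure the module is importable (on PYTHONPATH) in every Airflow component that runs tasks or reads XComs, and that S3_XCOM_BUCKET_NAME is set in each of those runtime environments before the process starts.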
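
A misconfigured class path usually surfaces only when a task first touches XCom, so a small pre-flight check can save a debugging round. The helper below is a sketch of such a check using only the standard library; resolve_class_path is a hypothetical name, and it does not claim to reproduce how Airflow itself loads the backend.

```python
import importlib
from typing import Any


def resolve_class_path(dotted_path: str) -> Any:
    """Import 'package.module.ClassName' and return the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    # Raises ImportError if the module is not on PYTHONPATH.
    module = importlib.import_module(module_path)
    # Raises AttributeError if the class name is misspelled.
    return getattr(module, class_name)
```

Running resolve_class_path("xcom_s3_backend.S3XComBackend") inside the same image or virtual environment your workers use should return the class; an ImportError or AttributeError points to a path or naming problem before any DAG runs.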

Operational cautions

  • Add lifecycle cleanup for generated objects to avoid unbounded S3 growth.
  • Scope bucket permissions tightly so task roles can access only required keys.
  • Validate serialization format consistency between write and read paths; CSV, for example, does not preserve DataFrame dtypes or the index.
  • Include run identifiers in keys if cross-run collisions are possible.
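
For the last point, one option is to build object keys from the identifiers of the task instance that produced the value. The sketch below assumes those identifiers are available to the backend (recent Airflow releases pass values such as dag_id, run_id, and task_id to serialize_value as keyword arguments, but confirm this for your version); build_xcom_key and the xcom/ key prefix are hypothetical choices.

```python
import uuid


def build_xcom_key(dag_id: str, run_id: str, task_id: str, suffix: str = "csv") -> str:
    """Build an object key scoped to one task in one DAG run."""
    # run_id values can contain characters such as ':' and '+'; replace them
    # so the key stays easy to read and to match with prefix rules.
    safe_run_id = "".join(c if c.isalnum() or c in "-_." else "_" for c in run_id)
    # The random component keeps keys unique even when a task is retried.
    return f"xcom/{dag_id}/{safe_run_id}/{task_id}/{uuid.uuid4().hex}.{suffix}"
```

Keeping every generated object under a single prefix also helps with the first two points: a lifecycle rule and an IAM policy can both be scoped to that prefix.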

Object storage-backed XCom values should still be treated as transient task exchange, not a long-term data lake interface.
