Running Apache Airflow in production often requires careful management of dependencies, configuration, and isolation. Docker Compose offers a streamlined approach to local development and testing by defining multi-container environments in a declarative YAML file. This setup lets teams replicate production-like infrastructure on a single machine without the overhead of installing each component by hand.
Why Combine Airflow with Docker Compose
The synergy between Airflow and Docker Compose lies in consistency and speed. Developers can spin up the same environment on every machine, provided image tags are pinned, which largely eliminates the classic "it works on my machine" problem. Each component (the scheduler, the webserver, the metadata database, and, with the CeleryExecutor, the workers) runs in its own container, mirroring a microservices architecture.
Core Components in a Standard Setup
A typical `docker-compose.yml` for a CeleryExecutor-based Airflow deployment includes several essential services: a PostgreSQL database for metadata storage, a Redis instance that serves as the Celery message broker, the Airflow scheduler for triggering tasks, one or more Celery workers that execute them, and the webserver for UI interaction. Optional services such as Flower for Celery monitoring can run alongside these, while custom plugins are usually mounted into the Airflow containers rather than run as separate services.
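Compose starts services in an order derived from each service's `depends_on` list. As a rough model of that behavior, the Python sketch below topologically sorts a hypothetical CeleryExecutor stack with the standard-library `graphlib` module. The service names, including a one-off `airflow-init` service assumed to prepare the database, are illustrative assumptions rather than a prescribed layout.

```python
from graphlib import TopologicalSorter

# Hypothetical service graph for a CeleryExecutor stack: each key is a Compose
# service name, each value is the set of services listed under its depends_on.
# These names are illustrative assumptions, not taken from a specific file.
SERVICES: dict[str, set[str]] = {
    "postgres": set(),
    "redis": set(),
    "airflow-init": {"postgres", "redis"},
    "airflow-webserver": {"airflow-init"},
    "airflow-scheduler": {"airflow-init"},
    "airflow-worker": {"airflow-init"},
    "flower": {"redis", "airflow-init"},
}


def startup_order(services: dict[str, set[str]]) -> list[str]:
    """Return one valid start order in which every service follows its dependencies."""
    # TopologicalSorter treats each value as the node's predecessors and
    # raises graphlib.CycleError if the depends_on graph contains a cycle.
    return list(TopologicalSorter(services).static_order())


if __name__ == "__main__":
    for position, name in enumerate(startup_order(SERVICES), start=1):
        print(position, name)
```

Keep in mind that a plain `depends_on` entry only waits for a container to start, not for the service inside it to be ready; the Compose specification's `condition: service_healthy`, combined with a `healthcheck`, covers readiness.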
Database and Message Broker
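PostgreSQL and Redis are the backbone of the Airflow cluster. The database stores serialized DAGs (the source files themselves stay in the DAGs folder), DAG runs, task instance state, variables, and connection details, while Redis acts as the broker that carries queued tasks from the scheduler to the Celery workers. Running these services in dedicated containers, with the database data kept on a named volume, ensures isolation and simplifies backup and recovery procedures.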
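Airflow components cannot do much until PostgreSQL and Redis accept connections, so startup usually waits on them. Compose healthchecks are the usual tool; the sketch below shows the underlying idea as a standalone TCP probe. It assumes the services are reachable under hostnames such as `postgres` and `redis` on their default ports 5432 and 6379, and the `wait_for_port` helper is hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True once the port accepts connections and False on timeout.
    An open port only shows that a process is listening, not that the
    database or broker has finished initializing.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection resolves the host and performs a TCP handshake;
            # `interval` doubles as the per-attempt connect timeout.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections, DNS failures, and connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A custom entrypoint could, for instance, call `wait_for_port("postgres", 5432)` before launching the scheduler. A successful handshake is a weaker signal than a real readiness check such as PostgreSQL's `pg_isready`, so treat it as a minimum.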
Configuration Best Practices
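Environment variables play a crucial role in connecting containers and configuring Airflow: any `airflow.cfg` option can be overridden with a variable named `AIRFLOW__{SECTION}__{KEY}`. `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN` points Airflow at the metadata database (releases before 2.3 used `AIRFLOW__CORE__SQL_ALCHEMY_CONN`, which later 2.x releases still accept with a deprecation warning), while `AIRFLOW__CORE__EXECUTOR` selects how tasks are run, for example `LocalExecutor` or `CeleryExecutor`. Keeping credentials in a `.env` file, which Compose reads automatically for variable substitution and which should be excluded from version control, separates sensitive data from the compose file and improves portability.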
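Airflow maps each `airflow.cfg` option to an environment variable named `AIRFLOW__{SECTION}__{KEY}`, and the metadata connection is a standard SQLAlchemy URL, so both can be illustrated in a few lines. This sketch assumes PostgreSQL with the `psycopg2` driver and uses the `[database]` section that newer 2.x releases read `sql_alchemy_conn` from; the helper names and credentials are placeholders, not values Airflow requires.

```python
from urllib.parse import quote


def airflow_env_var(section: str, key: str) -> str:
    """Map an airflow.cfg [section] key to its environment-variable override.

    The override name is AIRFLOW__{SECTION}__{KEY}: double underscores
    separate the parts and everything is upper-cased.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def postgres_sqlalchemy_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a SQLAlchemy URL for PostgreSQL via the psycopg2 driver.

    User and password are percent-encoded so characters such as '@' or '/'
    are not mistaken for URL delimiters.
    """
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )


if __name__ == "__main__":
    # Placeholder credentials; real values would come from a .env file or a secret.
    env = {
        airflow_env_var("database", "sql_alchemy_conn"): postgres_sqlalchemy_url(
            "airflow", "s3cr@t/pw", "postgres", 5432, "airflow"
        ),
        airflow_env_var("core", "executor"): "CeleryExecutor",
    }
    for name, value in env.items():
        print(f"{name}={value}")
```

Percent-encoding matters as soon as a password contains `@` or `/`: without it, `s3cr@t/pw` would split the URL at the wrong places, whereas the encoded form `s3cr%40t%2Fpw` parses cleanly.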
Volume Mounting for Persistence
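Bind-mounting local directories (typically `dags/`, `logs/`, and `plugins/`) into the containers ensures that DAGs and related files persist beyond container lifecycles. Developers can edit code locally and have the running Airflow instance pick up the changes without rebuilding images, although not instantly: new and modified DAG files are detected on the scheduler's next scan of the DAGs folder, and changes to `airflow.cfg` generally take effect only after the affected containers restart.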
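In practice, a running instance notices DAG edits by periodically scanning the DAGs folder rather than through instant notification. The simplified model below, which is not Airflow's actual implementation, compares file modification times between two snapshots of a stand-in DAGs folder. In Airflow 2.x the real cadence is governed by `[scheduler]` options such as `dag_dir_list_interval` (how often the folder is listed for new files) and `min_file_process_interval` (how often existing files are re-parsed); the function names here are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def snapshot(dag_dir: Path) -> dict[str, float]:
    """Map each .py file under dag_dir (as a relative path) to its modification time."""
    return {
        str(path.relative_to(dag_dir)): path.stat().st_mtime
        for path in dag_dir.rglob("*.py")
    }


def changed_files(before: dict[str, float], after: dict[str, float]) -> set[str]:
    """Return files added, removed, or modified between two snapshots."""
    added_or_removed = set(before) ^ set(after)
    modified = {name for name in set(before) & set(after) if before[name] != after[name]}
    return added_or_removed | modified


def demo() -> set[str]:
    """Simulate one edit cycle on a temporary stand-in for the mounted dags/ folder."""
    with tempfile.TemporaryDirectory() as tmp:
        dag_dir = Path(tmp)
        existing = dag_dir / "etl_dag.py"
        existing.write_text("# first version\n")
        before = snapshot(dag_dir)

        # Edit the file and push its mtime forward so the change is detectable
        # even on filesystems with coarse timestamp resolution.
        existing.write_text("# second version\n")
        mtime = before["etl_dag.py"] + 10
        os.utime(existing, (mtime, mtime))
        (dag_dir / "new_dag.py").write_text("# brand new DAG file\n")

        # Nothing is detected until the next scan takes a fresh snapshot.
        return changed_files(before, snapshot(dag_dir))


if __name__ == "__main__":
    print(sorted(demo()))
```

Running it prints `['etl_dag.py', 'new_dag.py']`: between scans nothing is noticed, and both the edit and the new file surface together on the next one.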
Performance Considerations and Scaling
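While a single-node setup is ideal for development, scaling means adjusting the number of scheduler and worker instances; running several schedulers requires Airflow 2.0 or later and a metadata database that supports row-level locking, such as PostgreSQL. Docker Compose can scale an individual service with the `--scale` flag, for example `docker compose up -d --scale airflow-worker=3` (assuming the worker service is named `airflow-worker`), which enables load testing and performance tuning in a controlled environment. Scaling fails for services that set a fixed `container_name` or publish a fixed host port, because every replica would claim the same name or port.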
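Adding workers only helps while the schedulers can keep them busy. The back-of-the-envelope sketch below estimates an upper bound on concurrently running tasks for a CeleryExecutor deployment, assuming Airflow 2.x semantics in which `[core] parallelism` (default 32) applies per scheduler and each worker runs up to `[celery] worker_concurrency` tasks (default 16); check both defaults against your version. Pools and DAG-level or task-level limits are deliberately ignored, and `ClusterShape` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterShape:
    """Simplified description of a scaled Compose deployment (CeleryExecutor assumed)."""

    schedulers: int
    workers: int
    worker_concurrency: int = 16  # assumed [celery] worker_concurrency default
    parallelism: int = 32  # assumed [core] parallelism default, applied per scheduler


def max_running_tasks(shape: ClusterShape) -> int:
    """Upper bound on simultaneously running task instances.

    Worker slots cap what can execute; parallelism caps what the schedulers
    will hand out. Pools and DAG/task-level limits can lower this further
    and are not modeled here.
    """
    worker_slots = shape.workers * shape.worker_concurrency
    scheduler_cap = shape.schedulers * shape.parallelism
    return min(worker_slots, scheduler_cap)


if __name__ == "__main__":
    for shape in (ClusterShape(1, 1), ClusterShape(1, 3), ClusterShape(2, 3)):
        print(shape.schedulers, shape.workers, max_running_tasks(shape))
```

With these defaults, one scheduler and three workers give `min(3 * 16, 1 * 32) = 32`, the same as two workers, so the third worker only adds idle slots unless `parallelism` is raised or a second scheduler is added.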
Security and Network Isolation
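Defining custom networks in Docker Compose restricts communication to the services that actually need it, reducing the attack surface; typically only the webserver (and Flower, if used) needs a port published to the host. Compose `secrets` deliver sensitive values such as database passwords as files under `/run/secrets/` instead of plain environment variables, and the official PostgreSQL image can read its password from such a file via `POSTGRES_PASSWORD_FILE`. Running Airflow processes as a non-root user, set with the service's `user` key, further hardens the environment against potential vulnerabilities.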
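Inside a container, a file-based Compose secret is just a file, so application code or a custom entrypoint can read it directly. The sketch below assumes the default `/run/secrets/<name>` mount location and falls back to an environment variable for local runs outside Compose; `read_secret` and the variable names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(
    name: str,
    secrets_dir: str = "/run/secrets",
    env_fallback: Optional[str] = None,
) -> Optional[str]:
    """Return a Compose file-based secret, falling back to an environment variable.

    Looks for <secrets_dir>/<name> first; if that file is absent and
    env_fallback is given, returns that environment variable's value.
    Returns None when neither source provides a value.
    """
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Strip the trailing newline that `echo "..." > file` leaves behind.
        return secret_file.read_text().rstrip("\n")
    if env_fallback is not None:
        return os.environ.get(env_fallback)
    return None
```

Airflow can also obtain some options through command-backed `_CMD` variables, for example `AIRFLOW__DATABASE__SQL_ALCHEMY_CONN_CMD` set to a command that reads the secret file; which options support this depends on the Airflow version, so the configuration reference is the place to confirm it.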