These resources and services are neither maintained nor endorsed by the Apache Airflow Community and Apache Airflow project (maintained by the Committers and the Airflow PMC). Use them at your sole discretion. The community does not verify the licences or the validity of these tools, so it is your responsibility to verify them.
If you would like to be included on this page, please reach out to the Apache Airflow dev or user mailing list and let us know, or simply open a Pull Request to this page.
Apache Airflow YouTube Channel - Official YouTube Channel
Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. It is scalable, dynamic, extensible and modular. Mastering Airflow is becoming an attractive, must-have skill for anyone working with data, and if you have many ETL pipelines to manage, Airflow is a must-have.
Airflow Summit - Online conference for Apache Airflow developers
Awesome Apache Airflow - Curated list of resources about Apache Airflow
The Complete Hands-On Introduction to Apache Airflow by Marc Lamberti on Udemy
Apache Airflow: Complete Hands-On Beginner to Advanced Class by Alexandra Abbas on Udemy
Airflow as a Service
Astronomer - Managed Apache Airflow in Astronomer Cloud, or self-hosted within your environment
Google Cloud Composer - Managed Apache Airflow service on Google Cloud Platform
Qubole - Managed Apache Airflow Service on all major public clouds
Amazon Managed Workflows for Apache Airflow - Managed Apache Airflow on Amazon Web Services (AWS)
Third Party Airflow Plugins and Providers
Astronomer Registry - The discovery and distribution hub for Apache Airflow integrations created to aggregate and curate the best bits of the ecosystem.
Airflow Plugins - Central collection of repositories of various plugins for Airflow, including mailchimp, trello, sftp, GitHub, etc.
Airflow ECR Plugin - Plugin to refresh AWS ECR login token at regular intervals. This is helpful where DockerOperator needs to pull images hosted on ECR.
Tools integrating with Airflow
afctl - A CLI tool that includes everything required to create, manage and deploy Airflow projects faster and more smoothly.
airflow-aws-executors - Run Airflow Tasks directly on AWS Batch, AWS Fargate, or AWS ECS; provisioning less infra is more.
airflow-code-editor - A tool for Apache Airflow that allows you to edit DAGs in the browser.
airflow-diagrams - Auto-generated Diagrams from Airflow DAGs
airflow-maintenance-dags - Clairvoyant has a repo of Airflow DAGs that operate on Airflow itself, clearing out various bits of the backing metadata store.
AirflowK8sDebugger - A library for generating Kubernetes pod YAML templates from an Airflow DAG using the KubernetesPodOperator.
Airflow Ditto - An extensible framework to do transformations to an Airflow DAG and convert it into another DAG which is flow-isomorphic with the original DAG, to be able to run it on different environments (e.g. on different clouds, or even different container frameworks - Apache Spark on YARN vs Kubernetes). Comes with out-of-the-box support for EMR-to-HDInsight-DAG transforms.
Apache-Liminal-Incubating - Liminal provides a domain-specific-language (DSL) to build ML/AI workflows on top of Apache Airflow. Its goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production.
Chartis - Python package to convert Common Workflow Language (CWL) into Airflow DAGs.
CWL-Airflow - Python package to extend Apache-Airflow 1.10.11 functionality with CWL v1.2 support.
dag-factory - A library for dynamically generating Apache Airflow DAGs from YAML configuration files.
Dag Dependencies viewer - A tool which creates a view to visualize dependencies between the Airflow DAGs
Databand - Observability platform built on top of Airflow.
dbt (data build tool) - Data transformation tool, dbt jobs can be scheduled using Airflow.
GeniumCloud - One-stop-shop platform for rapidly building, scheduling and controlling Airflow workflows via a completely new UI. Provides comprehensive Airflow infrastructure monitoring out of the box, integration with alerting systems, and service adoption for organizations from small to enterprise.
gusty - Create a DAG using any number of YAML, Python, Jupyter Notebook, or R Markdown files that represent individual tasks in the DAG. gusty also configures dependencies, DAGs, and TaskGroups, features support for your local operators, and more. A fully containerized demo is available here.
Meltano - Open source, self-hosted, CLI-first, debuggable, and extensible ELT tool that embraces Singer for extraction and loading, leverages dbt for transformation, and integrates with Airflow for orchestration.
Oozie to Airflow - A tool to easily convert between Apache Oozie workflows and Apache Airflow workflows.
Pylint-Airflow - A Pylint plugin for static code analysis on Airflow code.
simple-dag-editor - Zero-configuration Airflow tool that lets you manage your DAG files.
Viewflow - An Airflow-based framework that allows data scientists to create data models without writing Airflow code.
whirl - Fast iterative local development and testing of Apache Airflow workflows.
Apache Airflow is an open-source platform to programmatically author, schedule and monitor workflows. If you have many ETL(s) to manage, Airflow is a must-have.
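To make "programmatically author" more concrete, here is a minimal sketch in plain Python of a workflow expressed as tasks plus dependencies. It does not use Airflow's API: `run_workflow`, `TASKS` and `UPSTREAM` are hypothetical names chosen for illustration, and the standard-library `graphlib` module (Python 3.9+) stands in for the ordering that a real scheduler would handle.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, upstream):
    """Run every task after all of its upstream tasks; return the execution order."""
    order = []
    # static_order() yields each task only after its predecessors and
    # raises graphlib.CycleError if the dependencies contain a cycle.
    for name in TopologicalSorter(upstream).static_order():
        tasks[name]()
        order.append(name)
    return order


# Shared dictionary standing in for data passed between ETL steps.
results = {}


def extract():
    results["rows"] = [1, 2, 3]


def transform():
    results["doubled"] = [row * 2 for row in results["rows"]]


def load():
    results["loaded"] = sum(results["doubled"])


# Hypothetical three-step ETL workflow.
TASKS = {"extract": extract, "transform": transform, "load": load}
# Maps each task to the set of tasks that must finish before it starts.
UPSTREAM = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

order = run_workflow(TASKS, UPSTREAM)
```

In Airflow itself the same idea is written as a DAG of operators, and the scheduler, rather than a simple loop, decides when each task runs.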
In the Apache Airflow on AWS EKS: The Hands-On Guide course, you will learn everything you need to set up a production-ready architecture on AWS EKS with Airflow and the Kubernetes Executor. Discover how to execute tasks at scale, as you would in your own company.
You will find the materials directly in a video of the course.
Section 1: Introduction
- Important Prerequisites
- Who I am
- Your Airflow Journey
- Overview of the architecture
- The Checklist
Section 2: Configuring AWS
- Defining a budget
- [Practice] Creating the IAM admin group
- [Practice] Create the IAM admin user
Section 3: Exploring the DevOps world
- Why is knowing DevOps concepts important?
- Reminder about Kubernetes
- Kubernetes Quiz
- What is IaC or Infrastructure as code?
- IaC Quiz
- Deployments with GitOps
- GitOps made simple with Flux
- GitOps Quiz
Section 4: Creating the EKS cluster with GitOps
- [Practice] Creating the cloud9 environment for the workstation
- [Practice] Configuring the workstation
- [Practice] Configuring Cloud9 with the Admin account
- [Practice] Creating the IAM role to interact with the EKS cluster
- AZs, VPCs and Subnets in AWS
- What is AWS EKS?
- [Practice] Creating and configuring the Git repository for GitOps
- [Practice] Creating a multi-node EKS cluster with EKSCTL and GitOps
- [Practice] Configuring the EKS cluster with Flux
- Namespaces in Kubernetes
- [Practice] Creating dev, staging and prod namespaces
- Clean Up
Section 5: Deploying Airflow with DAGs
- Set Up
- Deployments with Helm
- [Practice] Overview of the Airflow Helm chart
- Scaling with the Kubernetes Executor
- [Practice] Creating your first release of Airflow
- [Practice] Deploying Airflow with Flux
- Troubleshooting deployments with Flux
- Synchronizing DAGs in Kubernetes
- [Practice] Fetching DAGs with Git-Sync
- [Practice] Running DAGs with Git-Sync
- Secrets in Kubernetes
- [Practice] Fetching DAGs with Git-Sync from a private repository
- [Practice] Adding the secret in the repo
- Volumes in Kubernetes
- Introduction to AWS EFS
- [Practice] Configuring AWS EFS
- [Practice] Sharing DAGs between pods with AWS EFS
- Clean Up
Section 6: Building CI/CD pipelines to deploy Airflow
- Set Up
- What is AWS CodePipeline?
- [Practice] Building a CI/CD pipeline with CodePipeline and ECR
- [Practice] Deploying Airflow in EKS with CodePipeline and Flux
- Unit testing in Airflow
- [Practice] Unit testing your DAGs
- [Practice] Building the CI/CD pipeline in dev with unit tests
- [Practice] Integration tests for testing tasks in DAGs
- [Practice] Building the CI/CD pipeline in staging with integration tests
- [Practice] Clean up
- [Practice] Set up
- Services in Kubernetes
- Architecture with the Elastic Load Balancer
- [Practice] Exposing the Airflow UI with AWS Elastic Load Balancer
- What is an Ingress?
- Architecture with the AWS ALB Ingress controller
- [Practice] Exposing the Airflow UI with AWS ALB Ingress
- [Practice] Exposing the staging environment with AWS ALB
- Quick reminder about SSL
- [Practice] Creating a Domain for Airflow with ExternalDNS and AWS Route53
- [Practice] Activating SSL on the Airflow UI
- [Practice] Fix the AWS ALB’s health checks
- [Practice] Exporting the SSL secret object
- [Practice] Upgrading the staging environment
- [Exercise] Enabling DNS and SSL for staging
- [Practice] Creating subdomains to access the UIs of Airflow
- Clean Up
Section 8: Logging with Airflow in AWS EKS
- Set Up
- RBAC in Kubernetes
- Permission issues when accessing pod logs
- [Practice] Storing logs in AWS EFS
- [Practice] Remote logging with AWS S3
- Limitations of remote logging in AWS S3
- Remote logging with AWS CloudWatch
- Sensitive data with Secret Backends
- [Practice] Managing connections with AWS Secret Manager
- [Practice] Storing the secret object of AWS Secret Manager for Flux
- Clean Up
Section 9: Configuring the production environment
- Set up
- [Practice] Creating the production environment
- Identifying single points of failure
- [Practice] Making the Airflow UI highly available
- AWS Relational Database Service
- [Practice] Airflow with AWS RDS
- DAG Serialization
- [Practice] Making the web server stateless with DAG Serialization
- Clean Up