Udemy Apache Airflow

These resources and services are neither maintained nor endorsed by the Apache Airflow Community and Apache Airflow project (maintained by the Committers and the Airflow PMC). Use them at your sole discretion. The community does not verify the licences or validity of these tools, so it’s your responsibility to verify them.

If you would like to be included on this page, please reach out to the Apache Airflow dev or user mailing list and let us know, or simply open a Pull Request against this page.

Learning resources

Apache Airflow YouTube Channel - Official YouTube Channel

Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. It is scalable, dynamic, extensible and modular. Mastering Airflow is becoming a must-have and attractive skill for anyone working with data. Start with the basics and work up to building your own pipelines.

Airflow Summit - Online conference for Apache Airflow developers

Awesome Apache Airflow - Curated list of resources about Apache Airflow

The Complete Hands-On Introduction to Apache Airflow by Marc Lamberti on Udemy

Apache Airflow: Complete Hands-On Beginner to Advanced Class by Alexandra Abbas on Udemy

Airflow as a Service

Astronomer - Managed Apache Airflow in Astronomer Cloud, or self-hosted within your environment

Google Cloud Composer - Managed Apache Airflow service on Google Cloud Platform

Qubole - Managed Apache Airflow Service on all major public clouds

Amazon Managed Workflows for Apache Airflow - Managed Apache Airflow on Amazon Web Services (AWS)

Third Party Airflow Plugins and Providers

Astronomer Registry - The discovery and distribution hub for Apache Airflow integrations created to aggregate and curate the best bits of the ecosystem.

Airflow Plugins - Central collection of repositories of various plugins for Airflow, including mailchimp, trello, sftp, GitHub, etc.

Airflow ECR Plugin - Plugin to refresh AWS ECR login token at regular intervals. This is helpful where DockerOperator needs to pull images hosted on ECR.

Tools integrating with Airflow

afctl - A CLI tool that includes everything required to create, manage and deploy Airflow projects faster and more smoothly.

airflow-aws-executors - Run Airflow Tasks directly on AWS Batch, AWS Fargate, or AWS ECS; provisioning less infra is more.

airflow-code-editor - A tool for Apache Airflow that allows you to edit DAGs in the browser.

airflow-diagrams - Auto-generated Diagrams from Airflow DAGs

airflow-maintenance-dags - Clairvoyant has a repo of Airflow DAGs that operate on Airflow itself, clearing out various bits of the backing metadata store.

AirflowK8sDebugger - A library for generating Kubernetes pod YAML templates from an Airflow DAG using the KubernetesPodOperator.

Airflow Ditto - An extensible framework to do transformations to an Airflow DAG and convert it into another DAG which is flow-isomorphic with the original DAG, to be able to run it on different environments (e.g. on different clouds, or even different container frameworks - Apache Spark on YARN vs Kubernetes). Comes with out-of-the-box support for EMR-to-HDInsight-DAG transforms.

Apache-Liminal-Incubating - Liminal provides a domain-specific-language (DSL) to build ML/AI workflows on top of Apache Airflow. Its goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production.

Chartis - Python package to convert Common Workflow Language (CWL) into an Airflow DAG.

CWL-Airflow - Python package to extend Apache-Airflow 1.10.11 functionality with CWL v1.2 support.

dag-factory - A library for dynamically generating Apache Airflow DAGs from YAML configuration files.

Dag Dependencies viewer - A tool which creates a view to visualize dependencies between the Airflow DAGs

Databand - Observability platform built on top of Airflow.

dbt (data build tool) - Data transformation tool, dbt jobs can be scheduled using Airflow.

GeniumCloud - One-stop-shop platform for rapidly building, scheduling and controlling Airflow workflows through a completely new UI. It offers comprehensive out-of-the-box Airflow infrastructure monitoring, integration with alerting systems, and service adoption for organizations from small to enterprise. The easiest way to manage complex workflows.

gusty - Create a DAG using any number of YAML, Python, Jupyter Notebook, or R Markdown files that represent individual tasks in the DAG. gusty also configures dependencies, DAGs, and TaskGroups, features support for your local operators, and more. A fully containerized demo is also available.

Meltano - Open source, self-hosted, CLI-first, debuggable, and extensible ELT tool that embraces Singer for extraction and loading, leverages dbt for transformation, and integrates with Airflow for orchestration.

Oozie to Airflow - A tool to easily convert between Apache Oozie workflows and Apache Airflow workflows.

Pylint-Airflow - A Pylint plugin for static code analysis on Airflow code.

simple-dag-editor - Zero-configuration Airflow tool that lets you manage your DAG files.

Viewflow - An Airflow-based framework that allows data scientists to create data models without writing Airflow code.

whirl - Fast iterative local development and testing of Apache Airflow workflows.

Apache Airflow is an open-source platform to programmatically author, schedule and monitor workflows. If you have many ETL(s) to manage, Airflow is a must-have.
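
To make "programmatically author" concrete, here is a minimal sketch in plain Python. It is not Airflow's own API: it only uses the standard library's `graphlib` module to show the underlying idea of declaring tasks and their dependencies in code and then running them in dependency order. The task names and the `run_workflow` helper are hypothetical and chosen purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_workflow(dependencies, actions):
    """Run every task once, only after all of its upstream tasks have run."""
    executed = []
    # static_order() yields tasks so that dependencies always come first.
    for task in TopologicalSorter(dependencies).static_order():
        actions[task]()
        executed.append(task)
    return executed


# Placeholder actions: a real pipeline would extract, transform and load data.
actions = {name: (lambda name=name: print(f"running {name}")) for name in workflow}

order = run_workflow(workflow, actions)
```

In Airflow itself the same structure would be expressed with a DAG object and operators, and the scheduler, rather than a simple loop, would decide when each task runs.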

In the Apache Airflow on AWS EKS: The Hands-On Guide course, you are going to learn everything you need to set up a production-ready architecture on AWS EKS with Airflow and the Kubernetes Executor. Discover how to execute tasks at scale, as you would in your company.

You will find the materials directly in a video of the course

Section 1: Introduction

  1. Important Prerequisites
  2. Who I am
  3. Your Airflow Journey
  4. Overview of the architecture
  5. The Checklist

Section 2: Configuring AWS

  1. Defining a budget
  2. [Practice] Creating the IAM admin group
  3. [Practice] Create the IAM admin user

Section 3: Exploring the DevOps world

  1. Why is knowing DevOps concepts important?
  2. Reminder about Kubernetes
  3. Kubernetes Quiz
  4. What is IaC or Infrastructure as code?
  5. IaC Quiz
  6. Deployments with GitOps
  7. GitOps made simple with Flux
  8. GitOps Quiz

Section 4: Creating the EKS cluster with GitOps

  1. [Practice] Creating the Cloud9 environment for the workstation
  2. [Practice] Configuring the workstation
  3. [Practice] Configuring Cloud9 with the Admin account
  4. [Practice] Creating the IAM role to interact with the EKS cluster
  5. AZs, VPCs and Subnets in AWS
  6. What is AWS EKS?
  7. [Practice] Creating and configuring the Git repository for GitOps
  8. [Practice] Creating a multi-node EKS cluster with EKSCTL and GitOps
  9. [Practice] Configuring the EKS cluster with Flux
  10. Namespaces in Kubernetes
  11. [Practice] Creating dev, staging and prod namespaces
  12. Clean Up

Section 5: Deploying Airflow with DAGs

  1. Set Up
  2. Deployments with Helm
  3. [Practice] Overview of the Airflow Helm chart
  4. Scaling with the Kubernetes Executor
  5. [Practice] Creating your first release of Airflow
  6. [Practice] Deploying Airflow with Flux
  7. Troubleshooting deployments with Flux
  8. Synchronizing DAGs in Kubernetes
  9. [Practice] Fetching DAGs with Git-Sync
  10. [Practice] Running DAGs with Git-Sync
  11. Secrets in Kubernetes
  12. [Practice] Fetching DAGs with Git-Sync from a private repository
  13. [Practice] Adding the secret in the repo
  14. Volumes in Kubernetes
  15. Introduction to AWS EFS
  16. [Practice] Configuring AWS EFS
  17. [Practice] Sharing DAGs between pods with AWS EFS
  18. Clean Up

Section 6: Building CI/CD pipelines to deploy Airflow

  1. Set Up
  2. What is AWS CodePipeline?
  3. [Practice] Building a CI/CD pipeline with CodePipeline and ECR
  4. [Practice] Deploying Airflow in EKS with CodePipeline and Flux
  5. Unit testing in Airflow
  6. [Practice] Unit testing your DAGs
  7. [Practice] Building the CI/CD pipeline in dev with unit tests
  8. [Practice] Integration tests for testing tasks in DAGs
  9. [Practice] Building the CI/CD pipeline in staging with integration tests
  10. [Practice] Clean up

Section 7: Exposing the Airflow UI

  1. [Practice] Set up
  2. Services in Kubernetes
  3. Architecture with the Elastic Load Balancer
  4. [Practice] Exposing the Airflow UI with AWS Elastic Load Balancer
  5. What is an Ingress?
  6. Architecture with the AWS ALB Ingress controller
  7. [Practice] Exposing the Airflow UI with AWS ALB Ingress
  8. [Practice] Exposing the staging environment with AWS ALB
  9. Quick reminder about SSL
  10. [Practice] Creating a Domain for Airflow with ExternalDNS and AWS Route53
  11. [Practice] Activating SSL on the Airflow UI
  12. [Practice] Fix the AWS ALB’s health checks
  13. [Practice] Exporting the SSL secret object
  14. [Practice] Upgrading the staging environment
  15. [Exercise] Enabling DNS and SSL for staging
  16. [Practice] Creating subdomains to access the UIs of Airflow
  17. Clean Up

Section 8: Logging with Airflow in AWS EKS

  1. Set Up
  2. RBAC in Kubernetes
  3. Permission issues for accessing pod’s logs
  4. [Practice] Storing logs in AWS EFS
  5. [Practice] Remote logging with AWS S3
  6. Limitations of remote logging in AWS S3
  7. Remote logging with AWS CloudWatch
  8. Sensitive data with Secrets Backends
  9. [Practice] Managing connections with AWS Secrets Manager
  10. [Practice] Storing the secret object of AWS Secrets Manager for Flux
  11. Clean Up

Section 9: Configuring the production environment

  1. Set up
  2. [Practice] Creating the production environment
  3. Identifying single points of failure
  4. [Practice] Making the Airflow UI highly available
  5. AWS Relational Database Service
  6. [Practice] Airflow with AWS RDS
  7. DAG Serialization
  8. [Practice] Making the web server stateless with DAG Serialization
  9. Clean Up
  10. Congratulations!