Airflow on AWS
Airflow can be deployed in several ways: all components on a single VM, or different components on separate standalone or load-balanced VMs. Some Airflow components, such as the UI for creating and monitoring tasks, need a webserver, while others, such as the scheduler and executor, need a native Python runtime. The choice of deployment model is driven by concerns such as performance, availability, and scalability. The focus is usually on the scheduler and executor, because they carry out the main workload of Apache Airflow and are the components that need clustering and autoscaling.
AWS offers a variety of options for deploying Airflow, which can be grouped into IaaS, PaaS, and SaaS.
Kubernetes is a proven solution for autoscaling, elasticity, and automatic resource management. A large community supports Kubernetes initiatives, so several ready-to-use configuration files are available for deploying Airflow on Amazon Elastic Kubernetes Service (EKS). EKS can add new nodes running the Airflow executor or scheduler to handle new and heavy workloads.
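As an illustration of what such configuration boils down to, here is a minimal sketch in Python. It assumes you choose Airflow's KubernetesExecutor (the text above does not prescribe a specific executor) and relies on Airflow's convention of reading configuration options from environment variables named AIRFLOW__SECTION__KEY. The helper names and the parallelism value are hypothetical and only serve the example.

```python
from typing import Dict


def airflow_env_var(section: str, key: str) -> str:
    """Return the environment variable name Airflow reads for a config option.

    Airflow maps the option `key` in `[section]` of airflow.cfg to
    AIRFLOW__{SECTION}__{KEY}, with double underscores as separators.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def kubernetes_executor_env(parallelism: int) -> Dict[str, str]:
    """Build environment variables that select the KubernetesExecutor.

    `parallelism` is an illustrative cap on concurrently running task
    instances; the right value depends on your workload (assumption).
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return {
        airflow_env_var("core", "executor"): "KubernetesExecutor",
        airflow_env_var("core", "parallelism"): str(parallelism),
    }


if __name__ == "__main__":
    # Print the variables in a form that can be pasted into a pod spec
    # or a Helm values file.
    for name, value in kubernetes_executor_env(parallelism=64).items():
        print(f"{name}={value}")
```

These variables would typically be set on the scheduler and webserver containers. Cluster-level settings such as namespaces, worker images, and node autoscaling are left out here because they depend on your EKS setup.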
Deploying Airflow on EC2 is almost the same as deploying it on an on-premises VM: a simple, traditional deployment with pre-configured capacity, a fixed number of nodes in the cluster, and pre-determined load balancing. One set of dedicated EC2 instances serves the webserver components, and another set of EC2 instances hosts the scheduler and executors.
Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service that is still in a nascent state as of 2021. It is a SaaS offering that promises to address the most common concerns around scalability, availability, and security. Like any other managed service, it is easy to get started with, but it is not yet industry-proven.
It has the potential to become one of the most sought-after deployment models for Airflow because it is pre-integrated with other proven AWS services, such as Amazon S3, CloudWatch, IAM, and others.
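To make the S3 and IAM integration concrete, the following sketch assembles a request for the `create_environment` call of the boto3 `mwaa` client. It is only an outline under stated assumptions: the bucket, role, subnet, and security group identifiers are placeholders, the worker limits are arbitrary, and the helper function names are hypothetical. The request is built separately from the AWS call so that it can be inspected without credentials.

```python
from typing import Any, Dict, List


def build_mwaa_request(
    name: str,
    bucket_arn: str,
    execution_role_arn: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
    dag_prefix: str = "dags",
    min_workers: int = 1,
    max_workers: int = 5,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the MWAA create_environment call."""
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("need 1 <= min_workers <= max_workers")
    return {
        "Name": name,
        # S3 bucket holding the DAG files, and the DAG folder inside it.
        "SourceBucketArn": bucket_arn,
        "DagS3Path": dag_prefix,
        # IAM role the environment assumes to reach S3, CloudWatch, etc.
        "ExecutionRoleArn": execution_role_arn,
        "NetworkConfiguration": {
            "SubnetIds": subnet_ids,
            "SecurityGroupIds": security_group_ids,
        },
        # Bounds for worker autoscaling (illustrative values).
        "MinWorkers": min_workers,
        "MaxWorkers": max_workers,
    }


def create_environment(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without AWS access

    return boto3.client("mwaa").create_environment(**request)


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    request = build_mwaa_request(
        name="example-airflow",
        bucket_arn="arn:aws:s3:::example-airflow-bucket",
        execution_role_arn="arn:aws:iam::123456789012:role/example-mwaa-role",
        subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
        security_group_ids=["sg-cccc3333"],
    )
    print(request)
```

Calling `create_environment(request)` would start provisioning, so the example only prints the request. Check the current MWAA documentation for required networking and bucket settings before running it against a real account.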
When you are moving your Airflow solution to the AWS cloud, we recommend evaluating EKS-based orchestration. This requires you to analyze the size and frequency of your workloads. Careful planning of instance sizes can help you optimize costs and use the instances to their maximum capacity.
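The kind of sizing analysis meant here can be sketched as a back-of-the-envelope calculation. The model below is an assumption for illustration, not a method prescribed by AWS or Airflow: it takes the peak number of concurrent tasks, the CPU and memory each task needs, and an instance shape, and estimates how many instances are required while keeping some headroom. All class names, function names, and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    peak_concurrent_tasks: int
    cpu_per_task: float         # vCPUs needed by one task
    memory_per_task_gib: float  # GiB of memory needed by one task


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: float
    memory_gib: float


def tasks_per_instance(w: Workload, i: InstanceType, headroom: float = 0.2) -> int:
    """Tasks that fit on one instance after reserving `headroom` of its capacity."""
    usable_cpu = i.vcpus * (1 - headroom)
    usable_mem = i.memory_gib * (1 - headroom)
    # The scarcer resource (CPU or memory) limits how many tasks fit.
    return int(min(usable_cpu // w.cpu_per_task, usable_mem // w.memory_per_task_gib))


def instances_needed(w: Workload, i: InstanceType, headroom: float = 0.2) -> int:
    """Instances required to run the peak number of concurrent tasks."""
    per_instance = tasks_per_instance(w, i, headroom)
    if per_instance < 1:
        raise ValueError(f"{i.name} is too small to run a single task")
    return math.ceil(w.peak_concurrent_tasks / per_instance)


if __name__ == "__main__":
    workload = Workload(peak_concurrent_tasks=40, cpu_per_task=0.5, memory_per_task_gib=1.0)
    instance = InstanceType(name="example.xlarge", vcpus=4, memory_gib=16)
    print(tasks_per_instance(workload, instance))  # 6 (CPU is the limiting resource)
    print(instances_needed(workload, instance))    # 7
```

In this example the tasks are CPU-bound, so much of the memory on each instance stays unused. Comparing a few instance shapes this way is one simple means of spotting such waste before committing to a node size.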