Airflow on AWS

3 Ways to deploy Airflow on AWS


Airflow can be deployed in several ways: all components on a single VM, or individual components on separate standalone or load-balanced VMs. Some components, such as the UI for creating and monitoring tasks, need a webserver; others, such as the scheduler and executor, need a native Python runtime. The choice of deployment model is driven by concerns such as performance, availability, and scalability. The scheduler and executor deserve the most attention because they carry the main workload of Apache Airflow and are the components that benefit most from clustering and autoscaling.


What options are available to deploy Airflow on AWS? 

AWS offers several options for deploying Airflow, spanning IaaS, PaaS, and SaaS.


Deploy Airflow on AWS EKS

Kubernetes is a proven solution for autoscaling, elasticity, and automatic resource management. A large community supports Kubernetes, so several ready-to-use configuration files are available for deploying Airflow on EKS. As workloads become heavier, the cluster adds nodes to run additional Airflow executor or scheduler workloads.

The biggest drawback of this setup is that it can cost more for smaller workloads. Depending on the variety of tasks and their resource requirements, EKS may need to launch more instances than it can fully utilize, which increases costs.
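The utilization problem can be illustrated with a small packing sketch. It is a simplification, not how Kubernetes actually schedules pods: tasks are placed with a first-fit decreasing heuristic, only CPU is considered, and the task sizes and node size are hypothetical numbers chosen for illustration.

```python
from typing import List


def pack_first_fit_decreasing(task_cpus: List[float], node_cpu: float) -> List[List[float]]:
    """Place tasks on nodes using first-fit decreasing; returns the tasks on each node."""
    nodes: List[List[float]] = []
    for cpu in sorted(task_cpus, reverse=True):
        if cpu > node_cpu:
            raise ValueError(f"task needing {cpu} vCPU does not fit on a {node_cpu} vCPU node")
        for node in nodes:
            # Reuse the first node that still has room for this task.
            if sum(node) + cpu <= node_cpu:
                node.append(cpu)
                break
        else:
            # No existing node has room, so a new node must be launched.
            nodes.append([cpu])
    return nodes


def utilization(nodes: List[List[float]], node_cpu: float) -> float:
    """Fraction of the provisioned CPU that is actually requested by tasks."""
    used = sum(sum(node) for node in nodes)
    return used / (len(nodes) * node_cpu)


# Hypothetical workload: three 3-vCPU tasks and one 0.5-vCPU task on 4-vCPU nodes.
tasks = [3.0, 3.0, 3.0, 0.5]
nodes = pack_first_fit_decreasing(tasks, node_cpu=4.0)
print(len(nodes), round(utilization(nodes, node_cpu=4.0), 2))
```

With these assumed numbers, no two 3-vCPU tasks fit on one node, so three nodes are launched and roughly a fifth of the paid-for CPU stays idle.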


Deploy Airflow on AWS EC2

Deploying Airflow on EC2 is almost the same as deploying it on an on-premises VM: a simple, traditional deployment with pre-configured capacity, a fixed number of nodes in the cluster, and pre-determined load balancing. One set of dedicated EC2 instances runs the webserver components, and another set hosts the scheduler and executors.

This deployment model offers little flexibility to scale up or down, and the availability of the solution depends on the number of instances provisioned upfront. It works well when workloads are fairly fixed and growth is minimal or constant. It cannot absorb load spikes that exceed the provisioned capacity.
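A rough way to see the effect of fixed capacity is to simulate a task queue. The sketch below assumes every task occupies one worker slot for exactly one time step, and the arrival counts and slot count are made-up values, not measurements.

```python
from typing import List


def simulate_backlog(arrivals: List[int], slots_per_tick: int) -> List[int]:
    """Return the number of tasks still waiting at the end of each time step."""
    backlog = 0
    history: List[int] = []
    for arriving in arrivals:
        backlog += arriving
        # A fixed fleet can finish at most `slots_per_tick` tasks per step.
        backlog -= min(backlog, slots_per_tick)
        history.append(backlog)
    return history


# Hypothetical load: steady 4 tasks per step with one spike of 12.
arrivals = [4, 4, 12, 4, 4, 0]
history = simulate_backlog(arrivals, slots_per_tick=5)
print(history)
```

Under these assumptions the fleet keeps up with the steady load, but the single spike leaves a backlog that takes several steps to drain because no extra capacity can be added.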

 

Use the Managed Airflow service on AWS

Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service that is still in its early stages as of 2021. It is a SaaS offering that promises to address the most common concerns around scalability, availability, and security. Like any managed service, it is easy to get started with; however, it is not yet industry-proven.

It has the potential to become one of the most sought-after deployment models for Airflow because it is pre-integrated with other proven AWS services, such as Amazon S3, CloudWatch, and IAM.


Our recommendation

When you move your Airflow solution to the AWS cloud, we recommend evaluating EKS for orchestration. This requires you to analyze the sizes and frequency of your workloads. Careful planning of instance sizes can help you optimize costs and use the instances to their maximum capacity.
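One possible starting point for this analysis is a back-of-the-envelope estimate based on Little's law (average concurrency = arrival rate × average duration). The sketch below is an assumption-laden illustration: the `Workload` fields, the sample workloads, the 20% headroom, and the 4-vCPU node size are all hypothetical, and averages hide peaks, so the result is a lower bound rather than a sizing recommendation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    name: str
    runs_per_hour: float   # how often the task is triggered
    avg_minutes: float     # average run time of one task
    cpu_per_task: float    # vCPUs requested by one running task


def average_concurrency(w: Workload) -> float:
    """Little's law: average number of tasks of this kind running at once."""
    return w.runs_per_hour * w.avg_minutes / 60.0


def nodes_needed(workloads: List[Workload], node_cpu: float, headroom: float = 0.2) -> int:
    """Nodes required to cover the average CPU demand plus a safety margin."""
    demand = sum(average_concurrency(w) * w.cpu_per_task for w in workloads)
    return math.ceil(demand * (1 + headroom) / node_cpu)


# Hypothetical workloads, for illustration only.
workloads = [
    Workload("etl", runs_per_hour=12, avg_minutes=10, cpu_per_task=2.0),
    Workload("reports", runs_per_hour=6, avg_minutes=5, cpu_per_task=1.0),
]
required = nodes_needed(workloads, node_cpu=4.0)
print(required)
```

Repeating the calculation for a few candidate node sizes gives a first impression of which size leaves the least capacity unused.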


Blog Curator

Bledion Vladi

Business Development

Email:

bvladi@thinkport.digital