
Airflow Installation on AWS EC2

Updated on Jun 22, 2024

Are you looking for a hassle-free way to set up Apache Airflow on your Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance? Look no further! In this article, we'll walk you through the process of installing Airflow using a simple script that can be added as part of the user data while launching an EC2 instance.

Why Install Airflow on AWS EC2?

Apache Airflow is a popular open-source platform for programmatically authoring, scheduling, and monitoring workflows. With robust features such as a flexible scheduler, pluggable executors for distributed task execution, and detailed task tracking, Airflow has become a go-to choice for many organizations. By installing Airflow on an AWS EC2 instance, you can take advantage of the scalability, reliability, and cost-effectiveness that AWS provides.

The Installation Script

To get started, add the following script as part of the user data while launching an EC2 instance (an example launch command follows the script). The script:

  • Optionally installs PostgreSQL as the Airflow metadata backend
  • Configures Airflow for the LocalExecutor
#!/bin/bash
# Name: airflow_server.sh
# Owner: Saurav Mitra
# Description: Configure Airflow Server
# Amazon Linux 2 Kernel 5.10 AMI 2.0.20221210.1 x86_64 HVM gp2


POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=airflow_db
POSTGRES_USER=airflow_user
POSTGRES_PASSWORD=airflow_pass


# Optional: PostgreSQL on the same machine. An RDS/managed database works too. #
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# Install PostgreSQL (Optional)
sudo amazon-linux-extras enable postgresql14 > /dev/null
sudo yum -y install postgresql postgresql-server postgresql-contrib postgresql-devel > /dev/null
sudo pip3 install psycopg2-binary

# Configure Database
sudo postgresql-setup initdb
sudo systemctl enable postgresql
sudo systemctl start postgresql
sudo -u postgres psql -c "CREATE DATABASE ${POSTGRES_DB};"
sudo -u postgres psql -c "CREATE USER ${POSTGRES_USER} WITH PASSWORD '${POSTGRES_PASSWORD}';"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE ${POSTGRES_DB} TO ${POSTGRES_USER};"

sudo sed -i 's|host    all             all             127.0.0.1/32            ident|host    all             all             127.0.0.1/32            md5|g' /var/lib/pgsql/data/pg_hba.conf
sudo systemctl restart postgresql
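
# Optional sanity check (not in the original script): confirm the Airflow user
# can reach the database over TCP with password auth after the pg_hba.conf
# change. Uncomment to run:
# PGPASSWORD=${POSTGRES_PASSWORD} psql -h ${POSTGRES_HOST} -p ${POSTGRES_PORT} -U ${POSTGRES_USER} -d ${POSTGRES_DB} -c "SELECT 1;"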

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #


# Airflow Setup
sudo mkdir -p /opt/airflow/{dags,logs,plugins}

# Project Repository
sudo yum -y install git
# git clone https://github.com/username/demo_airflow.git


# Install Airflow
cd /opt/airflow
export AIRFLOW_HOME=/opt/airflow
AIRFLOW_VERSION=2.5.0
PYTHON_VERSION="$(python3 --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
sudo pip3 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
sudo pip3 install "apache-airflow[amazon,databricks,dbt-cloud,postgres,sftp,snowflake,ssh]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
# Install project dependencies only if the cloned repository provides them
if [ -f requirements.txt ]; then sudo pip3 install -r requirements.txt; fi

# Running any airflow command generates the default airflow.cfg in AIRFLOW_HOME
airflow config list 2> /dev/null
sed -i "s|sql_alchemy_conn = sqlite:////opt/airflow/airflow.db|sql_alchemy_conn = postgresql+psycopg2://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}|g" /opt/airflow/airflow.cfg
sed -i 's|executor = SequentialExecutor|executor = LocalExecutor|g' /opt/airflow/airflow.cfg
sed -i 's|load_examples = True|load_examples = False|g' /opt/airflow/airflow.cfg
sed -i 's|parallelism = 32|parallelism = 4|g' /opt/airflow/airflow.cfg
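
# Optional check (not in the original script): confirm the edit took effect.
# In Airflow 2.5 sql_alchemy_conn lives under the [database] section.
# airflow config get-value database sql_alchemy_conn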

airflow db init
airflow users create --username admin --firstname Airflow --lastname Admin --role Admin --email admin@example.org --password password
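
# Note: change the default admin password after first login.
# Optional verification (not in the original script): "airflow db check"
# exits successfully when the configured metadata database is reachable.
airflow db check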

sudo chown -R ec2-user:ec2-user /opt/airflow

sudo tee -a /etc/environment <<EOF
AIRFLOW_HOME='/opt/airflow'
EOF


sudo tee /etc/systemd/system/airflow-webserver.service <<EOF
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service
Wants=postgresql.service
[Service]
EnvironmentFile=/etc/environment
User=ec2-user
Group=ec2-user
Type=simple
ExecStart=/usr/local/bin/airflow webserver
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
EOF


sudo tee /etc/systemd/system/airflow-scheduler.service <<EOF
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service
Wants=postgresql.service
[Service]
EnvironmentFile=/etc/environment
User=ec2-user
Group=ec2-user
Type=simple
ExecStart=/usr/local/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
EOF

sudo chmod 0664 /etc/systemd/system/airflow-webserver.service
sudo chmod 0664 /etc/systemd/system/airflow-scheduler.service
sudo systemctl daemon-reload
sudo systemctl enable airflow-webserver.service
sudo systemctl enable airflow-scheduler.service
sudo systemctl start airflow-webserver
sudo systemctl start airflow-scheduler
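
If you launch the instance from the AWS CLI rather than the console, the script can be passed with the --user-data flag. Below is a minimal sketch, assuming the script is saved locally as airflow_server.sh; the AMI ID, instance type, key pair, and security group are placeholders to replace with your own:

# Launch an EC2 instance with the script above as user data (placeholder IDs)
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.medium \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0 \
    --user-data file://airflow_server.sh

Make sure the instance's security group allows inbound traffic on port 8080 so the webserver is reachable.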

The script is divided into several sections:

  • Optional PostgreSQL Installation: This section installs PostgreSQL (optional) and configures it as the Airflow metadata backend.
  • Airflow Setup: This section creates the necessary directories for Airflow and optionally clones the project repository.
  • Install Airflow: This section installs Airflow using pip, along with its dependencies.
  • Configure Airflow: This section edits Airflow's configuration file (airflow.cfg) to use PostgreSQL as the metadata backend and the LocalExecutor, and adjusts other settings.
  • Initialize Database: This section initializes the Airflow database using the airflow db init command.
  • Create User: This section creates an admin user for Airflow using the airflow users create command.
  • Set Environment Variables: This section sets the AIRFLOW_HOME environment variable system-wide.
  • Configure Systemd Services: This section configures systemd services for Airflow's webserver and scheduler.
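
Once the instance has finished bootstrapping, you can confirm that both services came up cleanly. A quick check, using the unit names from the script above:

# Check that both Airflow services are active
sudo systemctl status airflow-webserver airflow-scheduler

# Follow the scheduler logs if either service fails to start
sudo journalctl -u airflow-scheduler -f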

To test the installation, browse to the EC2 instance's public DNS name on port 8080, the Airflow webserver's default port.
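
You can also verify the webserver from the command line; its /health endpoint reports the status of the metadatabase and the scheduler. Replace the hostname below with your instance's public DNS name:

# Query the Airflow webserver health endpoint (placeholder hostname)
curl http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8080/health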

In this article, we've shown you how to install Apache Airflow on your AWS EC2 instance using a simple script. With this guide, you can quickly set up Airflow and start building workflows that take advantage of the scalability and reliability of AWS.
