Gathr supports Airflow versions 1.10.5 (Airflow1) and 2.1.2 (Airflow2).
This topic covers the fresh installation steps for Airflow1 and Airflow2, as well as the steps to upgrade from Airflow1 to Airflow2.
Note: When using Airflow services in Gathr, users can choose the required version (Airflow1 or Airflow2) simply by providing the correct Airflow server URL in the Airflow configuration. So, if Airflow1 and Airflow2 are installed on different nodes, either one can be pointed to from the Gathr application.
Given below are the steps to do a fresh installation of Airflow1 (Version: 1.10.5).
Note: Gathr supports Apache Airflow with the default Python, i.e., Python 2.7.
1. Create a folder that will be used as the Airflow home (with the sax user).
2. Create a dags folder inside the Airflow home (with the sax user).
3. Log in with the root user, open the .bashrc file, and add the Airflow home property to it.
4. Log in with the Gathr user, open the .bashrc file, and add the same property to it.
5. Install Airflow (with the root user).
6. Initialize the Airflow database (with the Gathr user).
Note: Steps 7 and 8 will be performed after Sub-Package Installation, Configuration, and Plugin Installation are successfully completed.
7. Start the Airflow webserver (with the Gathr user).
8. Start the Airflow scheduler (with the Gathr user).
A consolidated command sketch for steps 1-8 follows.
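A minimal sketch of steps 1-8, assuming the Airflow home is /home/sax/airflow and the webserver port is 9292 (both illustrative; adjust to your environment):
# Steps 1-2 (sax user): create the Airflow home and dags folder
mkdir -p /home/sax/airflow/dags
# Steps 3-4 (root and Gathr users): add the Airflow home to .bashrc
echo 'export AIRFLOW_HOME=/home/sax/airflow' >> ~/.bashrc
source ~/.bashrc
# Step 5 (root user): install Airflow 1.10.5
pip install apache-airflow==1.10.5
# Step 6 (Gathr user): initialize the Airflow database
airflow initdb
# Steps 7-8 (Gathr user, after Sub-Package Installation, Configuration, and Plugin Installation)
airflow webserver -p 9292
airflow scheduler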
Sub-Package Installation
To install the sub-packages (with the root user), use commands like the ones sketched below.
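The sub-packages to install depend on the operators used in your workflows; as an illustration, the kerberos and hdfs extras can be installed like this:
pip install 'apache-airflow[kerberos]==1.10.5'
pip install 'apache-airflow[hdfs]==1.10.5'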
For more details, refer to:
https://airflow.apache.org/installation.html
Configuration
Go to $AIRFLOW_HOME, open the airflow.cfg file, and change the following properties:
base_url = http://ipaddress:port
web_server_port = port (e.g., 9292)
Add SMTP details for email under the [smtp] section.
Uncomment and provide values for the SMTP properties, as sketched below.
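A sketch of the [smtp] block with placeholder values (host, port, credentials, and sender address are environment-specific):
[smtp]
smtp_host = smtp.example.com
smtp_starttls = True
smtp_ssl = False
smtp_user = airflow_user
smtp_password = airflow_password
smtp_port = 587
smtp_mail_from = airflow@example.com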
If the environment is Kerberos security enabled, then add the following configurations under the [kerberos] section:
kinit_path = path to the kinit command (e.g., kinit)
keytab = keytab file (e.g., /etc/security/keytabs/service.keytab)
By default, Airflow uses SQLite as the database. It also allows the user to change the database.
The following are the steps to configure Postgres as the database:
1. Create the 'airflow' user.
2. Set a password for the 'airflow' user.
3. Create the Airflow database and grant the 'airflow' user privileges on it.
4. Open the airflow.cfg file and provide the Postgres details (i.e., username, password, ipaddress:port, and database name).
5. Now run the command to set up the database, as sketched below.
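A minimal sketch of steps 1-5, assuming Postgres runs locally and using placeholder credentials; the postgres extra installs the psycopg2 driver required by the connection string:
# Steps 1-3 (as the postgres OS user): create the user and database
sudo -u postgres createuser airflow
sudo -u postgres psql -c "ALTER USER airflow WITH PASSWORD 'airflow_pass';"
sudo -u postgres createdb -O airflow airflow
# Install the Postgres driver
pip install 'apache-airflow[postgres]==1.10.5'
# Step 4: in $AIRFLOW_HOME/airflow.cfg, set:
#   sql_alchemy_conn = postgresql+psycopg2://airflow:airflow_pass@ipaddress:5432/airflow
# Step 5: set up the database
airflow initdb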
Steps to add the Gathr Airflow plugin in Airflow:
1. Create a plugins folder in the Airflow home (if it does not exist), i.e., $AIRFLOW_HOME/plugins.
2. Untar <sax_home>/conf/common/airflow-plugin/sax_airflow_rest_api_plugin.tar.gz.
3. Copy sax_airflow_rest_api_plugin/* to the Airflow plugins folder, as sketched below.
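A sketch of steps 1-3 (with <sax_home> replaced by the actual Gathr home):
mkdir -p $AIRFLOW_HOME/plugins
tar -xzf <sax_home>/conf/common/airflow-plugin/sax_airflow_rest_api_plugin.tar.gz
cp -r sax_airflow_rest_api_plugin/* $AIRFLOW_HOME/plugins/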
Token-based authentication is supported.
Provide the token in the request header. The same token key and value must be provided in the Airflow config file.
Add the following entry in the $AIRFLOW_HOME/airflow.cfg file.
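Based on the placeholder descriptions below, the entry takes a form like the following (the exact key name is supplied with the Gathr plugin):
<sax_request_http_header_token> = <token>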
<sax_request_http_header_token>: Replace with the key used in the request header for the token.
<token>: Replace with the token value.
To configure Airflow in Gathr, see the Workflows topic in the Gathr User’s Guide.
If Airflow is running on HTTP and Gathr is running on HTTPS, then the user needs to do the following configuration:
# cert file if sax is running with https (else do not provide this key)
sax_cert_file = certificate path
Create the certificate required to connect with Gathr (SAX). Place this certificate file on the machine where Airflow is running and provide its path.
1. If you are using the Kafka Operator in the Workflow, make sure the SSL certificates are enabled on Kafka. To know more about this, see the Create a Workflow > Nodes > Actions > Kafka Alert Operator section in the Workflows topic of the Gathr User’s Guide.
2. The HDFS sensor in Airflow will only work when Airflow is installed on one of the nodes of the cluster. It will not work if it is pointed to from a node that is not part of the cluster. (This applies specifically to the HDFS sensor in a Kerberos setup.)
To know more about this, see the Create a Workflow > Nodes > Actions > HDFS Sensor section in the Workflows topic of the Gathr User’s Guide.
While starting the Airflow webserver, the following error may occur:
Error: No module named 'airflow.www'
Cause: Both Python2 and Python3 are installed on the machine where Airflow is deployed, and the default gunicorn library (used by Airflow) points to Python 3 instead of Python 2. The output of an interpreter check could be as follows.
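A check along these lines (the gunicorn path is illustrative) would show a Python 3 shebang:
head -n 1 $(which gunicorn)
#!/usr/bin/python3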
Fix: Open the file /usr/lib/python2.7/site-packages/airflow/bin/cli.py (with the root user).
In the method def webserver(args), search for the run_args assignment:
run_args = ['gunicorn',
'-b', args.hostname + ':' + str(args.port),
'-c', 'python:airflow.www.gunicorn_config',]
Replace 'gunicorn' with the absolute path of the Python 2 gunicorn executable:
run_args = ['/usr/bin/gunicorn',
'-b', args.hostname + ':' + str(args.port),
'-c', 'python:airflow.www.gunicorn_config',]
Given below are the steps for a fresh installation of Airflow2 (Version: 2.1.2), as well as for an upgrade from Airflow1 to Airflow2.
- Python and Python2 must point to Python 2.7.
- Python 3.7.9 must be installed. Python3 must point to Python 3.7.x.
- pip and pip2 must point to pip2.7.
- pip3 must point to pip3.7.9.
- Make sure that the version of the SQLite database is greater than 3.15.0 (for Airflow2 only).
Note: This section is only applicable for an upgrade from Airflow1 to Airflow2.
If you have an installation of Airflow 1.10.5 with Python 2.7, first follow these steps to uninstall Airflow 1.10.5. If not, skip these steps:
1. Unschedule all workflows on Gathr.
2. Run the command shown in the sketch below and copy the Airflow installation location (e.g., /usr/lib/python2.7/site-packages).
3. Uninstall Airflow 1.10.5.
4. Go to the Airflow installation location (e.g., /usr/lib/python2.7/site-packages) and remove all the folders related to Airflow.
5. Locate the airflow executable (e.g., /usr/bin/airflow) and delete this file.
6. Go to $AIRFLOW_HOME and take a backup of the airflow.cfg file.
7. Go to $AIRFLOW_HOME and remove the contents of the dags and plugins folders. A consolidated command sketch for steps 2-7 follows.
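A consolidated sketch of steps 2-7 (locations are illustrative; use the ones reported on your machine):
# Step 2: the Location field shows the installation location
pip show apache-airflow
# Step 3: uninstall Airflow 1.10.5
pip uninstall -y apache-airflow
# Steps 4-5: remove leftover folders and the executable
rm -rf /usr/lib/python2.7/site-packages/airflow*
which airflow          # e.g., /usr/bin/airflow
rm -f /usr/bin/airflow
# Step 6: back up airflow.cfg
cp $AIRFLOW_HOME/airflow.cfg $AIRFLOW_HOME/airflow.cfg.bck
# Step 7: clear the dags and plugins folders
rm -rf $AIRFLOW_HOME/dags/* $AIRFLOW_HOME/plugins/*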
Airflow2 Installation/Upgrade Steps
Note: Skip steps 1-4 if you are upgrading from Airflow1 to Airflow2.
1. Create a folder that will be used as the Airflow home.
2. Create a dags folder inside the Airflow home.
3. Log in with the root user, open the .bashrc file, and append the Airflow home export to it.
4. Log in with the sax user, open the .bashrc file, and add the Airflow home as an environment variable.
5. Install Airflow.
6. Initialize the Airflow database. A consolidated command sketch for steps 1-6 follows.
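A minimal sketch of steps 1-6, assuming the Airflow home is /home/sax/airflow2 (illustrative) and using the official constraints file for Python 3.7:
# Steps 1-2 (sax user): create the Airflow home and dags folder
mkdir -p /home/sax/airflow2/dags
# Steps 3-4 (root and sax users): add the Airflow home to .bashrc
echo 'export AIRFLOW_HOME=/home/sax/airflow2' >> ~/.bashrc
source ~/.bashrc
# Step 5: install Airflow 2.1.2 against the published constraints
pip3 install apache-airflow==2.1.2 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.1.2/constraints-3.7.txt
# Step 6: initialize the Airflow database
airflow db init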
To configure a different database, please see Database Configuration.
To know more about how to get started with Apache Airflow, see the reference below:
https://airflow.apache.org/docs/apache-airflow/stable/start/index.html
Airflow Providers Installation
The next step is to install the Airflow providers.
Use commands such as the ones sketched below to install the Airflow providers.
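The exact provider set depends on the operators your workflows use; as an illustration, provider packages are installed like this:
pip3 install apache-airflow-providers-apache-hdfs
pip3 install apache-airflow-providers-apache-hive
pip3 install apache-airflow-providers-http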
To know more about Apache Airflow installation, see the reference below:
https://airflow.apache.org/installation.html
Use commands such as the one below to install the Kerberos-related system packages.
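For example, on RHEL/CentOS systems (package names differ on Debian-based distributions):
yum install -y krb5-libs krb5-workstation krb5-devel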
Config File Updates
Go to $AIRFLOW_HOME and open the airflow.cfg file. Change the following properties in the file:
1. Copy the value of the sql_alchemy_conn property from the airflow.cfg.bck file (created during the Airflow1 uninstallation).
2. Provide the copied value for the sql_alchemy_conn property in the airflow.cfg file.
Airflow uses SQLite as the default database. It also allows the user to change to a preferred database.
Steps to configure Postgres as the preferred database are given below:
1. Create the 'airflow' user. The command (see the consolidated sketch after step 7) prompts as follows:
Enter name of role to add: airflow
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
2. Set a password for the 'airflow' user.
3. Create the Airflow database.
4. Grant permissions on the Airflow database to the 'airflow' user.
5. Open the airflow.cfg file and provide the Postgres details (i.e., username, password, ipaddress:port, and database name).
6. For a fresh installation, generate a new fernet key and update this value in Airflow (the commands are included in the consolidated sketch after step 7):
- Open the python3 terminal and import the fernet module.
- Generate the fernet key.
- Print the newly generated fernet key on the console.
Note: Store the generated fernet key securely.
- Update this fernet key in the airflow.cfg file, which is present under $AIRFLOW_HOME. The command to update the fernet key in the config file is also included in the consolidated sketch.
To know more about the usage of fernet in Airflow, see the reference below:
https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/fernet.html
7. Now, run the database setup command. A consolidated sketch of steps 1-7 follows.
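A consolidated sketch of steps 1-7, using placeholder credentials; the fernet commands follow the Airflow documentation, and the sed line is one illustrative way to update the config:
# Steps 1-4 (as the postgres OS user): create the role and database
sudo -u postgres createuser --interactive
sudo -u postgres psql -c "ALTER USER airflow WITH PASSWORD 'airflow_pass';"
sudo -u postgres createdb airflow
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;"
# Step 5: in $AIRFLOW_HOME/airflow.cfg, set:
#   sql_alchemy_conn = postgresql+psycopg2://airflow:airflow_pass@ipaddress:5432/airflow
# Step 6: generate a fernet key in the python3 terminal
python3
>>> from cryptography.fernet import Fernet
>>> fernet_key = Fernet.generate_key()
>>> print(fernet_key.decode())
# Update the fernet_key property in $AIRFLOW_HOME/airflow.cfg, e.g.:
sed -i "s|^fernet_key = .*|fernet_key = <generated-key>|" $AIRFLOW_HOME/airflow.cfg
# Step 7: set up the database
airflow db init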
If the SQLite version is lower than 3.15.0, the commands sketched below can be used for the database upgrade (for Airflow2 only).
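One way is to build a newer SQLite from source (version 3.35.5 shown as an example) and make it visible to Python:
wget https://www.sqlite.org/2021/sqlite-autoconf-3350500.tar.gz
tar -xzf sqlite-autoconf-3350500.tar.gz
cd sqlite-autoconf-3350500
./configure --prefix=/usr/local
make && make install
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH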
Run the users create command, sketched below, to create an admin user in Airflow.
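A sketch with placeholder values (Admin is one of the roles built into Airflow 2):
airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com --password <password>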
You can use the same command to create multiple Airflow users with different roles.
Gathr supports the default authentication method, which is Airflow DB authentication.
Steps to add the Gathr Airflow plugin in Airflow:
1. Create a plugins folder in the Airflow home (if it does not exist), i.e., $AIRFLOW_HOME/plugins.
2. Go to the folder <sax_home>/conf/common/airflow-plugin/airflow2/ and copy its contents to the Airflow plugins folder.
Start Airflow using the commands sketched below.
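For example (the -D flag daemonizes each service; the port is illustrative and must match the one configured in airflow.cfg):
airflow webserver --port 9292 -D
airflow scheduler -D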