Step 1: Create a first_dag.py file and place it in the dags folder.
first_dag.py
# To create a DAG, we need to import the DAG class from the airflow module
from airflow import DAG
# Create an instance of DAG
# @daily: a predefined cron preset in Airflow; it triggers this DAG once every day
with DAG(dag_id="first_dag", schedule_interval="@daily") as dag:
    pass
from airflow import DAG
To create a DAG, we need to import the DAG class from the airflow module.
with DAG(dag_id="first_dag", schedule_interval="@daily") as dag:
The above statement creates an instance of DAG.
@daily
It is a predefined cron preset in Airflow; it triggers this DAG once every day.
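Note that the scheduler uses the schedule_interval together with a start_date to decide when to create DAG runs. Below is a minimal sketch of the same DAG with a start_date added, using Airflow's days_ago helper (the value of 1 day back is just an illustrative assumption):

from airflow import DAG
# days_ago is a helper in Airflow that returns a date n days in the past
from airflow.utils.dates import days_ago

# Same DAG as above, with a start_date so the scheduler knows when to begin
with DAG(dag_id="first_dag",
         schedule_interval="@daily",
         start_date=days_ago(1)) as dag:
    pass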
Step 2: Start the Airflow scheduler by executing the below command.
airflow scheduler
$ airflow scheduler
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2020-10-04 10:22:47,605] {__init__.py:50} INFO - Using executor SequentialExecutor
[2020-10-04 10:22:47,610] {scheduler_job.py:1367} INFO - Starting the scheduler
[2020-10-04 10:22:47,610] {scheduler_job.py:1375} INFO - Running execute loop for -1 seconds
[2020-10-04 10:22:47,610] {scheduler_job.py:1376} INFO - Processing each file at most -1 times
[2020-10-04 10:22:47,610] {scheduler_job.py:1379} INFO - Searching for files in /Users/krishna/airflow/dags
[2020-10-04 10:22:47,617] {scheduler_job.py:1381} INFO - There are 2 files in /Users/krishna/airflow/dags
[2020-10-04 10:22:47,617] {scheduler_job.py:1438} INFO - Resetting orphaned tasks for active dag runs
[2020-10-04 10:22:47,648] {dag_processing.py:562} INFO - Launched DagFileProcessorManager with pid: 6462
[2020-10-04 10:22:48,875] {settings.py:55} INFO - Configured default timezone <Timezone [UTC]>
[2020-10-04 10:22:48,884] {dag_processing.py:774} WARNING - Because we cannot use more than 1 thread (max_threads = 2) when using sqlite. So we set parallelism to 1.
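Before moving to the next step, you can verify that the scheduler is able to parse the file. Here is a minimal sketch using Airflow's DagBag class (the dags folder path is taken from the scheduler logs above):

from airflow.models import DagBag

# Load every DAG file from the dags folder and collect parse failures
dagbag = DagBag(dag_folder="/Users/krishna/airflow/dags")
print(dagbag.import_errors)  # an empty dict means all files parsed cleanly
print(dagbag.dag_ids)        # should include "first_dag"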
Step 3: Start the webserver by executing the below command.
airflow webserver
$ airflow webserver
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2020-10-04 10:23:34,785] {__init__.py:50} INFO - Using executor SequentialExecutor
[2020-10-04 10:23:34,786] {dagbag.py:417} INFO - Filling up the DagBag from /Users/krishna/airflow/dags
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
=================================================================
[2020-10-04 10:23:35 +0530] [6523] [INFO] Starting gunicorn 20.0.4
[2020-10-04 10:23:35 +0530] [6523] [INFO] Listening at: http://0.0.0.0:8080 (6523)
[2020-10-04 10:23:35 +0530] [6523] [INFO] Using worker: sync
[2020-10-04 10:23:35 +0530] [6526] [INFO] Booting worker with pid: 6526
[2020-10-04 10:23:35 +0530] [6527] [INFO] Booting worker with pid: 6527
[2020-10-04 10:23:36 +0530] [6528] [INFO] Booting worker with pid: 6528
[2020-10-04 10:23:36 +0530] [6529] [INFO] Booting worker with pid: 6529
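The logs show Gunicorn listening on 0.0.0.0:8080, the default port set in airflow.cfg. If you are unsure which port your installation uses, here is a quick sketch using Airflow's configuration API:

from airflow.configuration import conf

# Read the webserver port from airflow.cfg (defaults to 8080)
print(conf.get("webserver", "web_server_port"))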
Step 4: Open the URL http://localhost:8080 in a browser to see the DAG first_dag.
Go to the Graph View by clicking on the ‘Graph View’ button.
You will be navigated to the Graph View of the DAG.
You can observe a warning message ‘No tasks found’, since the with block of first_dag is still empty.
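As a quick preview (a minimal sketch using DummyOperator, with the start_date added as in the earlier sketch; tasks are covered properly in the next post), a task is simply an operator instance declared inside the DAG context:

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.dates import days_ago

with DAG(dag_id="first_dag",
         schedule_interval="@daily",
         start_date=days_ago(1)) as dag:
    # Declaring an operator inside the with block registers it as a task
    start = DummyOperator(task_id="start")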
Let’s define a task for first_dag in the next post.