NYC Parking Violations ETL (Airflow + DBT + Snowflake)

Github Repository: https://github.com/BadreeshShetty/Data-Engineering-ETL-Airflow-DBT-Parking

This data engineering project processes and analyzes NYC parking violations data, a dataset of approximately 50 million records. A brief overview of the workflow stages:

  1. Data Source
  2. Data Ingestion and Storage
  3. Data Warehousing
  4. Data Transformation
  5. Data Visualization and Reporting
  6. Programming Languages and Tools

After connecting Docker→Airflow (once the Airflow containers are up):

Checklist

<aside> 💡 One-time setup (only when Docker is first started):

Add Admin Variables.

Add Connections.

After every successful Extract & Load run, update the variables with the next year's info.

Add Variables:

- Key `parking_violations_file_name`: value from the notebook for that year
- Key `parking_violations_url`: value from the notebook for that year

</aside>
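As a sketch of what those two Airflow Variables might hold for a given year: the real file name and URL come from the project's notebook, so the naming pattern and host below are illustrative assumptions only, not the actual dataset locations.

```python
# Illustrative sketch: build the two Airflow Variable values for a year.
# The actual file name and URL are taken from the project's notebook;
# the pattern and host here are placeholders for demonstration.

def parking_variables(year: int) -> dict:
    """Return the Variable key/value pairs the EL DAG expects."""
    file_name = f"parking_violations_{year}.csv"  # hypothetical naming pattern
    return {
        "parking_violations_file_name": file_name,
        # placeholder host, not the real data source
        "parking_violations_url": f"https://example.com/data/{file_name}",
    }

# These values are then entered in the Airflow UI (Admin -> Variables)
# or via the CLI: `airflow variables set KEY VALUE`.
print(parking_variables(2023))
```

Bumping the year and re-setting these two variables is what lets the same Extract & Load DAG be re-run for the next year's file.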

Only the first time: for the Snowflake→AWS integration, the `AWS_EXTERNAL_ID` from the Snowflake storage integration must be added to the AWS IAM role's trust policy.
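For reference, the IAM trust policy typically ends up shaped like the sketch below, using the `STORAGE_AWS_IAM_USER_ARN` and `STORAGE_AWS_EXTERNAL_ID` values that `DESC INTEGRATION` reports in Snowflake (the placeholders must be replaced with your integration's actual values):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<STORAGE_AWS_IAM_USER_ARN from Snowflake>" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID from Snowflake>"
        }
      }
    }
  ]
}
```

The `sts:ExternalId` condition is what ties the role to this specific Snowflake integration, so the policy must be updated if the integration is recreated.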

After the EL pipeline has run successfully for all the years, run the DBT DAG (compile, run, test, docs generate).
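The four DBT DAG steps correspond to the standard dbt CLI commands, run in this order (assuming a configured dbt project and Snowflake profile):

```shell
dbt compile        # render the model SQL
dbt run            # execute the models against Snowflake
dbt test           # run the data tests
dbt docs generate  # build the documentation site
```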

Some Screenshots of the Project

Airflow DAGs (see the `dags` folder)