Description:
This project involves fetching and analyzing recent NBA scores, player statistics, and news. Technologies used include AWS S3, EC2, Airflow, Snowflake, DBT, Streamlit, Python, and SQL.
📹 Video Demonstration of the Project:
ELT approach to the project.
📁 Project Goals:
- Collect NBA scores
- Analyze top 100 players in the 2024 season
- Compile a list of top 500 players of all time
- Fetch NBA-related news using NewsAPI
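The news-fetching goal can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes NewsAPI's `/v2/everything` endpoint, an API key supplied through a hypothetical `NEWSAPI_KEY` environment variable, and illustrative helper names (`build_news_url`, `extract_articles`, `fetch_nba_news`). The output fields are also an assumption about what a downstream table might need.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen

NEWSAPI_URL = "https://newsapi.org/v2/everything"


def build_news_url(api_key, query="NBA", page_size=20):
    """Build the request URL for NewsAPI's /v2/everything endpoint."""
    params = {
        "q": query,
        "language": "en",
        "sortBy": "publishedAt",
        "pageSize": page_size,
        "apiKey": api_key,
    }
    return f"{NEWSAPI_URL}?{urlencode(params)}"


def extract_articles(payload):
    """Keep only a few fields per article (field selection is an assumption)."""
    return [
        {
            "title": article.get("title"),
            "source": (article.get("source") or {}).get("name"),
            "published_at": article.get("publishedAt"),
            "url": article.get("url"),
        }
        for article in payload.get("articles", [])
    ]


def fetch_nba_news(api_key):
    """Call NewsAPI over HTTPS and return the flattened article records."""
    with urlopen(build_news_url(api_key), timeout=10) as response:
        return extract_articles(json.load(response))


if __name__ == "__main__":
    # NEWSAPI_KEY is a hypothetical variable name; nothing runs without it.
    key = os.environ.get("NEWSAPI_KEY")
    if key:
        for row in fetch_nba_news(key):
            print(row["published_at"], row["title"])
```

Keeping URL construction and response parsing separate from the network call means both can be checked without an API key, and the flattened records can be written to storage as-is.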
Technologies:
- Cloud Services:
  - AWS S3: Used for scalable object storage, ideal for storing and retrieving large amounts of unstructured data.
  - EC2: Provides resizable compute capacity in the cloud, allowing for the deployment and management of virtual servers for various applications.
- Workflow Management:
  - Airflow: An open-source platform to programmatically author, schedule, and monitor workflows, ensuring efficient task automation and data pipeline management.
- Data Warehouse:
  - Snowflake: A cloud-based data warehousing solution that provides a fully managed service for storing and analyzing large volumes of data with high performance and scalability.
- Data Transformation:
  - DBT (Data Build Tool): A tool for transforming raw data into an organized, usable state through SQL-based transformations and data modeling within the data warehouse.
- Visualization:
  - Streamlit: An open-source app framework for creating and sharing custom web applications for data visualization and machine learning projects with minimal code.
- Programming Languages:
  - Python: A versatile programming language used for developing scripts, data analysis, machine learning models, and automating tasks.
  - SQL: A domain-specific language used for managing and manipulating relational databases, essential for querying, updating, and managing data in a data warehouse.
🗂️ Project Structure
There are two main approaches to a data engineering project like this one, and I implemented both of them.