Twitter Data Pipeline

An Airflow ETL pipeline for Twitter data using Docker, Python, and AWS S3.

This data engineering project uses Airflow to orchestrate an ETL workflow for Twitter data, with tasks running inside Docker containers.

  • Extracted Twitter data with API calls through Tweepy.
  • Transformed JSON data into CSV using Python, pandas, and JSON tooling.
  • Loaded processed data into AWS S3 with boto3.
  • Used Airflow and Docker to create a controlled, isolated workflow for development, testing, and deployment.
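
The transform step above (nested tweet JSON flattened into CSV rows) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample payload, the field names, and the helper names `flatten_tweet` and `tweets_json_to_csv` are all assumptions, and it uses the standard library `csv` module in place of pandas so that it runs without extra dependencies.

```python
import csv
import io
import json

# Hypothetical sample payload; the field names are assumptions,
# not the project's real schema.
SAMPLE_TWEETS_JSON = json.dumps([
    {"id": 1, "user": {"screen_name": "alice"}, "text": "Hello, Airflow", "retweet_count": 3},
    {"id": 2, "user": {"screen_name": "bob"}, "text": "ETL with\nDocker", "retweet_count": 0},
])

# Column order for the CSV output (assumed for illustration).
FIELDS = ["id", "user", "text", "retweet_count"]


def flatten_tweet(tweet: dict) -> dict:
    """Pick a flat set of columns from one nested tweet record."""
    return {
        "id": tweet["id"],
        "user": tweet["user"]["screen_name"],
        # Collapse newlines and repeated whitespace so each tweet stays on one row.
        "text": " ".join(tweet["text"].split()),
        "retweet_count": tweet.get("retweet_count", 0),
    }


def tweets_json_to_csv(raw_json: str) -> str:
    """Transform a JSON array of tweets into CSV text with a header row."""
    rows = [flatten_tweet(t) for t in json.loads(raw_json)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(tweets_json_to_csv(SAMPLE_TWEETS_JSON))
```

In a pandas-based version, `pandas.json_normalize` followed by `DataFrame.to_csv` would likely play the same role, and the resulting file could then be uploaded with a boto3 S3 client's `upload_file` method. Keeping the transform a pure function of its input makes it easy to test outside Airflow before wiring it into a task.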