GitHub - sprckt/pystack: Open source data platform built on Python libraries

PyDataStack

A modern, open-source data platform built entirely with Python tools, demonstrating a complete end-to-end data pipeline for Star Wars film data.

Technologies

  • Data Ingestion: dlt for extracting data from the Star Wars API
  • Data Warehouse: DuckDB for fast, embedded analytics
  • Data Transformation: dbt for SQL-based data modelling
  • Data Orchestration: Dagster for pipeline management
  • Data Visualization: Streamlit for interactive dashboards
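
To give a feel for the data entering the pipeline, here is a minimal sketch of flattening one film record into a warehouse-style row. It is a standard-library stand-in and does not use dlt; the sample payload is hand-trimmed to a few fields in the shape of a SWAPI films response, and `flatten_film` and `character_count` are hypothetical names, not part of this repository.

```python
import json
from typing import Any, Dict, List

# Hand-trimmed sample in the shape of a SWAPI films response (assumption:
# only a few of the real fields are shown here).
SAMPLE_RESPONSE = json.loads("""
{
  "count": 1,
  "results": [
    {
      "title": "A New Hope",
      "episode_id": 4,
      "release_date": "1977-05-25",
      "characters": [
        "https://swapi.dev/api/people/1/",
        "https://swapi.dev/api/people/2/"
      ]
    }
  ]
}
""")


def flatten_film(film: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one nested film record into a flat row (hypothetical helper)."""
    return {
        "title": film["title"],
        "episode_id": film["episode_id"],
        "release_date": film["release_date"],
        # Count linked resources instead of storing the nested list.
        "character_count": len(film.get("characters", [])),
    }


rows: List[Dict[str, Any]] = [flatten_film(f) for f in SAMPLE_RESPONSE["results"]]
print(rows)
```

In the actual project, extraction and loading are handled by dlt, so a helper like this would likely be unnecessary; the sketch only illustrates the record shape.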

Prerequisites

  • Python 3.8+
  • just command runner (optional)

Installation

  1. Clone the repository:
git clone https://github.com/your-username/pystack.git
cd pystack
  2. Create and activate a virtual environment:
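
The commands for this step are not shown here. As one possible approach, and an assumption rather than the project's documented method, Python's standard-library `venv` module can create the environment. The `.venv` directory name and the `create_env` helper are hypothetical.

```python
import venv
from pathlib import Path


def create_env(env_dir: str = ".venv", with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its config file."""
    # env_dir is a hypothetical name; the repository may use a different one.
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    print(create_env())
```

Activation itself is a shell step: `source .venv/bin/activate` on Linux or macOS, or `.venv\Scripts\activate` on Windows.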

Quick Start

Run the Streamlit Dashboard

just bi
# Or manually:
cd src && streamlit run visualisation/app.py

Run Dagster Pipeline

just orchestrate
# Or manually:
dagster dev -f src/orchestration/definitions.py

Query DuckDB

just duck
# Or manually:
duckdb src/pystack.duckdb

View dbt Documentation

Project Structure

pystack/
├── src/
│   ├── orchestration/      # Dagster pipeline definitions
│   ├── transformation/     # dbt models and configurations
│   └── visualisation/      # Streamlit dashboard
├── justfile                # Command shortcuts
└── README.md

Dashboard Features

  • Financials: View film budgets, box office revenue, and ROI
  • Attributes: Analyze species, characters, planets, and starships per film
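
The Financials view reports ROI, but the formula is not stated here. A minimal sketch, assuming the common definition ROI = (revenue - budget) / budget, might look like the following. The `FilmFinancials` class, its field names, and the figures are placeholders for illustration, not values or code from this project.

```python
from dataclasses import dataclass


@dataclass
class FilmFinancials:
    title: str
    budget_usd: float
    box_office_usd: float

    @property
    def roi(self) -> float:
        """Return on investment as a fraction of budget (assumed definition)."""
        if self.budget_usd <= 0:
            raise ValueError("budget must be positive")
        return (self.box_office_usd - self.budget_usd) / self.budget_usd


# Placeholder figures for illustration only, not real film data.
films = [
    FilmFinancials("Film A", budget_usd=10_000_000, box_office_usd=50_000_000),
    FilmFinancials("Film B", budget_usd=100_000_000, box_office_usd=150_000_000),
]

# Rank films from highest to lowest ROI and print as percentages.
for film in sorted(films, key=lambda f: f.roi, reverse=True):
    print(f"{film.title}: ROI {film.roi:.0%}")
```

In the project itself, a calculation like this would more plausibly live in a dbt model than in Python; the sketch only makes the metric concrete.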

License

This project is open source and available under the MIT License.

Acknowledgments

  • Star Wars API (SWAPI) for providing the data
  • PyCon DE 2025 for inspiration

Built using Python for PyCon DE and PyData 2025