Here are 2,556 public repositories matching this topic...
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Updated Jun 16, 2026 - Python
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Updated Jan 22, 2026
Incremental engine for long horizon agents. Star if you like it!
Updated Jun 16, 2026 - Python
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Updated Jun 16, 2026 - Go
Unified querying, transformation, and modification of JSON, TOML, YAML, XML, INI, HCL, KDL and CSV.
Updated May 22, 2026 - Go
Data processing for and with foundation models!
Updated Jun 16, 2026 - Python
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Updated Jun 16, 2026 - C++
Easy Data Preparation with latest LLMs-based Operators and Pipelines.
Updated Jun 10, 2026 - Python
A lightweight data processing framework built on DuckDB and 3FS.
Updated Mar 5, 2025 - Python
A light-weight, flexible, and expressive statistical data testing library
Updated Jun 16, 2026 - Python
High-performance AI pipeline engine with a C++ core and 50+ Python-extensible nodes. Build, debug, and scale LLM workflows with 13+ model providers, 8+ vector databases, and agent orchestration, all from your IDE. Includes VS Code extension, TypeScript/Python SDKs, and Docker deployment.
Updated Jun 16, 2026 - Python
The Context Layer for unstructured data: typed, versioned datasets over S3, GCS, Azure
Updated Jun 16, 2026 - Python
Concurrent and multi-stage data ingestion and data processing with Elixir
Updated Jun 13, 2026 - Elixir
Kubernetes-native platform to run massively parallel data/streaming jobs
Updated Jun 16, 2026 - Rust
Large-scale pretraining for dialogue
Updated Oct 17, 2022 - Python
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Updated Aug 26, 2021 - Python
Python Stream Processing
Updated Jun 11, 2026 - Python
Scalable data preprocessing and curation toolkit for LLMs
Updated Jun 16, 2026 - Python
Extract Transform Load for Python 3.5+
Updated May 12, 2023 - Python
Concurrent Python made simple
Updated Feb 4, 2025 - Python