Minish

Hello, we're Minish!

About us

We're a two-person (pringled and stephantul) open-source lab, with a focus on Natural Language Processing.

We believe that if you make models fast enough, you unlock new possibilities.

Using our models and packages, you can:

  • Embed the entire English Wikipedia in 5 minutes
  • Classify tens of thousands of documents per second on a CPU
  • Approximately deduplicate extremely large datasets in minutes
  • Build the fastest RAG application in the world
  • Easily evaluate which ANN algorithm works best for your data

Our projects:

  • model2vec: tiny static embedding models with state-of-the-art performance.
  • potion: the best small models in the world. 100-500x faster than a sentence-transformer, and almost as good.
  • semble: the fastest and best code search library for your agent.
  • vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
  • semhash: lightning-fast, highly accurate semantic deduplication and filtering for your text datasets.
  • model2vec-rs: a Rust port of model2vec.
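
To make the idea behind static embeddings and semantic deduplication concrete, here is a minimal sketch in plain Python. It is not the implementation used by model2vec or semhash: the embedding table, its values, and the helper names `embed`, `cosine`, and `deduplicate` are all hypothetical and exist only for illustration. It assumes that a static model is a fixed token-to-vector lookup whose vectors are mean-pooled, and that near-duplicates are texts whose cosine similarity exceeds a threshold.

```python
import math

# Hypothetical toy embedding table: each token maps to one fixed vector.
# A real static embedding model would ship a much larger learned table.
EMBEDDINGS = {
    "fast": [1.0, 0.0, 0.0],
    "quick": [0.9, 0.1, 0.0],
    "models": [0.0, 1.0, 0.0],
    "rust": [0.0, 0.0, 1.0],
}
DIM = 3


def embed(text):
    """Mean-pool the vectors of known tokens; unknown tokens are skipped."""
    vectors = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def deduplicate(texts, threshold=0.9):
    """Keep a text only if it is not too similar to any text kept so far."""
    kept, kept_vectors = [], []
    for text in texts:
        vec = embed(text)
        if all(cosine(vec, other) < threshold for other in kept_vectors):
            kept.append(text)
            kept_vectors.append(vec)
    return kept


print(deduplicate(["fast models", "quick models", "rust"]))
```

Because there is no neural network forward pass, embedding a text is just a few lookups and an average, which is where the speed comes from. The pairwise comparison in this sketch is quadratic in the number of texts; for large datasets an approximate nearest neighbor index would presumably replace it.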

Pinned

  1. model2vec: Fast State-of-the-Art Static Embeddings

    Python, 2.1k stars, 122 forks

  2. semble: Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read

    Python, 5.2k stars, 224 forks

  3. semhash: Fast Multimodal Semantic Deduplication & Filtering

    Python, 936 stars, 57 forks

  4. vicinity: Lightweight Nearest Neighbors with Flexible Backends

    Python, 345 stars, 13 forks

  5. tokenlearn: Pre-train Static Word Embeddings

    Python, 106 stars, 9 forks

  6. model2vec-rs: Official Rust Implementation of Model2Vec

    Rust, 193 stars, 23 forks

Repositories

Showing 10 of 11 repositories

  • semble Public

    Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read


  • watertemplate

    Makefile, 5 stars, MIT license, 3 forks, 0 open issues, 0 open pull requests

    Updated Jun 14, 2026

  • tokenlearn Public

    Pre-train Static Word Embeddings

    Python, 106 stars, MIT license, 9 forks, 1 open issue, 0 open pull requests

    Updated Jun 9, 2026

  • model2vec Public

    Fast State-of-the-Art Static Embeddings


  • docs

    MDX, 0 stars, 2 forks, 0 open issues, 0 open pull requests

    Updated Jun 5, 2026

  • semhash Public

    Fast Multimodal Semantic Deduplication & Filtering

    Python, 936 stars, MIT license, 57 forks, 1 open issue, 0 open pull requests

    Updated May 24, 2026

  • vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python, 345 stars, MIT license, 13 forks, 1 open issue, 1 open pull request

    Updated May 24, 2026

  • model2vec-rs Public

    Official Rust Implementation of Model2Vec

    Rust, 193 stars, MIT license, 23 forks, 0 open issues, 0 open pull requests

    Updated May 24, 2026

  • .github

    0 stars, 0 forks, 0 open issues, 0 open pull requests

    Updated Apr 30, 2026

  • evaluation Public

    Code to evaluate performance for embeddings

    Python, 12 stars, MIT license, 0 forks, 0 open issues, 0 open pull requests

    Updated Sep 20, 2025