@MinishLab

The Minish Lab

Solving big problems with small models

Hello, we're minish!

We're a two-person open-source company (@pringled and @stephantul) focused on Natural Language Processing.

We believe that if you make models fast enough, you unlock new possibilities.

Using our software, you can:

  • Ingest the entire English Wikipedia in 5 minutes
  • Classify tens of thousands of documents per second on CPU
  • Approximately deduplicate extremely large datasets in minutes
  • Build the fastest RAG application in the world
  • Easily evaluate which ANN algorithm works best for your data

Our projects:

  • model2vec: make tiny models that are still really, really good.
  • potion: the best small model in the world. 100-500x faster than a sentence-transformer, and almost as good.
  • vicinity: consistent interfaces to many approximate nearest neighbor algorithms.
  • semhash: lightning-fast, super accurate, approximate deduplication for your text datasets.
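
To give a feel for what approximate deduplication of text means, here is a toy sketch in plain Python. It is not how semhash works internally: the `embed`, `cosine`, and `deduplicate` helpers and the 0.8 threshold are hypothetical names and values chosen for illustration, a bag-of-words count vector stands in for a real embedding model, and a brute-force comparison stands in for approximate nearest neighbor search.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: a bag-of-words count vector (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(texts, threshold=0.8):
    """Keep a text only if it is not too similar to any text kept so far.

    This compares against every kept text (brute force). A real system
    would use a nearest neighbor index instead of a full scan.
    """
    kept = []
    kept_vectors = []
    for text in texts:
        vector = embed(text)
        if all(cosine(vector, other) < threshold for other in kept_vectors):
            kept.append(text)
            kept_vectors.append(vector)
    return kept
```

For example, `deduplicate(["the cat sat on the mat", "the cat sat on the mat .", "dogs bark loudly"])` keeps the first and third texts, because the second one is a near-duplicate of the first.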

You can also find us on: 🤗 huggingface 👽 LinkedIn

Pinned

  1. model2vec Public

    The Fastest State-of-the-Art Static Embeddings in the World

    Python, 603 stars, 23 forks

  2. vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python, 197 stars, 5 forks

  3. semhash Public

    Fast Semantic Text Deduplication

    Python, 410 stars, 18 forks

  4. tokenlearn Public

    Pre-train Static Word Embeddings

    Python, 35 stars, 2 forks

Repositories

Showing 9 of 9 repositories
  • model2vec Public

    The Fastest State-of-the-Art Static Embeddings in the World

    Python, 603 stars, MIT license, 23 forks, 1 open issue, 1 open pull request, updated Jan 21, 2025
  • semhash Public

    Fast Semantic Text Deduplication

    Python, 410 stars, MIT license, 18 forks, 1 open issue, 2 open pull requests, updated Jan 18, 2025
  • minishlab.github.io Public

    SCSS, 0 stars, MIT license, 0 forks, 0 open issues, 0 open pull requests, updated Jan 17, 2025
  • tokenlearn Public

    Pre-train Static Word Embeddings

    Python, 35 stars, MIT license, 2 forks, 1 open issue, 1 open pull request, updated Jan 15, 2025
  • vicinity Public

    Lightweight Nearest Neighbors with Flexible Backends

    Python, 197 stars, MIT license, 5 forks, 0 open issues, 2 open pull requests, updated Jan 8, 2025
  • .github Public

    Readme

    0 stars, 0 forks, 0 open issues, 0 open pull requests, updated Jan 5, 2025
  • korok Public

    Lightweight Hybrid Search and Reranking

    Python, 7 stars, MIT license, 1 fork, 0 open issues, 0 open pull requests, updated Dec 26, 2024
  • watertemplate Public template

    Template

    Makefile, 1 star, MIT license, 1 fork, 0 open issues, 0 open pull requests, updated Dec 9, 2024
  • evaluation Public

    Code to evaluate performance for embeddings

    Python, 9 stars, MIT license, 0 forks, 0 open issues, 0 open pull requests, updated Sep 25, 2024