This is the central repository for branchwater.
branchwater is the command-line framework we use for searching large collections of sequencing data with genome-scale queries.
You can read more about branchwater in "Sourmash Branchwater Enables Lightweight Petabyte-Scale Sequence Search" (Irber et al., 2022), and about one of its earliest use cases in "Biogeographic Distribution of Five Antarctic Cyanobacteria Using Large-Scale k-mer Searching with sourmash branchwater" (Lumian et al., 2022).
Branchwater was initially named MAGsearch.
Here are a few blog posts:
- MinHashing all the things: searching for MAGs in the SRA
- MinHashing all the things: a quick analysis of MAG search results
- Searching all public metagenomes with sourmash
We also have a prototype real-time search of the SRA that uses the same underlying code and methods.
branchwater is based on sourmash.
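Roughly speaking, the searches rest on sourmash's FracMinHash sketches. Each dataset is reduced to the subset of its k-mer hashes that fall below a fixed threshold (about 1/`scaled` of them), and a query genome is compared against each metagenome by containment: the fraction of the query's sketch hashes that also appear in the metagenome's sketch. The toy Python below illustrates that idea under stated assumptions. It is not branchwater's or sourmash's code. The function names (`canonical_kmers`, `hash64`, `fracminhash`, `containment`) are hypothetical, and BLAKE2b stands in for the MurmurHash3 that sourmash actually uses, so these hash values will not match real sourmash sketches.

```python
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int):
    """Yield each k-mer as the lexicographically smaller of itself and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases such as N
        rc = kmer.translate(_COMPLEMENT)[::-1]
        yield min(kmer, rc)


def hash64(kmer: str) -> int:
    # Hypothetical stand-in hash: 64-bit BLAKE2b (sourmash itself uses MurmurHash3).
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def fracminhash(seq: str, k: int = 21, scaled: int = 1000) -> set[int]:
    """Keep only hashes below 2**64 / scaled, i.e. roughly 1/scaled of the distinct k-mers."""
    max_hash = 2**64 // scaled
    return {h for h in map(hash64, canonical_kmers(seq, k)) if h < max_hash}


def containment(query: set[int], target: set[int]) -> float:
    """Fraction of the query sketch's hashes that are also present in the target sketch."""
    if not query:
        return 0.0
    return len(query & target) / len(query)
```

Because every sketch keeps the same hash range, a genome's sketch is a subset of the sketch of any metagenome that contains that genome, so containment can be estimated from the small sketches alone. A larger `scaled` value makes sketches smaller and searches cheaper, at the cost of resolution for small queries.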
Branchwater is currently (Dec 2022) scattered across a bunch of repositories; we're working on consolidating them!
- The core Rust code for doing the search is in the sra_search repo.
- Some utility code for actually running searches and maintaining catalogs is in ctb/magsearch.
- The underlying sourmash Rust library (used by branchwater) is in the sourmash repo.
- The code for monitoring the SRA and building sourmash sketches from genomes and metagenomes is in wort.
- Titus has some initial attempts to use PyO3 to wrap Rust code in Python in ctb/2022-pymagsearch.
TODO:
- Add links to the FracMinHash paper.
- Add FAQs such as "How do I get the SRA sketches?", and point people at the relevant places (and at mastiff).
Please file branchwater-specific issues and pull requests in the branchwater repo. We also hang out in the sourmash repo a lot, if you have more general questions about sourmash. And there's a gitter/matrix channel where you can contact a number of the
CTB Dec 2022