Calculating SAC on metagenome clusters #36

nmb85 · 2020-09-04T20:20:28Z

@luizirber, one more thing for today (not intending to distract you): it would be really interesting if you could calculate the species accumulation curve (SAC) for hash sets in clusters of metagenomes in your monster wort database. For example, treating soil metagenomes as a cluster, you could build a matrix of hashes (such as here), calculate the different orders of intersection between the hash sets from the soil metagenomes, and then plot an SAC from the hashes. This might be impossible with k-mers, and species tallies are corrupted by incomplete annotation due to incomplete databases, but hashes might give you a chance to get an accurate SAC by incrementally adding hash sets and plotting the change in the intersection sets. See equation 3 in this paper for a definitive explanation.

Then you could efficiently use all the data in the SRA and JGI databases to estimate whether the species count based on current soil metagenomes is "open" (SAC fits a power law function) or "closed" (SAC fits an exponential function), that is, whether or not we've collected enough data to estimate an asymptote for the number of species (in this case using hashes as a proxy) in soil metagenomes (or some other interesting biome). Although I'm not a soil biologist, I think that's a major question in their field. Other biomes might be interesting too. I'm not sure if anyone has tried this with raw k-mers, but it would seem too gargantuan a task. Hashes might make this problem tractable?
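To make the idea concrete, here is a minimal sketch of an accumulation curve over hash sets. It is not the method from the paper's equation 3 and does not use the sourmash API: each metagenome is assumed to be a plain Python `set` of integer hashes, and the names `accumulation_curve`, `loglog_slope`, and the toy data are hypothetical. It averages the number of distinct hashes seen over random sample orderings, then estimates a log-log slope as a rough first look at power-law-like growth.

```python
import math
import random


def accumulation_curve(hash_sets, n_permutations=100, seed=0):
    """Mean number of distinct hashes after adding 1..N samples.

    hash_sets: list of sets of integer hashes (assumed input format).
    The curve is averaged over random orderings of the samples.
    """
    rng = random.Random(seed)
    n = len(hash_sets)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(n_permutations):
        rng.shuffle(order)
        seen = set()
        for position, idx in enumerate(order):
            seen |= hash_sets[idx]          # add this sample's hashes
            totals[position] += len(seen)   # distinct hashes so far
    return [t / n_permutations for t in totals]


def loglog_slope(curve):
    """Least-squares slope of log(distinct hashes) vs. log(sample count).

    A crude diagnostic only: a slope that stays well above zero suggests
    the curve is still growing, but it is not a formal open/closed test.
    """
    xs = [math.log(i + 1) for i in range(len(curve))]
    ys = [math.log(v) for v in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data standing in for per-metagenome hash sets (made up).
toy_sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 5, 6}]
curve = accumulation_curve(toy_sets)
slope = loglog_slope(curve)
print(curve, slope)
```

With real signatures, the sets would be far larger, and distinguishing a power law from an exponential fit would need a proper model comparison rather than a single slope.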

luizirber · 2020-09-05T16:23:53Z

That is a really good idea... and a monstrous matrix 🤣

I'll work on sharing all the sigs in a couple of weeks, but it is not something I can tackle at the moment 😢

ctb · 2020-09-05T16:26:56Z

Yes! We explored this quite a bit a while back for Tara, see https://github.com/ctb/2017-sourmash-rarefy/blob/master/tara-rarefy.ipynb for an example. Haven't looked at the code in a while tho ;).

nmb85 · 2020-11-19T04:33:03Z

Have you already seen this?
https://ieeexplore.ieee.org/abstract/document/9139876

ctb mentioned this issue Sep 9, 2020

Within signature distance? sourmash-bio/sourmash#33

Open
