Feature Request: reading variants as sparse in pgenlibr #291

Open
joellembatchou opened this issue Jan 24, 2025 · 5 comments

Comments

@joellembatchou

Hi,

We rely on the C++ pgenlibr to read PGEN-format genotype data in REGENIE. The currently available functions (RPgenReader::ReadIntHardcalls / RPgenReader::Read) read the genotype data for all samples. For rare variants, the format documentation suggests that PGEN stores the data sparsely (i.e. only the indices and genotypes of carriers are stored). Could this functionality be exposed in the C++ pgenlibr: a flag indicating whether a variant (given its index) is stored sparsely, and a function that returns only the carriers' indices and genotypes (or dosages)?
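
To make the request concrete, here is a minimal sketch of the "carriers only" representation described above. It is not pgenlibr code: `SparseVariant` and `ToSparse` are hypothetical names used only for illustration, indices are assumed to be 0-based, and hom-ref is assumed to be coded as 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container (not part of pgenlibr): the carriers of one variant.
struct SparseVariant {
  std::vector<uint32_t> sample_idxs;  // ASSUMPTION: 0-based indices of carriers
  std::vector<int8_t> genotypes;      // genotype code stored for each carrier
};

// Keep only the samples whose genotype differs from 0 (assumed hom-ref).
// Any non-zero entry is kept, including whatever code marks a missing call.
SparseVariant ToSparse(const std::vector<int8_t>& dense) {
  SparseVariant out;
  for (uint32_t i = 0; i < dense.size(); ++i) {
    if (dense[i] != 0) {
      out.sample_idxs.push_back(i);
      out.genotypes.push_back(dense[i]);
    }
  }
  // The two vectors are parallel, so they must end up the same length.
  assert(out.sample_idxs.size() == out.genotypes.size());
  return out;
}
```

For a rare variant the two vectors stay short no matter how many samples there are, which is the saving we would like to exploit downstream.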

Thanks,
Joelle

@chrchang
Owner

Okay, I plan to provide this functionality on GitHub later this week, though the next CRAN release will wait till mid-year.

@chrchang
Owner

chrchang commented Feb 4, 2025

HasSparseHardcalls(pgen, variant_num) now returns whether variant_num has a sparse representation. If it does, ReadSparseHardcalls(pgen, variant_num) returns an object where "sample_nums" has the sample indexes, and "counts" has the counts.
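
As a rough illustration of how a caller might consume such a result, the sketch below expands a sparse (sample_nums, counts) pair back into one value per sample. Only the field names "sample_nums" and "counts" come from the comment above. The `SparseHardcalls` struct, the `ExpandSparse` helper, the 1-based numbering, and the assumption that unlisted samples have count 0 are all hypothetical and should be checked against the actual pgenlibr behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the returned object; only the two field names
// are taken from the thread.
struct SparseHardcalls {
  std::vector<int32_t> sample_nums;  // ASSUMPTION: 1-based sample indexes
  std::vector<int32_t> counts;       // allele count for each listed sample
};

// Expand to one value per sample.
// ASSUMPTION: every sample that is not listed has count 0.
std::vector<int32_t> ExpandSparse(const SparseHardcalls& s, int32_t sample_ct) {
  assert(s.sample_nums.size() == s.counts.size());
  std::vector<int32_t> dense(sample_ct, 0);
  for (std::size_t i = 0; i < s.sample_nums.size(); ++i) {
    const int32_t num = s.sample_nums[i];
    assert(num >= 1 && num <= sample_ct);  // guard against out-of-range indexes
    dense[num - 1] = s.counts[i];
  }
  return dense;
}
```

In practice a caller would usually work on the sparse pair directly (for example, summing `counts` to get an allele count) and only expand when a dense vector is really needed.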

@joellembatchou
Author

Thanks, that sounds great. Looking forward to testing it!

@joellembatchou
Author

joellembatchou commented Feb 5, 2025

Small follow-up: is ReadMaybeSparseHardcalls() meant to be used if dosages are present? And if so, is there a function similar to HasSparseHardcalls() for dosages?

@chrchang
Owner

chrchang commented Feb 6, 2025

ReadMaybeSparseHardcalls() ignores dosages if they're present. I will think about an appropriate way to provide this functionality for dosages.
