Create an encrypted DNA ancestry using Concrete ML #95
zaccherinij added the 🎯 Bounty (This bounty is currently open) and 📁 Concrete ML (library targeted: Concrete ML) labels on Feb 9, 2024
zaccherinij changed the title from "Create an encrypted DNA classifier using Concrete ML" to "Create an encrypted DNA ancestry using Concrete ML" on Feb 14, 2024
A friendly reminder that the submission deadline is May 12th, 2024 at 23:59 AoE (Anywhere on Earth). Good luck!
I'm not sure how to submit my solution (the link above leads to a general page), but here it is: https://github.com/alephzerox/ancestry-fhe
Hi @alephzerox,
Please go to https://www.zama.ai/bounty-and-grant-program and use the form under "submit to the bounty program".
Cheers
Thank you to everyone who submitted to the Zama Bounty Program Season 5. Our team will review all submissions and give some initial feedback in the coming days!
github-project-automation (bot) moved this from Bounties [Season 5] to Awarded Contributions in Zama Bounty and Grant Program Overview on May 13, 2024
Concrete ML simplifies the use of FHE for data scientists by helping them automatically turn machine learning models into their homomorphic equivalents. FHE can be particularly useful for protecting users' health care data, and it is a strong candidate for addressing the privacy risks of using genealogy analysis websites.
Over 30 million people have taken DNA tests to determine their ancestry through computational genetic genealogy. By processing the digitized sequences of DNA bases, sophisticated algorithms can identify whether one's ancestors came from particular ethnic groups. DNA is sensitive personal information: it can uniquely identify an individual, and leaks of DNA data have already happened.
DNA ancestry identification is a complex process that involves multiple steps. First, DNA phasing assigns alleles (the As, Cs, Ts and Gs in DNA strands) to the paternal and maternal chromosomes. Second, ancestry can be determined by comparing specific segments of the DNA against large databases of DNA of known ancestry. An alternative is to use machine learning to classify each segment and then aggregate the per-segment ancestries into a final classification.
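The reference-comparison step can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: alleles are encoded as 0/1, there is one reference haplotype per population, the window size is fixed, and the names `split_into_windows`, `classify_windows`, `POP_A` and `POP_B` are hypothetical. It runs in the clear and is not the method of any particular tool.

```python
from typing import Dict, List


def split_into_windows(haplotype: List[int], window_size: int) -> List[List[int]]:
    """Cut a phased haplotype (0/1 allele codes) into consecutive windows."""
    return [haplotype[i:i + window_size]
            for i in range(0, len(haplotype), window_size)]


def hamming(a: List[int], b: List[int]) -> int:
    """Count the positions where two windows carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def classify_windows(haplotype: List[int],
                     references: Dict[str, List[int]],
                     window_size: int) -> List[str]:
    """Label each window with the population whose reference window is closest."""
    query_windows = split_into_windows(haplotype, window_size)
    ref_windows = {pop: split_into_windows(ref, window_size)
                   for pop, ref in references.items()}
    labels = []
    for idx, window in enumerate(query_windows):
        # Ties go to the population listed first in `references`.
        best = min(ref_windows,
                   key=lambda pop: hamming(window, ref_windows[pop][idx]))
        labels.append(best)
    return labels


# Toy data: two hypothetical reference populations and one query haplotype.
references = {
    "POP_A": [0, 0, 0, 0, 1, 1, 1, 1],
    "POP_B": [1, 1, 1, 1, 0, 0, 0, 0],
}
query = [0, 0, 0, 1, 0, 0, 0, 0]
local_labels = classify_windows(query, references, window_size=4)
print(local_labels)  # ['POP_A', 'POP_B']
```

In a machine-learning variant, the nearest-reference rule would be replaced by a classifier trained separately for each window.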
Using Fully Homomorphic Encryption, we think ancestry can be determined on encrypted DNA sequences, preserving the privacy of users' DNA. Most published machine-learning methods for ancestry identification perform local ancestry inference (LAI). Global ancestry inference estimates the genome-wide average of the population contributions, while LAI identifies the ancestry of an individual genomic segment, a task that is more amenable to machine learning. To build the global ancestry from local decisions, LAI algorithms also use machine learning in a second step, which takes the ancestry classifications of the different segments and fuses them into a single classification for a person.
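As a rough illustration of how global ancestry relates to local decisions, the sketch below averages per-segment labels into genome-wide proportions. It assumes that every segment carries equal weight and that the labels are already available in the clear; the function name `global_ancestry` and the population names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def global_ancestry(local_labels: List[str]) -> Dict[str, float]:
    """Turn per-segment ancestry labels into genome-wide proportions."""
    if not local_labels:
        raise ValueError("at least one segment label is required")
    counts = Counter(local_labels)
    total = len(local_labels)
    # Each segment contributes equally (an assumption of this sketch).
    return {pop: n / total for pop, n in counts.items()}


labels = ["POP_A", "POP_A", "POP_B", "POP_A"]
proportions = global_ancestry(labels)
print(proportions)  # {'POP_A': 0.75, 'POP_B': 0.25}
```

A more realistic version might weight segments by their length rather than counting them equally.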
Many types of machine learning models have been proposed for local ancestry inference: neural networks [1], hidden Markov models [2], and decision trees or logistic regression [3] (the G-nomix project). A great hands-on resource on machine learning for ancestry is the AI Sandbox on GitHub.
Submission
1️⃣ Want to solve this bounty? Register here.
2️⃣ Ready to submit your solution? Submit here.
🗓️ Submission deadline: May 12th, 2024.
Overview
The goal of this bounty is to train ancestry classifiers using Concrete ML so that they can run on encrypted data. You can assume the input DNA is phased and in the proper format. As mentioned above, most approaches have two stages. First, classifiers are trained for individual genomic windows. Second, a smoother is trained to combine the predictions of the individual classifiers.
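To make the role of the second stage concrete, here is a minimal sketch of a smoother operating on per-window predictions. It is a stand-in only: the bounty asks for a trained smoother, whereas this example uses a fixed majority filter over neighbouring windows. It runs in the clear, is not Concrete ML code, and the name `smooth_labels` and the `radius` parameter are hypothetical.

```python
from collections import Counter
from typing import List


def smooth_labels(window_labels: List[str], radius: int = 1) -> List[str]:
    """Replace each window label by the majority label of its neighbourhood."""
    smoothed = []
    for i, label in enumerate(window_labels):
        lo = max(0, i - radius)
        hi = min(len(window_labels), i + radius + 1)
        counts = Counter(window_labels[lo:hi])
        top = max(counts.values())
        if counts[label] == top:
            # Keep the original label when it is (tied for) the majority.
            smoothed.append(label)
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Stage-one output with one isolated, probably spurious, "B" window.
raw = ["A", "A", "B", "A", "A", "B", "B", "B"]
smoothed = smooth_labels(raw)
print(smoothed)  # ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B']
```

The isolated label is absorbed by its neighbours while the longer run at the end is preserved, which is the kind of correction a trained smoother is expected to learn from data.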
You can use any datasets that you want as long as you abide by their license agreements. Some examples are the 1000 Genomes Project, the Simons Genome Diversity Project and the Human Genome Diversity Project.
What we expect
Important
To qualify for the maximum prize, the FHE application should perform both stages of the classification in FHE.
Partial prizes will be awarded if only one stage of the pipeline is in FHE, but you can assume preprocessing such as phasing is done in the clear in a separate step (you can use phased DNA directly).
Implementation guide
Reward
🥇Best submission: up to €5,000.
To be considered the best submission, a solution must be efficient, effective and demonstrate a deep understanding of the core problem. Alongside technical correctness, it should be submitted with clean code, clear explanations and complete documentation.
🥈Second-best submission: up to €3,000.
For a solution to be considered the second-best submission, it should be both efficient and effective. The code should be neat and readable. Its documentation might not be as exhaustive as that of the best submission, but it should cover the key aspects of the solution.
🥉Third-best submission: up to €2,000.
The third-best submission is one that effectively tackles the challenge at hand, even if it has certain areas for improvement in terms of efficiency or depth of understanding. Documentation should be present, covering the essential components of the solution.
Reward amounts are decided based on code quality, model accuracy scores and speed on an m6i.metal AWS server. When multiple solutions of comparable scope are submitted, they are compared based on accuracy metrics and computation times.
Related links and references
[1] Benet Oriol Sabat, Daniel Mas Montserrat, Xavier Giro-i-Nieto, Alexander G Ioannidis, SALAI-Net: species-agnostic local ancestry inference network, Bioinformatics, Volume 38, Issue Supplement_2, September 2022, Pages ii27–ii33,
[2] Wei Y, Zhi D, Zhang S. Fast and accurate local ancestry inference with Recomb-Mix. bioRxiv [Preprint]. 2023 Nov 19:2023.11.17.567650. doi: 10.1101/2023.11.17.567650. PMID: 38014185; PMCID: PMC10680832.
[3] Helgi Hilmarsson, Arvind S. Kumar, Richa Rastogi, Carlos D. Bustamante, Daniel Mas Montserrat, Alexander G. Ioannidis, High Resolution Ancestry Deconvolution for Next Generation Genomic Data, bioRxiv 2021.09.19.460980
Questions?
Do you have a specific question about this bounty? Join the live conversation on the FHE.org Discord server here. You can also send us an email at: [email protected]