
Genetic Databases

‘Anonymous’ genetic databases vulnerable to privacy leaks


A study has raised concerns that a type of genetic database that is increasingly popular with researchers could be exploited to reveal the identities of its participants, or link private health information to their public genetic profiles.

Single-cell data sets can contain information on gene expression in millions of cells collected from thousands of people. They are often freely accessible, providing a valuable resource for researchers who study the effects of diseases at a cellular level. The data are supposed to be anonymized, but a study published on 2 October in Cell shows how genetic data from one study “can be exploited to uncover private information about individuals in another study”, the authors write.

The findings highlight the difficulty of balancing the interests of researchers with the privacy of donors. “Our genomes are very identifying. They can tell a lot about us, our traits, our predisposition to diseases,” says study co-author Gamze Gürsoy, a bioinformatics researcher at Columbia University in New York City. “You can change your credit-card number if it leaks, but you cannot change your genome.”

Sensitive data

Concerns around privacy in genetic data sets have been raised before, but they mainly focused on ‘bulk’ genetic data sets. These contain information on gene activity averaged across a large population of cells, rather than data from individual cells.

It was previously thought that single-cell data sets wouldn’t be as vulnerable to privacy leaks, owing to the level of ‘noise’, or variation in gene expression, between different cells. But Gürsoy and her colleagues demonstrated that was not the case.

The team reviewed three publicly available single-cell data sets, which included blood cells from people with lupus, a chronic autoimmune disease. The researchers found that they could use data on gene expression to predict the structure of a person’s genome, by combining these values with information on expression quantitative trait loci (eQTLs). The details of eQTLs (variations on the chromosome that correlate with gene expression) are also publicly available in single-cell data sets.

To test the reliability of their work, the researchers checked their genome predictions against a genome database corresponding with the cells they used. They were able to link most of the data sets to their corresponding genome, with an accuracy rate of more than 80%.
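
The general idea of predicting genotypes from expression and then matching them to a genome can be illustrated with a toy sketch. This is not the study’s method: the gene names, expression values, genotype table and nearest-mean rule below are all invented for illustration, and each hypothetical gene stands in for a single associated variant.

```python
# Toy linkage sketch. Every name and number here is made up for illustration;
# it does not reproduce the analysis described in the study.

# Hypothetical eQTL reference: mean expression of each gene for 0, 1 or 2
# copies of the alternative allele at its associated variant.
EQTL_MEANS = {
    "gene_a": (1.0, 2.0, 3.0),
    "gene_b": (5.0, 4.0, 3.0),
    "gene_c": (0.5, 1.5, 2.5),
    "gene_d": (2.0, 2.8, 3.6),
}

# Hypothetical genome database: allele counts per person at the same variants.
GENOME_DB = {
    "person_1": {"gene_a": 0, "gene_b": 2, "gene_c": 1, "gene_d": 0},
    "person_2": {"gene_a": 2, "gene_b": 0, "gene_c": 1, "gene_d": 2},
    "person_3": {"gene_a": 1, "gene_b": 1, "gene_c": 0, "gene_d": 1},
}

# Noisy per-donor expression from an "anonymous" sample (really person_2).
SAMPLE = {"gene_a": 2.7, "gene_b": 4.8, "gene_c": 1.1, "gene_d": 3.1}


def predict_genotypes(expression):
    """Guess allele counts by picking the genotype whose mean expression is closest."""
    predicted = {}
    for gene, means in EQTL_MEANS.items():
        value = expression[gene]
        predicted[gene] = min(range(3), key=lambda g: abs(means[g] - value))
    return predicted


def link_to_genome(predicted, genome_db):
    """Return the genome ID that agrees with the predictions at the most variants."""
    def matches(genotypes):
        return sum(predicted[gene] == genotypes[gene] for gene in predicted)
    return max(genome_db, key=lambda person: matches(genome_db[person]))


predicted = predict_genotypes(SAMPLE)
best_match = link_to_genome(predicted, GENOME_DB)
print(predicted, best_match)
```

In this toy example the prediction for gene_d is wrong because of noise, yet the sample still links to person_2, because the other three variants agree. Real data would involve far more variants and donors, and much noisier measurements.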

Unlike the data on gene expression and eQTLs, full-genome databases can usually be accessed only by scientists, to protect donors’ identifying information. But the researchers point out that a participant’s genome data can be publicly available elsewhere. For example, they might have uploaded it to a genealogy website in which users send DNA samples to learn more about their ancestry. In this case, an attacker could identify an individual whose cells are in a single-cell data set using their genome. This could reveal personal data related to a sensitive trait such as a psychiatric disorder, given that research participants are often selected to study the biology of these complex conditions.

Privacy breaches such as these could have real-world implications, including causing employment discrimination, says Gürsoy. She adds that any leaks could even have repercussions for future generations, given that genetic traits can be passed to offspring. “Anything that leaks about us will perpetuate through generations,” she says.

Bradley Malin, who researches large-scale genomic data sharing at Vanderbilt University in Nashville, Tennessee, says that the study is a “novel extension and contribution to the literature”. He adds that future research could explore whether genomic data could still be linked in larger data sets that include samples from thousands or millions of people.

Competing interests

Scientists are unsure about how best to tackle these privacy concerns. “There’s the desire to protect individual privacy, but also the desire to collectively advance medical research, and those are, unfortunately, at odds with each other,” says Mark Gerstein, who researches medical data science at Yale University in New Haven, Connecticut. The simplest solution would be to stop making genetic data so easily accessible, but this would negatively affect research, he says. “We need to share and aggregate large amounts of information,” he says. “Locking it down and making it more private, really, just gums that whole process up.”

In their study, Gürsoy and her colleagues say that there should be greater transparency about the risks for participants who share their genomic data, and suggest that researchers should ensure that donors consent to their data being shared. Another way forward could be encrypting personal data when it is part of a public database. The authors acknowledge that doing this would complicate the process of building and maintaining data sets, but say that it could help to protect participants’ privacy.
