Cause-and-Effect Relationships of Genes

A causal theory for studying the cause-and-effect relationships of genes


By studying changes in gene expression, researchers learn how cells function at a molecular level, which could help them understand the development of certain diseases.

But a human has about 20,000 genes that can affect each other in complex ways, so even knowing which groups of genes to target is an enormously complicated problem. Also, genes work together in modules that regulate each other.

MIT researchers have now developed theoretical foundations for methods that could identify the best way to aggregate genes into related groups so they can efficiently learn the underlying cause-and-effect relationships between many genes.

Importantly, the new method does so using only observational data. This means researchers don’t need to perform costly, and sometimes infeasible, interventional experiments to obtain the data needed to infer the underlying causal relationships.

In the long run, this technique could help scientists more accurately and efficiently identify potential gene targets to induce certain behavior, potentially enabling them to develop precise treatments for patients.

“In genomics, it is very important to understand the mechanism underlying cell states. But cells have a multiscale structure, so the level of summarization is very important, too. If you figure out the right way to aggregate the observed data, the information you learn about the system should be more interpretable and useful,” says graduate student Jiaqi Zhang, an Eric and Wendy Schmidt Center Fellow and co-lead author of a paper on this technique.

Zhang is joined on the paper by co-lead author Ryan Welch, currently a master’s student in engineering; and senior author Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS) and the Institute for Data, Systems, and Society (IDSS) who is also director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, and a researcher at MIT’s Laboratory for Information and Decision Systems (LIDS). The research will be presented at the Conference on Neural Information Processing Systems.

Learning from observational data

The problem the researchers set out to tackle involves learning programs of genes. These programs describe which genes function together to regulate other genes in a biological process, such as cell development or differentiation.

Since scientists can’t efficiently study how all 20,000 genes interact, they use a technique called causal disentanglement to learn how to combine related groups of genes into a representation that allows them to efficiently explore cause-and-effect relationships.

In previous work, the researchers demonstrated how this could be done effectively in the presence of interventional data, which are data obtained by perturbing variables in the network.

But interventional experiments are often expensive to conduct, and in some scenarios such experiments are unethical, or the technology is not good enough for the intervention to succeed.

With only observational data, researchers can’t compare genes before and after an intervention to learn how groups of genes function together.

“Most research in causal disentanglement assumes access to interventions, so it was unclear how much information you can disentangle with just observational data,” Zhang says.

The MIT researchers developed a more general approach that uses a machine-learning algorithm to effectively identify and aggregate groups of observed variables, e.g., genes, using only observational data.

They can use this technique to identify causal modules and reconstruct an accurate underlying representation of the cause-and-effect mechanism. “While this research was motivated by the problem of elucidating cellular programs, we first had to develop novel causal theory to understand what could and could not be learned from observational data. With this theory in hand, in future work we can apply our understanding to genetic data and identify gene modules as well as their regulatory relationships,” Uhler says.

A layerwise representation

Using statistical techniques, the researchers can compute a mathematical function known as the variance for the Jacobian of each variable’s score. Causal variables that don’t affect any subsequent variables should have a variance of zero.
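As a rough illustration of the zero-variance criterion, the sketch below uses a hypothetical two-variable toy model rather than the researchers' actual setup: x1 is standard Gaussian and x2 equals sin(x1) plus Gaussian noise, so x2 affects nothing downstream. The "score" is taken here to be the gradient of the log-density, and the diagonal of its Jacobian is written out by hand for this toy model. The function names, the noise model, and the tolerance are all assumptions made for illustration.

```python
import numpy as np


def score_jacobian_diagonal(x1, x2, sigma=1.0):
    """Diagonal of the score Jacobian for the assumed toy model
    x1 ~ N(0, 1), x2 = sin(x1) + N(0, sigma^2).

    log p(x1, x2) = -x1^2 / 2 - (x2 - sin(x1))^2 / (2 sigma^2) + const
    """
    resid = x2 - np.sin(x1)
    # s_1 = d log p / d x1 = -x1 + resid * cos(x1) / sigma^2
    # d s_1 / d x1 depends on the sample, so its variance is nonzero.
    d1 = -1.0 + (-np.cos(x1) ** 2 - resid * np.sin(x1)) / sigma**2
    # s_2 = d log p / d x2 = -resid / sigma^2
    # d s_2 / d x2 is the constant -1 / sigma^2 for every sample.
    d2 = np.full_like(x2, -1.0 / sigma**2)
    return d1, d2


def toy_variances(n=5000, sigma=1.0, seed=0):
    """Sample the toy model and return the variance of each diagonal entry."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = np.sin(x1) + sigma * rng.normal(size=n)
    d1, d2 = score_jacobian_diagonal(x1, x2, sigma)
    return d1.var(), d2.var()


def zero_variance_variables(variances, tol=1e-8):
    """Names of variables whose variance is numerically zero (tol is an assumption)."""
    return sorted(name for name, v in variances.items() if v < tol)


var1, var2 = toy_variances()
print(zero_variance_variables({"x1": var1, "x2": var2}))
```

In this toy model only x2 has a constant Jacobian entry, so only x2 is flagged. With real data the score would have to be estimated rather than written down, which this sketch does not attempt.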

The researchers reconstruct the representation in a layer-by-layer structure, starting by removing the variables in the bottom layer that have a variance of zero. Then they work backward, layer-by-layer, removing the variables with zero variance to determine which variables, or groups of genes, are connected.
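The layer-by-layer peeling can be sketched on a small, made-up graph. This is not the researchers' algorithm: instead of estimating variances from data, the sketch assumes an oracle for the zero-variance test and simulates it from a known toy graph, treating a variable as "zero variance" when it affects none of the variables still remaining. The function name `peel_layers` and the example graph are hypothetical.

```python
def peel_layers(children):
    """Group variables into layers, bottom layer first.

    children maps each variable to the set of variables it directly affects.
    The check below is a stand-in (an assumption) for the zero-variance test:
    a variable is removed once it affects no variable that is still remaining.
    """
    remaining = set(children)
    layers = []
    while remaining:
        layer = sorted(v for v in remaining if not (children[v] & remaining))
        if not layer:
            raise ValueError("no removable variable found; the graph has a cycle")
        layers.append(layer)
        remaining -= set(layer)
    return layers


# Hypothetical toy graph: A affects B and C, which both affect D.
toy_graph = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}
print(peel_layers(toy_graph))  # [['D'], ['B', 'C'], ['A']]
```

Each pass removes the variables that currently sit at the bottom, which exposes the next layer up. The order in which variables are removed is what reveals the layered structure.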

“Identifying the variances that are zero quickly becomes a combinatorial objective that is pretty hard to solve, so deriving an efficient algorithm that could solve it was a major challenge,” Zhang says.

In the end, their method outputs an abstracted representation of the observed data with layers of interconnected variables that accurately summarizes the underlying cause-and-effect structure.

Each variable represents an aggregated group of genes that function together, and the relationship between two variables represents how one group of genes regulates another. Their method effectively captures all the information used in determining each layer of variables.

After proving that their technique was theoretically sound, the researchers conducted simulations to show that the algorithm can efficiently disentangle meaningful causal representations using only observational data.

In the future, the researchers want to apply this technique in real-world genetics applications. They also want to explore how their method could provide additional insights in situations where some interventional data are available, or help scientists understand how to design effective genetic interventions. Eventually, this method could help researchers more efficiently determine which genes function together in the same program, which could help identify drugs that target those genes to treat certain diseases.
