Scientists often view statistical models and algorithms as “black boxes.” These tools can deliver accurate predictions, but their internal workings remain largely hidden. In a world increasingly driven by deep learning and vast data processing, Natália Ružičková, a physicist and PhD student at the Institute of Science and Technology Austria (ISTA), has chosen to reevaluate this approach, especially regarding genomic data analysis.
Along with recent ISTA graduate Michal Hledík and Professor Gašper Tkačik, Ružičková has proposed a new model designed to analyze polygenic diseases. These conditions arise from multiple regions in the genome, and understanding how these regions contribute to disease is crucial. Their research combines advanced genome analysis with essential biological insights, and their findings have been published in Proceedings of the National Academy of Sciences (PNAS).
The Human Genome Project
The Human Genome Project began in 1990 with the aim of decoding human DNA, the genetic blueprint of our species. By 2003, the project was completed, leading to significant advancements in science and medicine. Decoding the human genome has allowed researchers to investigate diseases linked to specific genetic mutations. With around 20,000 genes and roughly three billion base pairs in the human genome, robust statistical methods are vital. This necessity gave rise to genome-wide association studies (GWAS).
GWAS help identify genetic variants associated with traits, including susceptibility to diseases. The process involves dividing participants into two groups: healthy individuals and those with a particular illness. Researchers then analyze DNA to find variations that are more common in those with the disease.
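The case-control comparison described above can be sketched for a single variant as a chi-square test on allele counts. This is a minimal illustration, not the pipeline of any particular study: the function name `allelic_chi_square` and all counts are hypothetical, and a real GWAS repeats a test like this across a very large number of variants and corrects for multiple testing.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are the variant (alt) allele and the
    reference allele. Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to disease status.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts for one variant (illustrative numbers only).
chi2, p = allelic_chi_square(case_alt=60, case_ref=140,
                             control_alt=30, control_ref=170)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

A small p-value here only says that the variant is more common in one group than chance alone would suggest. It does not say how the variant contributes to the disease.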
Complexity of Genetic Interactions
Initially, scientists expected GWAS to reveal a few mutations in known genes that would explain the differences between healthy and sick individuals. The reality is more complex. Each mutation typically has only a minimal effect on disease risk on its own. Taken together, however, many such mutations help explain why certain individuals develop a disease. Conditions that arise in this way are known as polygenic diseases. Type 2 diabetes, for instance, is polygenic, involving numerous mutations rather than a single gene. Some of these mutations affect insulin production, insulin action, or glucose metabolism, while many others lie in previously unexplored genomic regions.
The Omnigenic Model
In 2017, Evan A. Boyle and his team from Stanford University introduced the “omnigenic model,” suggesting that regulatory networks in cells link genes with different functions. “Since genes are interconnected, a mutation in one can affect others through these networks,” Ružičková explains. Many genes can therefore contribute to a disease through these networks. Until now, however, the omnigenic model has remained largely conceptual and difficult to test.
Ružičková and her colleagues have developed a mathematical framework based on this model, called the “quantitative omnigenic model” (QOM).
Integrating Statistics and Biology
To validate their model, the researchers applied it to a well-studied biological system: the common yeast, Saccharomyces cerevisiae. This single-celled organism has a cell structure similar to that of more complex organisms, including humans. Ružičková notes, “In yeast, we understand how regulatory networks interconnect genes quite well.”
Using QOM, the scientists predicted gene expression levels—indicating how much genetic information is utilized—and examined how mutations spread through yeast’s regulatory network. The predictions were highly successful; the model identified relevant genes and specified which mutations contributed to specific outcomes.
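To give a feel for what it means for a mutation to spread through a regulatory network, here is a toy sketch. It is not the QOM from the paper, whose equations are not given in this article. It assumes, purely for illustration, that each regulator shifts its target's expression linearly, and the gene names, edge weights, and the function `propagate_effects` are all hypothetical.

```python
def propagate_effects(direct_effects, edges, n_steps=10):
    """Combine a mutation's direct effects with effects relayed along a network.

    direct_effects: {gene: expression shift caused directly by the mutation}
    edges: {(regulator, target): weight}, an assumed linear influence
    n_steps: number of propagation rounds; for an acyclic network this only
             needs to be at least the length of the longest regulatory chain.
    """
    genes = set(direct_effects)
    for regulator, target in edges:
        genes.update((regulator, target))

    total = {g: direct_effects.get(g, 0.0) for g in genes}
    for _ in range(n_steps):
        # Each round: start from the direct effects, then add what every
        # regulator passes on to its targets based on the previous round.
        updated = {g: direct_effects.get(g, 0.0) for g in genes}
        for (regulator, target), weight in edges.items():
            updated[target] += weight * total[regulator]
        total = updated
    return total


# Hypothetical three-gene chain: GENE_A activates GENE_B, which represses GENE_C.
edges = {("GENE_A", "GENE_B"): 0.5, ("GENE_B", "GENE_C"): -0.4}
# The mutation directly changes only GENE_A's expression.
shifts = propagate_effects({"GENE_A": 1.0}, edges)
for gene in sorted(shifts):
    print(gene, round(shifts[gene], 3))
```

In this toy chain, GENE_C ends up shifted by about -0.2 even though the mutation never touches it directly, which is the kind of indirect contribution the omnigenic picture describes.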
Implications for Understanding Polygenic Diseases
The scientists aimed not to surpass GWAS in predictive performance but to create an interpretable model. While standard GWAS functions as a “black box,” providing statistical links between mutations and diseases, QOM offers a causal mechanism showing how mutations may lead to diseases.
Understanding these biological contexts and causal pathways could significantly aid the search for new treatments. Although the model is still in its early stages, it holds promise for enhancing our understanding of polygenic diseases. Ružičková adds, “With sufficient knowledge of regulatory networks, similar models could be developed for other organisms. Our study in yeast is just the first step and proof of concept. This understanding can pave the way for applications in human genetics.”