KAUST researchers developed a statistical approach that reduces false positives when identifying cancer-related protein mutations. The method applies Bayesian statistics to protein domain data from tumor samples and accounts for the errors that limited data can introduce. Tested on prostate cancer data, it recovered a known cancer-linked mutation in the DNA-binding protein domain cd00083. Why it matters: Fewer false positives make molecular-level cancer findings more reliable, potentially accelerating the discovery of new therapeutic targets.
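As a rough illustration of why a Bayesian treatment helps with limited data, the sketch below compares two hypothetical protein domains that show the same observed mutation fraction but with very different sample sizes. It is not the KAUST method: the uniform Beta(1, 1) prior, the background rate of 0.3, the counts, and the function name `posterior_prob_above` are all assumptions made for this example.

```python
from math import comb


def posterior_prob_above(mutated, total, background):
    """Return P(rate > background | data) under a uniform Beta(1, 1) prior.

    The posterior is Beta(mutated + 1, total - mutated + 1). For integer
    parameters, its upper tail equals a binomial cumulative sum, so no
    numerical integration is needed.
    """
    a = mutated + 1   # first shape parameter of the posterior
    n = total + 1     # a + b - 1, where b = total - mutated + 1
    return sum(
        comb(n, j) * background**j * (1 - background) ** (n - j)
        for j in range(a)
    )


# Hypothetical counts: both domains are mutated in 50% of samples.
small_sample = posterior_prob_above(mutated=1, total=2, background=0.3)
large_sample = posterior_prob_above(mutated=50, total=100, background=0.3)

print(f"2 samples:   P(rate > 0.3) = {small_sample:.3f}")
print(f"100 samples: P(rate > 0.3) = {large_sample:.5f}")
```

With only two samples the posterior stays uncertain, so a cutoff such as 0.95 would not flag the domain, while the same fraction observed in 100 samples would pass. This is the general mechanism by which a posterior probability guards against false positives from small samples.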
Petar Stojanov from the Broad Institute of MIT and Harvard will give a talk on cancer data analysis, covering the fundamentals of cancer, the nature of the large-scale data collected, and the main analysis objectives. The talk will also address open questions in cancer data analysis and how machine learning and generative modeling can help. Stojanov's research focuses on applying machine learning to genomic analysis of cancer mutation and single-cell RNA sequencing data. Why it matters: Applying AI and machine learning to cancer research can lead to a better understanding of the disease and the development of new therapies.
KAUST researchers developed CovMT, a COVID-19 mutation tracking system for authorities and scientists to detect variants. CovMT tracks mutation fingerprints using daily data from the GISAID database of over 1.5 million viral genomes. The system identifies mutation hot spots, enabling public health authorities to stay ahead of new variants. Why it matters: This system provides a tool for rapid variant detection and informed public health decision-making in the region and globally.
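The idea of locating mutation hot spots can be sketched in a few lines. This is only a toy illustration, not CovMT's implementation: it assumes the genomes are already aligned to a reference of equal length, and the sequences, the 50% threshold, and the function names `mutations` and `hot_spots` are made up for the example.

```python
from collections import Counter


def mutations(reference, genome):
    """Return the set of (position, ref_base, alt_base) differences.

    Positions are 1-based. Assumes both sequences are pre-aligned and
    have equal length.
    """
    return {
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, genome))
        if ref != alt
    }


def hot_spots(reference, genomes, min_fraction=0.5):
    """Return sorted positions mutated in at least min_fraction of genomes."""
    counts = Counter(
        pos for genome in genomes for (pos, _, _) in mutations(reference, genome)
    )
    return sorted(
        pos for pos, count in counts.items() if count / len(genomes) >= min_fraction
    )


# Toy, invented sequences (not real viral data).
REFERENCE = "ACGTACGT"
GENOMES = ["ACGTACGT", "ACCTACGT", "ACCTACGA", "ACCTACGT"]

SPOTS = hot_spots(REFERENCE, GENOMES)
print(SPOTS)
```

The set returned by `mutations` for a single genome plays the role of its mutation fingerprint, and counting fingerprints across genomes shows which positions change most often.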
Researchers at the Rosalind Franklin Institute are using generative AI, including GANs, to augment limited biological datasets, specifically mirtron data from mirtronDB. The synthetic data mimics real-world samples, which allows machine learning models to be trained more comprehensively and leads to improved mirtron identification tools. They also plan to apply large language models (LLMs) to predict unknown patterns in sequence and structural biology problems. Why it matters: This research explores AI techniques to tackle data scarcity in biological research, potentially accelerating discoveries in noncoding RNA and transposable elements.
KAUST and King Faisal Specialist Hospital and Research Centre (KFSHRC) are collaborating to develop an RNA sequencing tool to improve the diagnosis rate of genetic diseases. The tool analyzes RNA data to find aberrant transcripts and mutations, building on KFSHRC's clinical data and KAUST's computational expertise. The team has already solved cases that DNA sequencing alone could not, including a case of a young child with brain damage caused by a recessive gene mutation. Why it matters: This collaboration can improve disease management and preventative services in the region, directly contributing to Saudi Arabia’s national research priority of health and wellness.
Professor Eran Segal presented The Human Phenotype Project, a longitudinal cohort study with over 10,000 participants. The project aims to identify molecular markers and develop prediction models for disease using deep profiling techniques including medical history, lifestyle, blood tests, and microbiome analysis. The study provides insights into drivers of obesity, diabetes, and heart disease, identifying novel markers at the microbiome, metabolite, and immune system level. Why it matters: Such large-scale phenotyping initiatives could inform personalized medicine approaches relevant to the Middle East's specific health challenges.
A KAUST alumnus presented research on using large language models for complex disease modeling and drug discovery. LLMs were trained on insurance claims from 123 million people in the US to model diseases and predict genetic parameters. Protein language models were developed to discover remote homologs and functional biomolecules, while RNA language models were used for RNA structure prediction and reverse design. Why it matters: This work, presented by a KAUST alumnus, highlights the potential of LLMs to accelerate computational biology research and drug development.
Natasa Przulj at the Barcelona Supercomputing Center is developing an AI framework that fuses multi-omic data to improve precision medicine. The framework uses graph-regularized non-negative matrix tri-factorization (NMTF) and network science algorithms for patient stratification, biomarker prediction, and drug repurposing. It is applied to diseases such as cancer, COVID-19, and Parkinson's disease. Why it matters: This research can enable more personalized and effective treatments by leveraging complex biological data to understand disease mechanisms and tailor therapies.
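For readers unfamiliar with graph-regularized NMTF, one common textbook form of the objective is shown below. It is given only for orientation and is not necessarily the exact formulation used in this framework. The symbols are assumptions of this sketch: $X$ is a non-negative relation matrix between two entity types (for example patients and genes), $G_1$ and $G_2$ are non-negative cluster-indicator factors, $S$ is a compressed interaction matrix, $L_i$ is the Laplacian of a network over entity type $i$, and $\gamma$ weights the regularization.

```latex
\min_{G_1 \ge 0,\; G_2 \ge 0,\; S}
  \left\| X - G_1 S G_2^{\top} \right\|_F^2
  + \gamma \sum_{i=1}^{2} \operatorname{tr}\!\left( G_i^{\top} L_i G_i \right),
\qquad L_i = D_i - A_i
```

Here $A_i$ is the adjacency matrix of the network and $D_i$ its diagonal degree matrix. The trace term penalizes assigning connected nodes to different clusters, which is how prior network knowledge steers the factorization.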