KAUST researchers developed a statistical approach that improves the identification of cancer-related protein mutations by reducing false positives. The method uses Bayesian statistics to analyze protein domain data from tumor samples, accounting for potential errors due to limited data. The team tested their method on prostate cancer data, successfully identifying a known cancer-linked mutation in the DNA-binding protein cd00083. Why it matters: Fewer false positives make molecular-level cancer research more reliable, potentially accelerating the discovery of new therapeutic targets.
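To make the "limited data" point concrete, here is a minimal sketch of Bayesian shrinkage using a Beta-Binomial model. It is an illustration only, not the KAUST team's method: the function name `posterior_mean`, the Beta(1, 9) prior, and the domain counts are all hypothetical.

```python
def posterior_mean(mutated, total, prior_a=1.0, prior_b=9.0):
    """Posterior mean mutation rate under a Beta(prior_a, prior_b) prior.

    The prior parameters are illustrative assumptions, not values from the study.
    """
    if total < 0 or not 0 <= mutated <= total:
        raise ValueError("need 0 <= mutated <= total")
    # Beta prior + binomial likelihood gives a Beta(a + k, b + n - k) posterior,
    # whose mean is (a + k) / (a + b + n).
    return (prior_a + mutated) / (prior_a + prior_b + total)


# Hypothetical counts: (mutated samples, total samples) per protein domain.
domain_counts = {"domain_A": (2, 2), "domain_B": (40, 100)}

for name, (mutated, total) in domain_counts.items():
    raw_rate = mutated / total
    shrunk_rate = posterior_mean(mutated, total)
    print(f"{name}: raw={raw_rate:.3f} posterior_mean={shrunk_rate:.3f}")
```

With these made-up numbers, the domain seen in only two samples has a raw rate of 1.0 but a posterior mean of 0.25, while the domain with 100 samples keeps a posterior mean of about 0.373. The sparse observation no longer outranks the well-supported one, which is the kind of false positive a Bayesian treatment is meant to suppress.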
Petar Stojanov from the Broad Institute of MIT and Harvard will give a talk on cancer data analysis, covering the fundamentals of cancer, the nature of the large-scale data collected, and the main analysis objectives. The talk will also address open questions in cancer data analysis and how machine learning and generative modeling can help. Stojanov's research focuses on applying machine learning to genomic analysis of cancer mutation and single-cell RNA sequencing data. Why it matters: Applying AI and machine learning to cancer research can lead to a better understanding of the disease and the development of new therapies.
A KAUST alumnus presented research on using large language models (LLMs) for complex disease modeling and drug discovery. LLMs were trained on the insurance claims of 123 million people in the US to model diseases and predict genetic parameters. Protein language models were developed to discover remote homologs and functional biomolecules, while RNA language models were used for RNA structure prediction and reverse design. Why it matters: This work by a KAUST alumnus highlights the potential of LLMs to accelerate computational biology research and drug development.
Researchers at the Rosalind Franklin Institute are using generative AI, including generative adversarial networks (GANs), to augment limited biological datasets, specifically mirtron data from mirtronDB. The synthetic data mimic real-world samples, allowing machine learning models to be trained more comprehensively and improving mirtron identification tools. They also plan to apply large language models (LLMs) to predict unknown patterns in sequence and structural biology problems. Why it matters: This research explores AI techniques to tackle data scarcity in biological research, potentially accelerating discoveries in noncoding RNA and transposable elements.
Daisuke Kihara from Purdue University presented a seminar at MBZUAI on using deep learning for biomolecular structure modeling. His lab is developing 3D structure modeling methods, especially for cryo-electron microscopy (cryo-EM) data. They are also working on RNA structure prediction and peptide docking using deep neural networks inspired by AlphaFold2. Why it matters: Applying advanced deep learning techniques to biomolecular structure prediction can accelerate drug discovery and our understanding of molecular functions.
KAUST researchers developed CovMT, a COVID-19 mutation tracking system for authorities and scientists to detect variants. CovMT tracks mutation fingerprints using daily data from the GISAID database of over 1.5 million viral genomes. The system identifies mutation hot spots, enabling public health authorities to stay ahead of new variants. Why it matters: This system provides a tool for rapid variant detection and informed public health decision-making in the region and globally.
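The idea of a mutation hot spot can be sketched as a per-position count of differences from a reference genome. This is a toy illustration, not CovMT's algorithm or the GISAID data format: the function `mutation_hotspots`, the `min_fraction` threshold, and the short pre-aligned sequences are hypothetical.

```python
from collections import Counter


def mutation_hotspots(reference, genomes, min_fraction=0.5):
    """Return positions where at least min_fraction of genomes differ from the reference.

    Assumes every genome is already aligned to the reference and has the same length.
    """
    mismatch_counts = Counter()
    for genome in genomes:
        if len(genome) != len(reference):
            raise ValueError("genomes must be aligned to the reference length")
        for position, (ref_base, base) in enumerate(zip(reference, genome)):
            if base != ref_base:
                mismatch_counts[position] += 1
    total = len(genomes)
    return sorted(
        position
        for position, count in mismatch_counts.items()
        if count / total >= min_fraction
    )


# Hypothetical aligned fragments; real viral genomes are roughly 30,000 bases long.
reference = "ACGTAC"
genomes = ["ACGTAC", "ATGTAC", "ATGTCC", "ATGTAC"]
hotspots = mutation_hotspots(reference, genomes)
print(hotspots)
```

Here position 1 differs from the reference in three of four genomes and is reported, while position 4 differs in only one and is not.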
Natasa Przulj at the Barcelona Supercomputing Center is developing an AI framework that fuses multi-omic data to improve precision medicine. The framework uses graph-regularized non-negative matrix tri-factorization (NMTF) and network science algorithms for patient stratification, biomarker prediction, and drug repurposing. It is applied to diseases such as cancer, COVID-19, and Parkinson's disease. Why it matters: This research can enable more personalized and effective treatments by leveraging complex biological data to understand disease mechanisms and tailor therapies.
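For readers unfamiliar with NMTF, the sketch below factorizes a non-negative matrix X into three non-negative factors, X ≈ F S Gᵀ, using standard multiplicative updates. It is a plain, unregularized version written under stated assumptions: the graph-regularization term and the multi-omic data fusion described above are omitted, and the helper names, ranks, and toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def multiplicative_step(matrix, numerator, denominator, eps=1e-9):
    """Element-wise update m * n / d; keeps entries non-negative."""
    return [
        [m * n / (d + eps) for m, n, d in zip(row_m, row_n, row_d)]
        for row_m, row_n, row_d in zip(matrix, numerator, denominator)
    ]


def frobenius_error(x, f, s, g):
    """Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum(
        (a - b) ** 2 for row_a, row_b in zip(x, approx) for a, b in zip(row_a, row_b)
    ) ** 0.5


def nmtf(x, k1, k2, iterations=200, seed=0):
    """Unregularized NMTF: returns F (n x k1), S (k1 x k2), G (m x k2)."""
    rng = random.Random(seed)

    def init(rows, cols):
        return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]

    f, s, g = init(len(x), k1), init(k1, k2), init(len(x[0]), k2)
    xt = transpose(x)
    for _ in range(iterations):
        sgt = matmul(s, transpose(g))  # S G^T
        f = multiplicative_step(
            f, matmul(x, transpose(sgt)), matmul(f, matmul(sgt, transpose(sgt)))
        )
        fs = matmul(f, s)  # F S
        g = multiplicative_step(
            g, matmul(xt, fs), matmul(g, matmul(transpose(fs), fs))
        )
        ft = transpose(f)
        s = multiplicative_step(
            s,
            matmul(matmul(ft, x), g),
            matmul(matmul(matmul(ft, f), s), matmul(transpose(g), g)),
        )
    return f, s, g


# Hypothetical 4 x 4 matrix with two blocks, e.g. patients by genes.
x = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
initial_error = frobenius_error(x, *nmtf(x, 2, 2, iterations=0))
final_error = frobenius_error(x, *nmtf(x, 2, 2, iterations=200))
print(f"error before: {initial_error:.3f}, after: {final_error:.3f}")
```

In a stratification setting, the rows of F would be read as soft cluster memberships for patients and the rows of G as memberships for features, with S linking the two sets of clusters. That interpretation is the general idea behind NMTF rather than a description of this particular framework.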
A DeepMind researcher presented work on incorporating symmetries into machine learning models, with applications to lattice QCD and molecular dynamics. The work includes permutation- and translation-invariant normalizing flows for free-energy estimation in molecular dynamics. The researcher also presented U(N) and SU(N) gauge-equivariant normalizing flows for pure gauge simulations and their extensions to incorporate fermions in lattice QCD. Why it matters: Applying symmetry principles to generative models could improve AI's ability to model complex physical systems relevant to materials science and other fields in the region.
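What "building a symmetry into a model" means can be shown with a very small permutation-equivariant layer in the style of Deep Sets. This is not the normalizing-flow architecture from the talk; the function `equivariant_layer`, its weights, and the particle values are hypothetical and serve only to demonstrate the property.

```python
import math


def equivariant_layer(values, self_weight=2.0, mean_weight=-1.0):
    """Map each element to self_weight * x_i + mean_weight * mean(x).

    The mean does not depend on element order, so reordering the inputs
    reorders the outputs in exactly the same way.
    """
    mean = sum(values) / len(values)
    return [self_weight * v + mean_weight * mean for v in values]


def permute(values, order):
    return [values[i] for i in order]


# Hypothetical one-dimensional particle coordinates and an arbitrary reordering.
particles = [0.5, 1.5, -2.0, 3.0]
order = [2, 0, 3, 1]

permute_then_apply = equivariant_layer(permute(particles, order))
apply_then_permute = permute(equivariant_layer(particles), order)
is_equivariant = all(
    math.isclose(a, b) for a, b in zip(permute_then_apply, apply_then_permute)
)
print(is_equivariant)
```

Because the two orders of operation agree, a model built from such layers treats identical particles interchangeably by construction instead of having to learn that from data.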