Self-supervised DNA models and scalable sequence processing with memory augmented transformers

MBZUAI · Notable

Research · Healthcare · LLM · Infrastructure · Partnership

Summary

Dr. Mikhail Burtsev of the London Institute presented research on GENA-LM, a suite of transformer-based DNA language models. The talk addressed the challenge of scaling transformers to genomic sequences and proposed recurrent memory augmentation as a way to handle long inputs efficiently. The approach is reported to improve language modeling performance and holds promise for memory-intensive applications in bioinformatics. Why it matters: Processing much longer DNA sequences could extend what AI can do in genomics, with potential benefits for understanding and treating diseases.
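The idea of recurrent memory can be illustrated with a minimal Python sketch. It shows only the control flow: a long sequence is cut into fixed-size segments, and a small memory state is passed from one segment to the next so that later segments can use information from earlier ones. Everything named here is an assumption made for illustration and is not taken from the talk: `split_into_segments`, `process_with_memory`, and `toy_step` are hypothetical helpers, and `toy_step` is a simple base counter standing in for what would, in a real model, be a transformer forward pass over memory plus segment.

```python
from typing import Callable, List, Tuple

# Hypothetical memory format for this sketch: running counts of A, C, G, T.
# A real memory-augmented transformer would carry learned vectors instead.
Memory = Tuple[int, int, int, int]


def split_into_segments(sequence: str, segment_len: int) -> List[str]:
    """Cut a long sequence into consecutive chunks of at most segment_len characters."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [sequence[i:i + segment_len] for i in range(0, len(sequence), segment_len)]


def process_with_memory(
    sequence: str,
    segment_len: int,
    step: Callable[[Memory, str], Tuple[float, Memory]],
    initial_memory: Memory,
) -> Tuple[List[float], Memory]:
    """Process segments in order, feeding each step the memory written by the previous one."""
    memory = initial_memory
    outputs = []
    for segment in split_into_segments(sequence, segment_len):
        output, memory = step(memory, segment)  # read old memory, write new memory
        outputs.append(output)
    return outputs, memory


def toy_step(memory: Memory, segment: str) -> Tuple[float, Memory]:
    """Stand-in for a model call: update base counts and report the GC fraction so far."""
    counts = dict(zip("ACGT", memory))
    for base in segment:
        if base in counts:
            counts[base] += 1
    new_memory = (counts["A"], counts["C"], counts["G"], counts["T"])
    total = sum(new_memory)
    gc_fraction = (counts["G"] + counts["C"]) / total if total else 0.0
    return gc_fraction, new_memory


if __name__ == "__main__":
    outputs, final_memory = process_with_memory("ACGTACGTGG", 4, toy_step, (0, 0, 0, 0))
    print(outputs)
    print(final_memory)
```

The point of the design is that each step sees only one short segment plus a fixed-size memory, so the cost per step does not grow with the total sequence length, while the output for the last segment still reflects the whole sequence.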

Keywords

GENA-LM · DNA · transformers · genomics · memory augmentation

Related

Self-Supervised Learning AI and AI for Molecular Biology

MBZUAI

Xiao Wang from Purdue University presented research on Adversarial Contrastive Learning (AdCo) and Cooperative-adversarial Contrastive Learning (CaCo) for improved self-supervised learning. He also discussed CryoREAD, a framework for building DNA/RNA structures from cryo-EM maps, and future work in deep learning for drug discovery. Wang's algorithms have impacted molecular biology, leading to new structure discoveries published in journals like Cell and Nature Microbiology. Why it matters: The research advances AI techniques for crucial tasks in molecular biology and drug discovery, with potential applications for institutions in the GCC region focused on healthcare and biotechnology.

Complex disease modeling and efficient drug discovery with large language models

MBZUAI

A KAUST alumnus presented research on using large language models for complex disease modeling and drug discovery. LLMs were trained on insurance claims from 123 million people in the US to model diseases and predict genetic parameters. Protein language models were developed to discover remote homologs and functional biomolecules, while RNA language models were used for RNA structure prediction and reverse design. Why it matters: This work highlights the potential of LLMs to accelerate computational biology research and drug development, and it comes from a researcher with KAUST ties.

Generative Artificial Intelligence in RNA Biology

MBZUAI

Researchers at the Rosalind Franklin Institute are using generative AI, including GANs, to augment limited biological datasets, specifically mirtron data from mirtronDB. The synthetic data mimics real-world samples, which allows machine learning models to be trained more comprehensively and improves mirtron identification tools. They also plan to apply large language models (LLMs) to predict unknown patterns in sequence and structural biology problems. Why it matters: This research explores AI techniques to tackle data scarcity in biological research, potentially accelerating discoveries in noncoding RNA and transposable elements.

Upsampling Autoencoder for Self-Supervised Point Cloud Learning

arXiv · Mar 21

This paper introduces a self-supervised learning method for point cloud analysis using an upsampling autoencoder (UAE). The model uses subsampling and an encoder-decoder architecture to reconstruct the original point cloud, learning both semantic and geometric information. Experiments show the UAE outperforms existing methods in shape classification, part segmentation, and point cloud upsampling tasks.
