GCC AI Research

Results for "data sparseness"

Many-cell sequencing: machine learning principles and methods for moving beyond single cells to population-scale analysis

MBZUAI ·

A talk discusses the challenges of single-cell data analysis, such as feature sparsity and the effects of rare cells. AI/ML strategies are uniquely positioned to model this data. ImYoo, a startup founded in 2021, is applying single-cell model architectures for unsupervised discovery of patient groupings and predicting sample-level phenotypical data in autoimmune disease. Why it matters: This highlights the growing application of AI/ML in analyzing single-cell data for population-scale human health studies, an area ripe for innovation and improvement in the Middle East's growing biotech sector.

Addressing NLP problems in low resource settings

MBZUAI ·

Thamar Solorio from the University of Houston will discuss machine learning approaches for spontaneous human language processing. The talk will cover adapting multilingual transformers to code-switching data and using data augmentation for domain adaptation in sequence labeling tasks. Solorio will also provide an overview of other research projects at the RiTUAL lab, focusing on the scarcity of labeled data. Why it matters: This presentation addresses key challenges in Arabic NLP related to data scarcity, which is a persistent obstacle in developing effective AI applications for the region.

Overcoming the curse of dimensionality

MBZUAI ·

MBZUAI Professor Fakhri Karray and co-authors from the University of Waterloo have published "Elements of Dimensionality Reduction and Manifold Learning," a textbook on methods for extracting useful components from large datasets. The book addresses the "curse of dimensionality," where a growing number of features in a dataset complicates its use in machine learning. Karray developed the material from a popular course he taught at Waterloo. Why it matters: The textbook provides a unified resource for students and researchers in machine learning and AI, addressing a foundational challenge in processing high-dimensional data that is relevant to diverse applications in the region.
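
One common way to picture the curse of dimensionality is to count how many grid cells are needed to cover a feature space at a fixed resolution. The sketch below is an illustration only and is not taken from the textbook; the function names `grid_cells` and `samples_per_cell` are hypothetical, and the sample size and bin count are arbitrary choices.

```python
def grid_cells(bins_per_axis: int, dims: int) -> int:
    """Number of cells when each of `dims` axes is split into `bins_per_axis` bins."""
    return bins_per_axis ** dims


def samples_per_cell(n_samples: int, bins_per_axis: int, dims: int) -> float:
    """Average number of samples falling in each cell (uniform data assumed)."""
    return n_samples / grid_cells(bins_per_axis, dims)


if __name__ == "__main__":
    # Illustrative numbers: one million samples, ten bins per feature.
    for d in (1, 2, 5, 10):
        print(d, grid_cells(10, d), samples_per_cell(1_000_000, 10, d))
```

With the sample count held fixed, the average number of samples per cell shrinks exponentially as features are added, which is why most cells end up empty in high dimensions.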

The role of data-driven models in quantifying uncertainty

KAUST ·

KAUST Professor Raul Tempone, an expert in Uncertainty Quantification (UQ), has been appointed as an Alexander von Humboldt Professor at RWTH Aachen University in Germany. This professorship will enable him to further his research on mathematics for uncertainty quantification with new collaborators. Tempone believes the KAUST Strategic Research Initiative for Uncertainty Quantification (SRI-UQ) contributed to this award. Why it matters: This appointment enhances KAUST's visibility and facilitates cross-fertilization between European and KAUST research groups, benefiting both institutions and attracting talent.

Nonlinear Traffic Prediction as a Matrix Completion Problem with Ensemble Learning

arXiv ·

The paper introduces a novel method for short-term, high-resolution traffic prediction, modeling it as a matrix completion problem solved via block-coordinate descent. An ensemble learning approach is used to capture periodic patterns and reduce training error. The method is validated using both simulated and real-world traffic data from Abu Dhabi, demonstrating superior performance compared to other algorithms.
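
To make the idea of matrix completion by block-coordinate descent concrete, here is a minimal sketch. It is not the paper's method: it assumes a plain rank-1 linear model with no ensemble or nonlinear component, and the function name `complete_rank1`, the use of `None` for missing readings, and the toy data are all hypothetical. Rows could be read as sensors and columns as time slots.

```python
def complete_rank1(M, iters=200):
    """Fill missing entries (None) of M with a rank-1 fit u[i] * v[j].

    Block-coordinate descent: update all of u with v fixed, then all of v
    with u fixed. Each update is the closed-form least-squares solution
    restricted to the observed entries.
    """
    n_rows, n_cols = len(M), len(M[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # Block 1: update row factors with column factors fixed.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if M[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(M[i][j] * v[j] for j in obs) / den
        # Block 2: update column factors with row factors fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if M[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(M[i][j] * u[i] for i in obs) / den
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


if __name__ == "__main__":
    # Toy rank-1 matrix (outer product of [1, 2, 3] and [1, 2, 4])
    # with two entries hidden.
    observed = [
        [1.0, 2.0, None],
        [2.0, 4.0, 8.0],
        [None, 6.0, 12.0],
    ]
    filled = complete_rank1(observed)
    print(filled[0][2], filled[2][0])
```

On this noise-free toy input the two hidden entries should come out close to 4 and 3. Real traffic data would need a higher rank, regularization, and the nonlinear and ensemble components the paper describes.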

Short-Term Traffic Forecasting Using High-Resolution Traffic Data

arXiv ·

Researchers developed a data-driven toolkit for short-term traffic forecasting using high-resolution traffic data from urban road sensors. The method models forecasting as a matrix completion problem, mapping inputs to a higher-dimensional space using kernels and adaptive boosting. Validated using real-world data from Abu Dhabi, UAE, the method outperforms state-of-the-art algorithms.

Making sense of silence in gene regulatory networks

MBZUAI ·

MBZUAI researchers collaborated with Carnegie Mellon University and the Broad Institute of MIT and Harvard to develop a new statistical method for analyzing data used for gene regulatory network inference. The method addresses the challenge of distinguishing true zero expression values from dropouts in single-cell RNA sequencing data. This research will be presented at the Twelfth International Conference on Learning Representations (ICLR 2024). Why it matters: Improving gene regulatory network inference can lead to better understanding of disease mechanisms and inform the development of new medicines.

On Transferability of Machine Learning Models

MBZUAI ·

This article discusses domain shift in machine learning, where testing data differs from training data, and methods to mitigate it via domain adaptation and generalization. Domain adaptation uses labeled source data and unlabeled target data. Domain generalization uses labeled data from single or multiple source domains to generalize to unseen target domains. Why it matters: Research in mitigating domain shift enhances the robustness and applicability of AI models in diverse real-world scenarios.