Results for "data scarcity"

Addressing NLP problems in low resource settings

MBZUAI

Thamar Solorio from the University of Houston will discuss machine learning approaches for spontaneous human language processing. The talk will cover adapting multilingual transformers to code-switching data and using data augmentation for domain adaptation in sequence labeling tasks. Solorio will also provide an overview of other research projects at the RiTUAL lab, focusing on the scarcity of labeled data. Why it matters: This presentation addresses key challenges in Arabic NLP related to data scarcity, which is a persistent obstacle in developing effective AI applications for the region.

The role of data-driven models in quantifying uncertainty

KAUST · Jul 15

KAUST Professor Raul Tempone, an expert in Uncertainty Quantification (UQ), has been appointed as an Alexander von Humboldt Professor at RWTH Aachen University in Germany. This professorship will enable him to further his research on mathematics for uncertainty quantification with new collaborators. Tempone believes the KAUST Strategic Initiative for Uncertainty Quantification (SRI-UQ) contributed to this award. Why it matters: This appointment enhances KAUST's visibility and facilitates cross-fertilization between European and KAUST research groups, benefiting both institutions and attracting talent.

Many-cell sequencing: machine learning principles and methods for moving beyond single cells to population-scale analysis

MBZUAI

A talk discusses the challenges of single-cell data analysis, such as feature sparsity and the effects of rare cells. AI/ML strategies are uniquely positioned to model this data. ImYoo, a startup founded in 2021, is applying single-cell model architectures for unsupervised discovery of patient groupings and predicting sample-level phenotypical data in autoimmune disease. Why it matters: This highlights the growing application of AI/ML in analyzing single-cell data for population-scale human health studies, an area ripe for innovation and improvement in the Middle East's growing biotech sector.

Documenting the 'dodos' of tomorrow

KAUST · Nov 18

Dr. Gustav Paulay from the Florida Museum of Natural History spoke at KAUST in 2018 about the surprisingly low level of knowledge about marine biodiversity. He noted that only a fraction of the millions of marine species are currently known and described. Paulay highlighted the effectiveness of large-scale biodiversity surveys and the use of technology like mass sampling and DNA analysis to speed up species identification. Why it matters: Understanding and documenting marine biodiversity is crucial for conservation efforts and for leveraging the potential of marine resources in the Red Sea region and beyond.

Synthetic data can accurately track environmental disasters

KAUST · Dec 22

KAUST and SARsatX have developed a method using Generative Adversarial Networks (GANs) to generate synthetic SAR imagery for training deep learning models to detect oil spills. Starting with just 17 real SAR images, they generated over 2,000 synthetic images to train a Multi-Attention Network (MANet) model. The MANet model, trained exclusively on synthetic data, achieved 75% accuracy in identifying oil spill areas, matching the performance of models trained on larger real datasets. Why it matters: This advancement enables faster and more reliable environmental monitoring using AI, even when real-world data is scarce, reducing the need to wait for actual disasters to occur.

On Transferability of Machine Learning Models

MBZUAI

This article discusses domain shift in machine learning, where testing data differs from training data, and methods to mitigate it via domain adaptation and generalization. Domain adaptation uses labeled source data and unlabeled target data. Domain generalization uses labeled data from single or multiple source domains to generalize to unseen target domains. Why it matters: Research in mitigating domain shift enhances the robustness and applicability of AI models in diverse real-world scenarios.
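As a rough illustration of the domain adaptation setting described above, the sketch below rescales labeled source features so that their mean and spread match those of unlabeled target features. The function name `align_to_target` and the toy numbers are assumptions for illustration. The article does not name a specific method, and this simple moment matching only stands in for the more capable techniques used in practice.

```python
from statistics import mean, pstdev


def align_to_target(source, target):
    """Shift and rescale 1-D source features to match target mean and std.

    Only unlabeled target features are needed, which mirrors the
    domain adaptation setting (labeled source, unlabeled target).
    """
    src_mu, src_sd = mean(source), pstdev(source)
    tgt_mu, tgt_sd = mean(target), pstdev(target)
    if src_sd == 0:
        # Constant source feature: only the shift can be corrected.
        return [tgt_mu for _ in source]
    scale = tgt_sd / src_sd
    return [(x - src_mu) * scale + tgt_mu for x in source]


# Hypothetical toy data: the target domain is shifted and more spread out.
source_features = [1.0, 2.0, 3.0, 4.0, 5.0]
target_features = [10.0, 14.0, 18.0, 22.0, 26.0]
aligned = align_to_target(source_features, target_features)
```

A model trained on the aligned source features with their original labels would then see inputs on the same scale as the target domain. Domain generalization differs in that no target data, labeled or not, is available at training time.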

Key Research in Embodied AI

MBZUAI

Dr. Hao Dong from Peking University presented research on addressing the challenge of limited large-scale training data in embodied AI, particularly for manipulation, task planning, and navigation. The presentation covered simulation learning and large models. Dr. Dong is a chief scientist of China's National Key Research and Development Program and an area chair/associate editor for NeurIPS, CVPR, AAAI, and ICRA. Why it matters: Overcoming data scarcity is crucial for advancing embodied AI research and enabling more sophisticated robotic applications in the region.

Neural Bayes estimators for censored inference with peaks-over-threshold models

arXiv · Jun 27

This paper introduces neural Bayes estimators for censored peaks-over-threshold models, enhancing computational efficiency in spatial extremal dependence modeling. The method uses data augmentation to encode censoring information in the neural network input, challenging traditional likelihood-based approaches. The estimators were applied to assess extreme particulate matter concentrations over Saudi Arabia, demonstrating efficacy in high-dimensional models. Why it matters: The research offers a computationally efficient alternative for environmental modeling and risk assessment in the region.
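One way to picture encoding censoring information in the network input is sketched below: values at or below a threshold are replaced by a fixed placeholder, and a binary indicator records which entries were censored, so that both can be passed to an estimator together. The function name `encode_censored`, the placeholder value, and the toy data are assumptions for illustration and are not taken from the paper.

```python
def encode_censored(values, threshold, fill=0.0):
    """Return (masked_values, censoring_indicator) for one data replicate.

    Entries at or below `threshold` are treated as censored: their value
    is replaced by `fill`, and the indicator marks them with 1.
    """
    masked = []
    indicator = []
    for v in values:
        censored = v <= threshold
        masked.append(fill if censored else v)
        indicator.append(1 if censored else 0)
    return masked, indicator


# Hypothetical observations at four sites, with a threshold of 1.0.
observations = [0.2, 3.1, 0.9, 5.4]
masked, indicator = encode_censored(observations, threshold=1.0)
# masked    == [0.0, 3.1, 0.0, 5.4]
# indicator == [1, 0, 1, 0]
```

Passing the indicator alongside the masked values lets a network distinguish a censored entry from a genuine observation that happens to equal the placeholder.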