Thamar Solorio from the University of Houston will discuss machine learning approaches for spontaneous human language processing. The talk will cover adapting multilingual transformers to code-switching data and using data augmentation for domain adaptation in sequence labeling tasks. Solorio will also provide an overview of other research projects at the RiTUAL lab, focusing on the scarcity of labeled data. Why it matters: This presentation addresses key challenges in Arabic NLP related to data scarcity, which is a persistent obstacle in developing effective AI applications for the region.
KAUST Professor Raul Tempone, an expert in Uncertainty Quantification (UQ), has been appointed as an Alexander von Humboldt Professor at RWTH Aachen University in Germany. This professorship will enable him to further his research on mathematics for uncertainty quantification with new collaborators. Tempone believes the KAUST Strategic Research Initiative for Uncertainty Quantification (SRI-UQ) contributed to this award. Why it matters: This appointment enhances KAUST's visibility and facilitates cross-fertilization between European and KAUST research groups, benefiting both institutions and attracting talent.
A talk discussed the challenges of single-cell data analysis, such as feature sparsity and the effects of rare cells, and argued that AI/ML methods are well suited to modeling this kind of data. ImYoo, a startup founded in 2021, is applying single-cell model architectures to the unsupervised discovery of patient groupings and to the prediction of sample-level phenotypic data in autoimmune disease. Why it matters: This highlights the growing application of AI/ML in analyzing single-cell data for population-scale human health studies, an area ripe for innovation and improvement in the Middle East's growing biotech sector.
Dr. Gustav Paulay from the Florida Museum of Natural History spoke at KAUST in 2018 about the surprisingly low level of knowledge about marine biodiversity. He noted that only a fraction of the millions of marine species are currently known and described. Paulay highlighted the effectiveness of large-scale biodiversity surveys and the use of technology like mass sampling and DNA analysis to speed up species identification. Why it matters: Understanding and documenting marine biodiversity is crucial for conservation efforts and for leveraging the potential of marine resources in the Red Sea region and beyond.
KAUST and SARsatX have developed a method using Generative Adversarial Networks (GANs) to generate synthetic SAR imagery for training deep learning models to detect oil spills. Starting with just 17 real SAR images, they generated over 2,000 synthetic images to train a Multi-Attention Network (MANet) model. The MANet model, trained exclusively on synthetic data, achieved 75% accuracy in identifying oil spill areas, matching the performance of models trained on larger real datasets. Why it matters: This advancement enables faster and more reliable environmental monitoring using AI, even when real-world data is scarce, reducing the need to wait for actual disasters to occur.
This article discusses domain shift in machine learning, where testing data differs from training data, and methods to mitigate it via domain adaptation and generalization. Domain adaptation uses labeled source data and unlabeled target data. Domain generalization uses labeled data from single or multiple source domains to generalize to unseen target domains. Why it matters: Research in mitigating domain shift enhances the robustness and applicability of AI models in diverse real-world scenarios.
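The domain adaptation setting described above (labeled source data, unlabeled target data) can be illustrated with a minimal sketch. The example below is not the article's method: it swaps in simple feature mean alignment on a one-dimensional toy problem, and the data, function names, and the nearest-centroid classifier are all assumptions chosen for illustration. Target labels are used only to score the result, never for training or alignment.

```python
def avg(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)

def fit_centroids(xs, ys):
    """Nearest-centroid classifier: store one feature mean per class."""
    return {c: avg(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}

def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def accuracy(centroids, xs, ys):
    """Fraction of points whose predicted class matches the label."""
    return avg(1.0 if predict(centroids, x) == y else 0.0 for x, y in zip(xs, ys))

# Hypothetical toy data: the target domain is the source shifted by +3.
source_x, source_y = [0.0, 1.0, 4.0, 5.0], [0, 0, 1, 1]
target_x, target_y = [3.0, 4.0, 7.0, 8.0], [0, 0, 1, 1]

# Train on labeled source data only.
model = fit_centroids(source_x, source_y)

# Adaptation step: use only the unlabeled target features to remove the
# difference between the two domains' feature means.
offset = avg(target_x) - avg(source_x)
aligned_target_x = [x - offset for x in target_x]

acc_before = accuracy(model, target_x, target_y)
acc_after = accuracy(model, aligned_target_x, target_y)
print(acc_before, acc_after)
```

On this toy data the unadapted classifier mislabels the shifted class-0 points, while the mean-aligned features restore the source decision boundary. Domain generalization differs in that no target data, labeled or not, would be available for the alignment step.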
Dr. Hao Dong from Peking University presented research on addressing the challenge of limited large-scale training data in embodied AI, particularly for manipulation, task planning, and navigation. The presentation covered simulation learning and large models. Dr. Dong is a chief scientist of China's National Key Research and Development Program and an area chair/associate editor for NeurIPS, CVPR, AAAI, and ICRA. Why it matters: Overcoming data scarcity is crucial for advancing embodied AI research and enabling more sophisticated robotic applications in the region.
This paper introduces neural Bayes estimators for censored peaks-over-threshold models, enhancing computational efficiency in spatial extremal dependence modeling. The method uses data augmentation to encode censoring information in the neural network input, offering a likelihood-free alternative to traditional likelihood-based approaches. The estimators were applied to assess extreme particulate matter concentrations over Saudi Arabia, demonstrating efficacy in high-dimensional models. Why it matters: The research offers a computationally efficient alternative for environmental modeling and risk assessment in the region.
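One way to picture "encoding censoring information in the neural network input" is a two-channel input: the observed values with censored entries replaced by a constant, plus an indicator of which entries were censored. The sketch below is an assumption about the general idea, not the paper's implementation; the function name, the fill value, and the choice to treat values at or below the threshold as censored are all hypothetical.

```python
def encode_censoring(field, threshold, fill_value=0.0):
    """Build a two-channel network input from a 2-D field (list of rows).

    Channel 0: the data, with entries at or below the threshold replaced
               by fill_value (assumed convention, not from the paper).
    Channel 1: an indicator that is 1.0 where the entry was censored.
    """
    values = [
        [fill_value if z <= threshold else float(z) for z in row]
        for row in field
    ]
    mask = [
        [1.0 if z <= threshold else 0.0 for z in row]
        for row in field
    ]
    return [values, mask]

# Hypothetical 2x2 field with a censoring threshold of 1.0.
example = encode_censoring([[0.2, 3.0], [5.0, 0.1]], threshold=1.0)
```

Passing the indicator alongside the values lets a network distinguish a genuinely observed value from a placeholder, which a single channel of filled-in values could not do.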