This article summarizes a talk by Tongliang Liu of the University of Sydney on handling label noise in deep learning, covering two families of approaches: extracting confident (likely clean) examples and explicitly modeling the label-noise process. The talk aimed to give participants a working understanding of learning with noisy labels. Why it matters: As AI models are increasingly trained on large, noisy datasets, techniques for robust learning become crucial for reliable real-world performance.
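The summary does not name a specific algorithm, but a common way to extract confident examples is the "small-loss" criterion popularized by methods such as co-teaching: low-loss samples early in training are more likely to carry clean labels. A minimal sketch (the function name and the assumption of a known `noise_rate` are illustrative, not from the talk):

```python
import numpy as np

def select_confident(losses, noise_rate):
    """Keep the fraction of samples with the smallest loss.

    The small-loss criterion assumes that, early in training, a deep
    network fits clean labels before noisy ones, so low-loss samples
    are more likely to be correctly labeled.
    """
    keep = int(round(len(losses) * (1.0 - noise_rate)))
    idx = np.argsort(losses)[:keep]  # indices of the smallest losses
    return np.sort(idx)

# Toy example: samples 1 and 3 have large loss (likely mislabeled).
losses = np.array([0.1, 2.3, 0.2, 1.9, 0.05])
print(select_confident(losses, noise_rate=0.4))  # -> [0 2 4]
```

The selected subset is then used for the next training round, and the process repeats as the network's loss estimates improve.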
A new paper coauthored by researchers at The University of Melbourne and MBZUAI explores disagreement in human annotation for AI training. The paper treats disagreement as a signal (human label variation or HLV) rather than noise, and proposes new evaluation metrics based on fuzzy set theory. These metrics adapt accuracy and F-score to cases where multiple labels may plausibly apply, aligning model output with the distribution of human judgments. Why it matters: This research addresses a key challenge in NLP by accounting for the inherent ambiguity in human language, potentially leading to more robust and human-aligned AI systems.
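The article does not give the paper's metric definitions, but the fuzzy-set idea can be illustrated with a toy "soft accuracy": treat the distribution of annotator labels as a fuzzy gold set and credit a prediction by its membership degree rather than all-or-nothing. The function names and partial-credit scheme below are illustrative, not the paper's exact metrics:

```python
from collections import Counter

def fuzzy_membership(annotations):
    """Membership degree of each label = fraction of annotators choosing it."""
    counts = Counter(annotations)
    return {label: n / len(annotations) for label, n in counts.items()}

def fuzzy_accuracy(predictions, annotation_sets):
    """Average membership degree of the predicted label in the fuzzy gold set.

    A prediction earns partial credit proportional to how many annotators
    chose that label, instead of being scored against one forced gold label.
    """
    total = 0.0
    for pred, anns in zip(predictions, annotation_sets):
        total += fuzzy_membership(anns).get(pred, 0.0)
    return total / len(predictions)

# Three items, each labeled by four annotators who sometimes disagree.
gold = [["sarcastic"] * 3 + ["literal"],      # 0.75 / 0.25 split
        ["literal"] * 2 + ["sarcastic"] * 2,  # genuine ambiguity
        ["literal"] * 4]                      # full agreement
print(fuzzy_accuracy(["sarcastic", "sarcastic", "literal"], gold))  # -> 0.75
```

Note how the ambiguous second item gives half credit either way, so a model is not penalized for picking one of two equally plausible labels.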
MBZUAI researchers will present 20 papers at the 40th International Conference on Machine Learning (ICML) in Honolulu. Visiting Associate Professor Tongliang Liu leads with seven publications, followed by Kun Zhang with six. One paper investigates semi-supervised learning vs. model-based methods for noisy data annotation in deep neural networks. Why it matters: The research addresses the critical issue of data quality and accessibility in machine learning, particularly for organizations with limited resources for data annotation.
Researchers are exploring methods for evaluating the outcomes of actions from off-policy observations in which the context is noisy or anonymized. They employ proxy causal learning, using two noisy views of the context to recover the average causal effect of an action without explicitly modeling the hidden context. The implementation learns neural-network representations for both action and context, and outperforms an autoencoder-based alternative. Why it matters: This research addresses a key challenge in applying AI in real-world scenarios where data privacy or bandwidth limitations necessitate working with noisy or anonymized data.
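The article leaves the method abstract; as a sketch of the underlying idea, here is proximal two-stage least squares in a linear-Gaussian toy model. All coefficients are invented for illustration, and the actual research uses learned neural representations rather than linear regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hidden context U drives the action A, two noisy views (Z, W), and outcome Y.
U = rng.normal(size=n)
A = 0.5 * U + rng.normal(size=n)             # action, confounded by U
Z = U + rng.normal(size=n)                   # action-side noisy view of U
W = U + rng.normal(size=n)                   # outcome-side noisy view of U
Y = 2.0 * A + 1.5 * U + rng.normal(size=n)   # true causal effect of A is 2.0

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Naive regression of Y on A is biased by the hidden confounder U.
naive = ols(np.column_stack([np.ones(n), A]), Y)[1]

# Proximal two-stage least squares:
#   Stage 1: predict the outcome-side proxy W from (A, Z).
#   Stage 2: regress Y on A and the predicted proxy; the A coefficient
#   recovers the causal effect without ever observing U.
X1 = np.column_stack([np.ones(n), A, Z])
W_hat = X1 @ ols(X1, W)
causal = ols(np.column_stack([np.ones(n), A, W_hat]), Y)[1]

print(f"naive: {naive:.2f}, proximal: {causal:.2f}")  # naive is biased upward; proximal ~ 2.0
```

The key point is that neither noisy view alone identifies the effect, but together they play the roles of instrument and outcome proxy, standing in for the unobserved context.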
The article discusses why text classification remains hard to apply effectively in practice, despite the availability of tools like LibMultiLabel. It argues that users need guidance on practical considerations such as choosing evaluation criteria and data strategies, not just access to software. The piece also mentions a panel discussion hosted by MBZUAI in collaboration with the Manara Center for Coexistence and Dialogue. Why it matters: This signals ongoing efforts within the UAE AI ecosystem to address practical challenges and promote responsible AI usage in NLP applications.
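As a concrete instance of why the choice of evaluation criterion matters in multi-label classification, the sketch below (self-contained, not from the article or LibMultiLabel) contrasts micro- and macro-averaged F1 on the same predictions:

```python
import numpy as np

def f1(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

def micro_macro_f1(y_true, y_pred):
    """Micro-F1 pools counts over labels; macro-F1 averages per-label F1.

    With skewed label frequencies (common in multi-label text data),
    the two can disagree sharply, so the criterion must match the goal.
    """
    tp = (y_true & y_pred).sum(axis=0)
    fp = (~y_true & y_pred).sum(axis=0)
    fn = (y_true & ~y_pred).sum(axis=0)
    micro = f1(tp.sum(), fp.sum(), fn.sum())
    macro = float(np.mean([f1(*t) for t in zip(tp, fp, fn)]))
    return micro, macro

# Label 0 is frequent and easy; label 1 is rare and always missed.
y_true = np.array([[1, 0], [1, 0], [1, 1], [1, 1]], dtype=bool)
y_pred = np.array([[1, 0], [1, 0], [1, 0], [1, 0]], dtype=bool)
print(micro_macro_f1(y_true, y_pred))  # -> (0.8, 0.5)
```

Here micro-F1 looks healthy because the frequent label dominates the pooled counts, while macro-F1 exposes that the rare label is never predicted, which is exactly the kind of practical consideration the article says users need guidance on.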
This paper critically examines common assumptions about Arabic dialects used in NLP. The authors analyze a multi-label dataset where sentences in 11 country-level dialects were assessed by native speakers. The analysis reveals that widely held assumptions about dialect grouping and distinctions are oversimplified and not always accurate. Why it matters: The findings suggest that current approaches in Arabic NLP tasks like dialect identification may be limited by these inaccurate assumptions, hindering further progress in the field.
MBZUAI's Maxim Panov is developing uncertainty quantification methods to improve the reliability of language models. His work focuses on providing insights into the confidence level of machine learning models' predictions, especially in scenarios where accuracy is critical, such as medicine. Panov is working on post-processing techniques that can be applied to already-trained models. Why it matters: This research aims to address the issue of "hallucinations" in language models, enhancing their trustworthiness and applicability in sensitive domains within the region and globally.
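Panov's specific techniques are not detailed here; one classic example of post-processing an already-trained classifier for better-calibrated confidence is temperature scaling, sketched below. The synthetic data, grid search, and 3x overconfidence factor are all invented for illustration:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of the labels under temperature T."""
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 200)):
    """Pick the single scalar T that minimizes held-out NLL.

    Temperature scaling rescales logits without changing the argmax,
    so accuracy is untouched while confidence becomes better calibrated.
    """
    return grid[np.argmin([nll(logits, labels, T) for T in grid])]

# Toy held-out set: labels drawn from true probabilities p, but the
# model's logits are 3x-inflated log-probs, i.e. it is overconfident.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(3), size=4000)
labels = np.array([rng.choice(3, p=pi) for pi in p])
logits = 3.0 * np.log(p)
T = fit_temperature(logits, labels)
print(round(float(T), 2))  # fitted T sits near the 3x inflation factor
```

Because the fit touches only one scalar on a held-out set, the trained model itself never changes, which is the appeal of post-processing approaches like the ones the article describes.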
A KAUST-led meta-study published in Science examines the increasing ocean noise pollution from human activities like shipping and seismic blasting. The study synthesizes findings from 10,000 papers, revealing that anthropogenic noise interferes with marine animals' communication and ecological processes. The research highlights the need for policymakers to address this issue for ocean health and sustainable economies. Why it matters: Understanding and mitigating ocean noise pollution is crucial for preserving marine ecosystems and the biodiversity of the Red Sea.