MBZUAI researchers are presenting a new approach to open-world object detection at the AAAI conference. The method enables machines to distinguish known from unknown objects in images and then learn to classify the unknown ones. PhD student Sahal Shaji Mullappilly is the lead author of the study, titled "Semi-Supervised Open-World Object Detection". Why it matters: This research addresses a key limitation of current object detection systems, allowing for more adaptable and robust AI in real-world applications.
This paper introduces Adaptive Entropy-aware Optimization (AEO), a new framework to tackle Multimodal Open-set Test-time Adaptation (MM-OSTTA). AEO uses Unknown-aware Adaptive Entropy Optimization (UAE) and Adaptive Modality Prediction Discrepancy Optimization (AMP) to distinguish unknown class samples during online adaptation by amplifying the entropy difference between known and unknown samples. The study establishes a new benchmark derived from existing datasets with five modalities and evaluates AEO's performance across various domain shift scenarios, demonstrating its effectiveness in long-term and continual MM-OSTTA settings.
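The entropy signal that AEO builds on can be illustrated with a minimal sketch. This is not the paper's UAE or AMP objective; it only shows, under stated assumptions, why prediction entropy separates confident (likely known) samples from ambiguous (possibly unknown) ones. The function names, the example logits, and the 0.5 threshold are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(K), so the result lies in [0, 1].

    Assumes at least two classes (K >= 2).
    """
    k = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)

def is_unknown(logits, threshold=0.5):
    """Flag a sample as unknown when its normalized entropy exceeds the
    threshold. The threshold value is an illustrative assumption."""
    return normalized_entropy(softmax(logits)) > threshold

# Hypothetical classifier outputs over four known classes.
confident = [8.0, 0.5, 0.2, 0.1]  # one class dominates: low entropy
ambiguous = [1.0, 1.1, 0.9, 1.0]  # nearly uniform scores: high entropy

print(is_unknown(confident), is_unknown(ambiguous))
```

A fixed threshold works only when the two entropy distributions are well separated, which is why a method that widens the gap between known and unknown samples during adaptation makes this kind of decision more reliable.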
Researchers from Alexandria University introduce AlexU-Word, a new dataset for offline Arabic handwriting recognition. The dataset contains 25,114 samples of 109 unique Arabic words, covering all letter shapes, collected from 907 writers. The dataset is designed for closed-vocabulary word recognition and to support segmented letter recognition-based systems. Why it matters: This dataset can help advance Arabic handwriting recognition systems, addressing a need for high-quality Arabic datasets in NLP research.
MBZUAI researchers tackled the challenge of AI-powered waste detection in messy, real-world recycling facilities. They fine-tuned modern object detection models on real industrial waste imagery and combined this with a semi-supervised learning pipeline. Fine-tuning more than doubled performance, and the semi-supervised pipeline outperformed fully supervised baselines. Why it matters: This research offers a practical path for open research that can rival proprietary systems while reducing the need for costly manual labeling in waste management, a problem of global importance.
This article discusses domain shift in machine learning, where testing data differs from training data, and methods to mitigate it via domain adaptation and generalization. Domain adaptation uses labeled source data and unlabeled target data. Domain generalization uses labeled data from single or multiple source domains to generalize to unseen target domains. Why it matters: Research in mitigating domain shift enhances the robustness and applicability of AI models in diverse real-world scenarios.
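One common way to evaluate domain generalization is a leave-one-domain-out split: train on all source domains but one, then test on the held-out domain, which the model never sees during training. The sketch below assumes that protocol; the article does not say which protocol it covers, and the function name and domain names are hypothetical.

```python
def leave_one_domain_out(domains):
    """Yield (held_out_name, train_samples, test_samples) for each domain.

    `domains` maps a domain name to its list of labeled samples. The
    held-out domain plays the role of the unseen target domain.
    """
    for held_out in domains:
        train = [
            sample
            for name, samples in domains.items()
            if name != held_out
            for sample in samples
        ]
        yield held_out, train, list(domains[held_out])

# Hypothetical domains with placeholder sample IDs.
data = {"photo": [1, 2], "sketch": [3], "cartoon": [4, 5]}

for target, train, test in leave_one_domain_out(data):
    print(target, train, test)
```

Domain adaptation differs in that unlabeled samples from the target domain are available during training, so the held-out domain would not be fully hidden from the model.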
A new benchmark, LongShOTBench, evaluates multimodal reasoning and tool use in long videos through open-ended questions and diagnostic rubrics. It addresses the limitations of existing datasets by combining temporal length with multimodal richness, and its samples are human-validated. The work also presents LongShOTAgent, an agentic system for analyzing long videos. Together, the benchmark and the agent highlight the challenges that state-of-the-art MLLMs still face.