Fine-grained species recognition with MAviS: a new dataset, benchmark, and model

MBZUAI · Significant research

Summary

MBZUAI researchers have developed MAviS, a new multimodal dataset, benchmark, and chatbot for fine-grained bird species recognition. MAviS combines images, audio, and text to help models distinguish subtle differences between species, especially rare and regional varieties. The related study was presented at EMNLP 2025 and selected as a "Senior Area Chair Highlight". Why it matters: Fine-grained species recognition is a key gap in AI's ability to support biodiversity conservation and ecological monitoring, both in the region and globally.

Keywords

MBZUAI · MAviS · bird species recognition · multimodal dataset · EMNLP


MBZUAI researchers earn high-profile honors at EMNLP

MBZUAI

MBZUAI researchers received high honors at EMNLP 2025 for two research papers, placing them in the top 2% of accepted work. One paper, MAviS, is a multimodal AI system that identifies bird species by combining images, sounds, and text. The other award-winning paper focuses on uncertainty in LLM-as-a-Judge. Why it matters: The recognition highlights MBZUAI's growing influence in NLP and multimodal AI research, particularly in domain-specific applications like biodiversity conservation.

Hybrid Deep Feature Extraction and ML for Construction and Demolition Debris Classification

arXiv · Jan 20

This paper introduces a hybrid deep learning and machine learning pipeline for classifying construction and demolition waste. A dataset of 1,800 images from UAE construction sites was created, and deep features were extracted using a pre-trained Xception network. The combination of Xception features with machine learning classifiers achieved up to 99.5% accuracy, demonstrating state-of-the-art performance for debris identification.

Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization

arXiv · Jan 23

This paper introduces Adaptive Entropy-aware Optimization (AEO), a new framework to tackle Multimodal Open-set Test-time Adaptation (MM-OSTTA). AEO uses Unknown-aware Adaptive Entropy Optimization (UAE) and Adaptive Modality Prediction Discrepancy Optimization (AMP) to distinguish unknown class samples during online adaptation by amplifying the entropy difference between known and unknown samples. The study establishes a new benchmark derived from existing datasets with five modalities and evaluates AEO's performance across various domain shift scenarios, demonstrating its effectiveness in long-term and continual MM-OSTTA settings.
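The idea of separating known from unknown samples by prediction entropy can be illustrated with a minimal sketch. This is not the AEO method itself, which optimizes the model during online adaptation; it only shows the underlying signal, namely that a confident prediction has low entropy while a near-uniform one has high entropy. The function names, the example logits, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(num_classes), so the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def flag_unknown(logits, threshold=0.5):
    """Hypothetical rule: treat a sample as unknown-class if its entropy is high."""
    return normalized_entropy(softmax(logits)) > threshold


# Illustrative logits only, not taken from the paper.
confident = [8.0, 0.5, 0.2, 0.1]   # one class dominates: low entropy
uncertain = [1.0, 1.1, 0.9, 1.0]   # near-uniform scores: high entropy

print(flag_unknown(confident), flag_unknown(uncertain))
```

A fixed threshold like this only works when the entropy gap between known and unknown samples is wide, which is why a method would want to amplify that gap during adaptation.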

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv · Dec 18

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length with multimodal richness and by using human-validated samples. LongShOTAgent, an agentic system for analyzing long videos, is also presented. Together, the benchmark and agent highlight the challenges that long videos pose for state-of-the-art MLLMs.