May 19 – May 25, 2025

Top Stories

SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs

arXiv · May 25 · NLP LLM

The Qatar Computing Research Institute (QCRI) has released SpokenNativQA, a multilingual spoken question-answering dataset for evaluating LLMs in conversational settings. The dataset contains 33,000 naturally spoken questions and answers across multiple languages, including low-resource and dialect-rich languages. It aims to address the limitations of text-based QA datasets by incorporating speech variability, accents, and linguistic diversity. Why it matters: This benchmark enables more robust evaluation of LLMs in speech-based interactions, particularly for Arabic dialects and other low-resource languages.

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

arXiv · May 23 · NLP LLM

MBZUAI researchers release 'Fann or Flop', a new benchmark for evaluating Arabic poetry understanding in LLMs. The benchmark covers 12 historical eras and 14 poetic genres, assessing semantic understanding, metaphor interpretation, and cultural context. Evaluation of state-of-the-art LLMs reveals challenges in poetic understanding despite strong performance on standard Arabic benchmarks.

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

arXiv · May 22 · Research Arabic AI

MBZUAI researchers introduce ARB, the first comprehensive benchmark for evaluating step-by-step multimodal reasoning in Arabic across textual and visual modalities. The benchmark spans 11 diverse domains and includes 1,356 multimodal samples with 5,119 human-curated reasoning steps. Evaluations of 12 state-of-the-art LMMs revealed challenges in coherence, faithfulness, and cultural grounding, highlighting the need for culturally aware AI systems.

UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking

arXiv · May 21 · NLP Research

Researchers from MBZUAI have introduced UrduFactCheck, a new framework for fact-checking in Urdu, along with two datasets: UrduFactBench and UrduFactQA. The framework uses monolingual and translation-based evidence retrieval to address the lack of Urdu resources. Evaluations using twelve LLMs showed that translation-augmented methods improve performance, highlighting challenges for open-source LLMs in Urdu.

FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning

arXiv · May 20 · NLP LLM

MBZUAI researchers introduce FAID, a fine-grained AI-generated text detection framework capable of classifying text as human-written, LLM-generated, or collaboratively written. FAID utilizes multi-level contrastive learning and multi-task auxiliary classification to capture authorship and model-specific characteristics, and can identify the underlying LLM family. The framework outperforms existing baselines, especially in generalizing to unseen domains and new LLMs, and includes a multilingual, multi-domain dataset called FAIDSet.

May 19 – May 25, 2025

Top Stories

SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark

UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking

FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning

More This Week

Smart Waste Management System for Makkah City using Artificial Intelligence and Internet of Things

Accelerating Saudi sustainable energy innovation

EU-KAUST cooperation for mobility and collaboration in science, innovation, and beyond

KAUST and Qassim University to advance research, innovation, and entrepreneurship

KAUST and Umm Al-Qura University strengthen academic and technical collaboration

Zakat, Tax and Customs Authority, and KAUST advance innovation and support sustainability