GCC AI Research

Archive Monthly

March 2025

8 articles

Top Stories

RP-SAM2: Refining Point Prompts for Stable Surgical Instrument Segmentation

arXiv · · CV Research

Researchers from MBZUAI introduced RP-SAM2, a method to improve surgical instrument segmentation by refining point prompts for more stable results. RP-SAM2 uses a novel shift block and compound loss function to reduce sensitivity to point prompt placement, improving segmentation accuracy in data-constrained settings. Experiments on the Cataract1k and CaDIS datasets show that RP-SAM2 enhances segmentation accuracy and reduces variance compared to SAM2, with code available on GitHub.

SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia

arXiv · · Research LLM

The paper introduces SaudiCulture, a new benchmark for evaluating the cultural competence of LLMs within Saudi Arabia, covering five major geographical regions and diverse cultural domains. The benchmark includes questions of varying complexity and distinguishes between common and specialized regional knowledge. Evaluations of five LLMs (GPT-4, Llama 3.3, FANAR, Jais, and AceGPT) revealed performance declines on region-specific questions, highlighting the need for region-specific knowledge in LLM training.

SALT: Parameter-Efficient Fine-Tuning via Singular Value Adaptation with Low-Rank Transformation

arXiv · · Research Healthcare

Researchers introduce SALT, a parameter-efficient fine-tuning method for medical image segmentation that combines singular value adaptation with low-rank transformation. SALT selectively adapts influential singular values and complements this with a low-rank update for the remaining subspace. Experiments on five medical datasets show SALT outperforms state-of-the-art PEFT methods by 2-5% in Dice score with only 3.9% trainable parameters.

Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling

arXiv · · Research Healthcare

The paper introduces UAE-3D, a multi-modal VAE for 3D molecule generation that compresses molecules into a unified latent space, maintaining near-zero reconstruction error. This approach simplifies latent diffusion modeling by eliminating the need to handle multi-modality and equivariance separately. Experiments on GEOM-Drugs and QM9 datasets show UAE-3D establishes new benchmarks in de novo and conditional 3D molecule generation, with significant improvements in efficiency and quality.

Tracking Meets Large Multimodal Models for Driving Scenario Understanding

arXiv · · LLM CV

Researchers at MBZUAI have introduced a novel approach to enhance Large Multimodal Models (LMMs) for autonomous driving by integrating 3D tracking information. This method uses a track encoder to embed spatial and temporal data, enriching visual queries and improving the LMM's understanding of driving scenarios. Experiments on DriveLM-nuScenes and DriveLM-CARLA benchmarks demonstrate significant improvements in perception, planning, and prediction tasks compared to baseline models.

The AI Pentad, the CHARME$^{2}$D Model, and an Assessment of Current-State AI Regulation

arXiv · · Policy Ethics

This paper introduces the AI Pentad model, comprising humans/organizations, algorithms, data, computing, and energy, as a framework for AI regulation. It also presents the CHARME²D Model to link the AI Pentad with regulatory enablers like registration, monitoring, and enforcement. The paper assesses AI regulatory efforts in the EU, China, UAE, UK, and US using the CHARME²D model, highlighting strengths and weaknesses.

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

arXiv · · LLM NLP

MBZUAI researchers introduce LLMVoX, a 30M-parameter, LLM-agnostic, autoregressive streaming text-to-speech (TTS) system that generates high-quality speech with low latency. The system preserves the capabilities of the base LLM and achieves a lower Word Error Rate compared to speech-enabled LLMs. LLMVoX supports seamless, infinite-length dialogues and generalizes to new languages with dataset adaptation, including Arabic.