GCC AI Research


Results for "IEEE SLT"

Continuous Saudi Sign Language Recognition: A Vision Transformer Approach

arXiv ·

The researchers introduce KAU-CSSL, the first continuous Saudi Sign Language (SSL) dataset focusing on complete sentences. They propose a transformer-based model using ResNet-18 for spatial feature extraction and a Transformer Encoder with Bidirectional LSTM for temporal dependencies. The model achieved 99.02% accuracy in signer-dependent mode and 77.71% in signer-independent mode, advancing communication tools for the SSL community.
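The described pipeline (per-frame spatial features from a CNN, then a Transformer encoder and a bidirectional LSTM for temporal modeling) can be sketched roughly as follows in PyTorch. This is an illustrative reconstruction, not the authors' code: the small convolutional backbone stands in for ResNet-18, and all dimensions and class counts are invented.

```python
import torch
import torch.nn as nn

class CSLRModel(nn.Module):
    """Illustrative sketch of the KAU-CSSL-style pipeline:
    per-frame spatial features -> Transformer encoder -> BiLSTM -> classifier."""
    def __init__(self, feat_dim=512, num_classes=80, n_heads=8, n_layers=2):
        super().__init__()
        # Stand-in for the ResNet-18 backbone: any per-frame feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim),
        )
        enc_layer = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.bilstm = nn.LSTM(feat_dim, feat_dim // 2,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, frames):  # frames: (batch, time, channels, height, width)
        b, t = frames.shape[:2]
        x = self.backbone(frames.flatten(0, 1)).view(b, t, -1)  # (B, T, D)
        x = self.encoder(x)              # temporal self-attention over frames
        x, _ = self.bilstm(x)            # bidirectional recurrence
        return self.head(x.mean(dim=1))  # sentence-level logits

model = CSLRModel()
logits = model(torch.randn(2, 6, 3, 32, 32))  # 2 clips of 6 frames each
print(logits.shape)  # torch.Size([2, 80])
```

Combining a Transformer encoder with a BiLSTM in this way lets self-attention capture long-range frame relations while the recurrence models local ordering, which is the division of labor the summary describes.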

Professor Ling Shao becomes IEEE Fellow

MBZUAI ·

Professor Ling Shao, Executive Vice President and Provost of MBZUAI, has been elected an IEEE Fellow. This honor recognizes his contributions to computer vision and representation learning. The IEEE Fellowship is a prestigious distinction given to select IEEE members. Why it matters: This recognition highlights the growing prominence of MBZUAI and its faculty in the international AI research community.

MBZUAI teams shine in competition

MBZUAI ·

Two teams from MBZUAI won awards at the IEEE SLT international hackathon held in Qatar. One team won the "Best Potential Impact Project" award for Autodub, a human-in-the-loop AI dubbing platform. The second MBZUAI team won the "Craziest Idea Award" for a commentator voice synthesizer for video games. Why it matters: The wins highlight MBZUAI's strength in applied AI research and its students' ability to develop innovative solutions with practical applications.

SALT: Parameter-Efficient Fine-Tuning via Singular Value Adaptation with Low-Rank Transformation

arXiv ·

Researchers introduce SALT, a parameter-efficient fine-tuning method for medical image segmentation that combines singular value adaptation with low-rank transformation. SALT selectively adapts the most influential singular values of the pretrained weights and complements this with a low-rank update for the remaining subspace. Experiments on five medical datasets show SALT outperforming state-of-the-art PEFT methods by 2-5% in Dice score while training only 3.9% of the parameters.
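The mechanism can be illustrated with a small NumPy sketch: decompose a frozen weight matrix by SVD, make the top-k singular values trainable via a scale and shift, and add a low-rank term for the residual subspace. This is a minimal reconstruction of the idea as summarized above, not the paper's implementation; all dimensions and initial values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k, r = 8, 6, 2, 1  # layer dims; top-k singular values; residual rank

W = rng.standard_normal((d_out, d_in))           # frozen pretrained weight
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Trainable parameters (identity-preserving initialization):
scale = np.ones(k)    # multiplicative update on the top-k singular values
shift = np.zeros(k)   # additive update on the top-k singular values
A = rng.standard_normal((d_out, r)) * 0.01       # low-rank factors covering
B = np.zeros((r, d_in))                          # the remaining subspace

def adapted_weight():
    S_new = S.copy()
    S_new[:k] = scale * S[:k] + shift   # adapt only the dominant singular values
    return (U * S_new) @ Vt + A @ B     # SVD recomposition + low-rank residual

# At initialization the adaptation is a no-op: W is recovered exactly.
print(np.allclose(adapted_weight(), W))  # True
```

Because only `scale`, `shift`, `A`, and `B` are trained, the trainable-parameter count stays tiny relative to the full weight matrix, which is the source of the 3.9% figure reported in the summary.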

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv ·

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. It addresses the limitations of existing datasets by combining temporal length with multimodal richness, using human-validated samples. The authors also present LongShOTAgent, an agentic system for analyzing long videos. Evaluations with both the benchmark and the agent show that long-video reasoning remains challenging for state-of-the-art MLLMs.