GCC AI Research

Results for "VLN"

Human-Computer Conversational Vision-and-Language Navigation

MBZUAI

A presentation discusses the evolution of Vision-and-Language Navigation (VLN) from benchmarks like Room-to-Room (R2R). It highlights the role of Large Language Models (LLMs) such as GPT-4 in enabling more natural human-machine interactions. The presentation showcases work using LLMs to decode navigational instructions and improve robotic navigation. Why it matters: This research demonstrates the potential of merging vision, language, and robotics for advanced AI applications in navigation and human-computer interaction.

Testing the limits of vision language models: A new benchmark dataset presented at ACL

MBZUAI

MBZUAI researchers presented EXAMS-V, a new benchmark dataset for evaluating the reasoning and processing abilities of vision language models (VLMs). EXAMS-V contains over 20,000 multiple-choice questions across 26 subjects and 11 languages, including Arabic. The dataset presents the questions within images, testing the VLM's ability to integrate visual and textual information. Why it matters: This dataset fills a gap in VLM evaluation, providing a valuable resource for assessing and improving the multimodal reasoning capabilities of these models, particularly in diverse languages like Arabic.

Making History: ASPIRE to Launch Inaugural ‘Abu Dhabi Autonomous Racing League’ Redefining Future of Extreme Sport on April 27

TII

The inaugural ASPIRE Abu Dhabi Autonomous Racing League (A2RL) will take place on April 27 at the Yas Marina Circuit, with 8 teams competing for a $2.25 million prize. Teams will use identical Dallara Super Formula SF23 cars made autonomous by TII, so each team competes on its own code and AI algorithms. The event will feature autonomous cars racing simultaneously and an AI-versus-human race with former F1 driver Daniil Kvyat. Why it matters: This event highlights the UAE's commitment to advancing AI and autonomous systems, potentially establishing Abu Dhabi as a hub for autonomous vehicle innovation in extreme conditions.

Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models

arXiv

This paper brings Arabic into Vision-and-Language Navigation (VLN) for robotics, evaluating multilingual small language models (SLMs) such as GPT-4o mini, Llama 3 8B, and Phi-3 14B, along with the Arabic-centric model Jais, within the NavGPT framework. The study uses the R2R dataset to assess how the instruction language affects navigation reasoning through zero-shot sequential action prediction. Results show the framework enables high-level planning in both English and Arabic, though some models struggle with Arabic because of reasoning limitations and parsing issues. Why it matters: This work highlights the need to improve language model planning and reasoning for effective navigation, especially to unlock the potential of Arabic-language models in real-world applications.