GCC AI Research

Language and Planning in Robotic Navigation: A Multilingual Evaluation of State-of-the-Art Models

arXiv · Notable

Summary

This paper brings Arabic into Vision-and-Language Navigation (VLN) for robotics, evaluating multilingual small language models (SLMs) such as GPT-4o mini, Llama 3 8B, and Phi-3 14B, along with the Arabic-centric model Jais, within the NavGPT framework. The study uses the R2R dataset to measure how instruction language affects navigation reasoning, with each model predicting actions sequentially in a zero-shot setting. Results show that the framework supports high-level planning in both English and Arabic, though some models struggle with Arabic instructions because of weaker reasoning and output that fails to parse. Why it matters: Effective navigation depends on stronger planning and reasoning in language models, and closing that gap is a prerequisite for using Arabic-language models in real-world robotics.
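To make "zero-shot sequential action prediction" concrete, here is a minimal Python sketch of such a loop. It is not NavGPT's implementation: the toy graph, the prompt wording, the `Action: <id>` reply format, and every function name below are assumptions made for illustration. The model is asked for one action per step, the reply is parsed, and the episode ends early if the reply cannot be parsed, which is the kind of failure the summary refers to as a parsing issue.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical toy navigation graph: viewpoint id -> reachable neighbour ids.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
}


def build_prompt(instruction: str, current: str, history: List[str]) -> str:
    """Assemble a zero-shot prompt (assumed wording, no in-context examples)."""
    return (
        f"Instruction: {instruction}\n"
        f"Visited so far: {', '.join(history)}\n"
        f"You are at {current}. Reachable viewpoints: {', '.join(GRAPH[current])}\n"
        "Reply with 'Action: <viewpoint id>' or 'Action: STOP'."
    )


def parse_action(reply: str, candidates: List[str]) -> Optional[str]:
    """Return the chosen viewpoint or 'STOP'; None if the reply is unusable."""
    match = re.search(r"Action:\s*(\w+)", reply)
    if match is None:
        return None
    choice = match.group(1)
    if choice == "STOP" or choice in candidates:
        return choice
    return None


def navigate(
    instruction: str,
    start: str,
    llm: Callable[[str], str],
    max_steps: int = 10,
) -> List[str]:
    """Query the model once per step and follow its parsed action."""
    path = [start]
    for _ in range(max_steps):
        current = path[-1]
        reply = llm(build_prompt(instruction, current, path))
        action = parse_action(reply, GRAPH[current])
        if action is None or action == "STOP":
            break  # stop on an explicit STOP or on an unparseable reply
        path.append(action)
    return path


def make_scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Stand-in for a real model: returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


if __name__ == "__main__":
    llm = make_scripted_llm(
        ["Thought: head for the hallway. Action: B", "Action: D", "Action: STOP"]
    )
    print(navigate("Walk through the hallway to the far room.", "A", llm))
```

Swapping `make_scripted_llm` for a call to an actual model, and sending the instruction in English or Arabic, would leave the loop unchanged; only the prompt text and the robustness of `parse_action` would need to adapt.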

Keywords

VLN · Robotics · Arabic · LLM · Navigation

Related

Human-Computer Conversational Vision-and-Language Navigation

MBZUAI

A presentation discusses the evolution of Vision-and-Language Navigation (VLN) from benchmarks like Room-to-Room (R2R). It highlights the role of Large Language Models (LLMs) such as GPT-4 in enabling more natural human-machine interactions. The presentation showcases work using LLMs to decode navigational instructions and improve robotic navigation. Why it matters: This research demonstrates the potential of merging vision, language, and robotics for advanced AI applications in navigation and human-computer interaction.

LLM-BABYBENCH: Understanding and Evaluating Grounded Planning and Reasoning in LLMs

arXiv

MBZUAI researchers introduce LLM-BabyBench, a benchmark suite for evaluating grounded planning and reasoning in LLMs. The suite, built on a textual adaptation of the BabyAI grid world, assesses LLMs on predicting action consequences, generating action sequences, and decomposing instructions. Datasets, evaluation harness, and metrics are publicly available to facilitate reproducible assessment.

Robot Navigation in the Wild

MBZUAI

Gregory Chirikjian presented an overview of research on robot navigation in unstructured environments, using computer vision, sensor tech, ML, and motion planning. The methods use multi-modal observations from RGB cameras, 3D LiDAR, and robot odometry for scene perception, along with deep RL for planning. These methods have been integrated with wheeled, home, and legged robots and tested in crowded indoor scenes, home environments, and dense outdoor terrains. Why it matters: This research pushes the boundaries of robotics in complex environments, paving the way for more versatile and autonomous robots in the Middle East.

Language Models' Factuality Depends on the Language of Inquiry

arXiv

Researchers introduce a benchmark to evaluate the factual recall and knowledge transferability of multilingual language models across 13 languages. The study reveals that language models often fail to transfer knowledge between languages, even when they possess the correct information in one language. The benchmark and evaluation framework are released to drive future research in multilingual knowledge transfer.