GCC AI Research

Results for "multi-agent communication"

Mind meld: agentic communication through thoughts instead of words

MBZUAI

Researchers from MBZUAI, Carnegie Mellon University, and Meta AI presented ThoughtComm at NeurIPS 2025, a new approach in which AI agents communicate through internal, latent representations instead of natural language. The framework extracts and selectively shares latent "thoughts" from agents' internal states, representing the underlying structure of their reasoning. Results show that agents coordinate more effectively, reach consensus faster, and solve problems more accurately with this method. Why it matters: Bypassing the limitations of natural language in AI communication could lead to more efficient and accurate multi-agent systems, with impact on robotics, collaborative AI, and distributed problem-solving.
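
A minimal sketch of the underlying idea, using toy NumPy agents: each agent projects its hidden state into a latent "thought" vector, broadcasts it, and folds the thoughts it receives back into its own state. The class and projection names here are hypothetical illustrations, not ThoughtComm's actual architecture or API.

    import numpy as np

    rng = np.random.default_rng(0)

    class Agent:
        """Toy agent with a hidden state; illustrative only, not ThoughtComm itself."""

        def __init__(self, dim=8):
            self.state = rng.standard_normal(dim)
            # Hypothetical projection that extracts a shareable latent "thought".
            self.share_proj = 0.1 * rng.standard_normal((dim, dim))
            # Hypothetical projection that folds received thoughts back into the state.
            self.recv_proj = 0.1 * rng.standard_normal((dim, dim))

        def emit_thought(self):
            # Share a latent vector instead of a natural-language message.
            return self.share_proj @ self.state

        def receive(self, thoughts):
            for t in thoughts:
                self.state = np.tanh(self.state + self.recv_proj @ t)

    agents = [Agent() for _ in range(3)]
    for step in range(5):
        thoughts = [a.emit_thought() for a in agents]
        for i, a in enumerate(agents):
            a.receive([t for j, t in enumerate(thoughts) if j != i])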

Learning to Cooperate in Multi-Agent Systems

MBZUAI

Dr. Yali Du from King's College London will give a presentation on learning to cooperate in multi-agent systems. Her research focuses on enabling cooperative and responsible behavior in machines using reinforcement learning and foundation models. She will discuss enhancing collaboration within social contexts, fostering human-AI coordination, and achieving scalable alignment. Why it matters: This highlights the growing importance of research into multi-agent systems and human-AI interaction, crucial for developing AI that integrates effectively and ethically into society.

Multi-agent Time-based Decision-making for the Search and Action Problem

arXiv

This paper introduces a decentralized multi-agent decision-making framework for search and action problems under time constraints, treating time as a budgeted resource where actions have costs and rewards. The approach uses probabilistic reasoning to optimize decisions, maximizing reward within the given time. Evaluated in a simulated search, pick, and place scenario inspired by the Mohamed Bin Zayed International Robotics Challenge (MBZIRC), the algorithm outperformed benchmark strategies. Why it matters: The framework's validation in a Gazebo environment signals potential for real-world robotic applications, particularly in time-sensitive and cooperative tasks within the robotics domain in the UAE.
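
A simplified sketch of treating time as a budgeted resource: rank candidate actions by expected reward per unit time and pack them greedily into the budget. This is a deliberate simplification for illustration; the paper's formulation is probabilistic and decentralized, and the task names and numbers below are made up.

    def plan_under_budget(actions, budget):
        """Greedily select actions under a time budget.

        actions: list of (name, expected_reward, time_cost) tuples.
        """
        ranked = sorted(actions, key=lambda a: a[1] / a[2], reverse=True)
        plan, remaining = [], budget
        for name, reward, cost in ranked:
            if cost <= remaining:
                plan.append(name)
                remaining -= cost
        return plan, budget - remaining

    # Hypothetical search/pick/place tasks with estimated rewards and durations (seconds).
    tasks = [
        ("search_zone_A", 5.0, 40.0),
        ("search_zone_B", 3.0, 30.0),
        ("pick_and_place_object", 8.0, 25.0),
    ]
    print(plan_under_budget(tasks, budget=60.0))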

A Decentralized Multi-Agent Unmanned Aerial System to Search, Pick Up, and Relocate Objects

arXiv

This paper presents a decentralized multi-agent unmanned aerial system designed to search for, pick up, and relocate objects. The system integrates multi-agent aerial exploration, object detection and tracking, and aerial gripping, and relies on global state estimation, reactive collision avoidance, and sweep planning for exploration. Why it matters: The system's successful deployment in demonstrations and competitions like MBZIRC highlights the potential of integrated robotic solutions for complex tasks such as search and rescue in the region.
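
A generic illustration of sweep planning for aerial exploration: generate boustrophedon ("lawnmower") waypoints over a rectangular area. The coordinate frame, spacing, and altitude are assumptions for the sketch and are not taken from the paper.

    def sweep_waypoints(width, height, spacing, altitude=10.0):
        """Boustrophedon sweep over a width x height rectangle (metres)."""
        waypoints, y, direction = [], 0.0, 1
        while y <= height:
            # Fly one full pass along x, alternating direction each row.
            x_start, x_end = (0.0, width) if direction > 0 else (width, 0.0)
            waypoints.append((x_start, y, altitude))
            waypoints.append((x_end, y, altitude))
            y += spacing
            direction *= -1
        return waypoints

    print(sweep_waypoints(width=100.0, height=60.0, spacing=20.0))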

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning

arXiv

Researchers introduce MATRIX, a vision-centric agent tuning framework for robust tool-use reasoning in vision-language models (VLMs). The framework includes M-TRACE, a dataset of 28.5K multimodal tasks with 177K verified trajectories, and Pref-X, a set of 11K automatically generated preference pairs. Experiments show MATRIX consistently outperforms open- and closed-source VLMs across three benchmarks.
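
As a rough illustration of what a preference pair might contain, the sketch below pairs a preferred tool-use trajectory with a rejected one for the same multimodal task. The field names and example values are assumptions for illustration and do not reflect the actual Pref-X or M-TRACE schemas.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ToolCall:
        # One step in an agent trajectory: which tool was invoked, with what arguments.
        tool: str
        arguments: dict

    @dataclass
    class PreferencePair:
        # Hypothetical record pairing a verified (chosen) trajectory with a rejected
        # one for the same task, as used in preference-based agent tuning.
        task: str
        image_path: str
        chosen: List[ToolCall] = field(default_factory=list)
        rejected: List[ToolCall] = field(default_factory=list)

    example = PreferencePair(
        task="Which landmark appears in the photo?",
        image_path="example.jpg",
        chosen=[ToolCall("image_captioner", {"image": "example.jpg"})],
        rejected=[ToolCall("web_search", {"query": "landmark"})],
    )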

Communication in the Age of AI: AI for Communication and Communication for AI

MBZUAI

Joonhyuk Kang from KAIST gave a presentation at MBZUAI on AI's impact on wireless communication. The talk covered how communication systems can improve AI and how AI can be used to design wireless systems. Kang's research interests include signal processing for information transmission, security, and machine cognition. Why it matters: This talk highlights the growing intersection of AI and communication technologies in the region, with potential applications in smart cities and autonomous systems.

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

arXiv

A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length with multimodal richness and using human-validated samples. LongShOTAgent, an agentic system for analyzing long videos, is also presented; results on both the benchmark and the agent illustrate the challenges that long-video reasoning still poses for state-of-the-art MLLMs.