Researchers from MBZUAI, Carnegie Mellon University, and Meta AI presented ThoughtComm at NeurIPS 2025, a new approach in which AI agents communicate through internal, latent representations instead of natural language. The framework extracts and selectively shares latent "thoughts" from agents' internal states, representations that capture the underlying structure of their reasoning. Results show that agents using this method coordinate more effectively, reach consensus faster, and solve problems more accurately. Why it matters: Bypassing the limitations of natural language in AI communication could lead to more efficient and accurate multi-agent systems, impacting areas like robotics, collaborative AI, and distributed problem-solving.
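To make the idea concrete, here is a minimal toy sketch of latent-vector message passing between two agents. This is an invented illustration of the general concept, not ThoughtComm's actual architecture: the magnitude-based selection rule, the sparse message format, and the blending update are all assumptions made for the example.

```python
# Toy illustration: agents exchange selected dimensions of a hidden-state
# vector instead of decoding their state into natural language.
# The selection and update rules below are invented for illustration.

def select_salient(hidden, k):
    """Share only the k highest-magnitude dimensions of a hidden state
    (a stand-in for 'selectively sharing latent thoughts')."""
    top = sorted(range(len(hidden)), key=lambda i: abs(hidden[i]), reverse=True)[:k]
    return {i: hidden[i] for i in top}  # sparse message: index -> value

def integrate(hidden, message, weight=0.5):
    """Blend a received latent message into the receiver's hidden state."""
    out = list(hidden)
    for i, v in message.items():
        out[i] = (1 - weight) * out[i] + weight * v
    return out

sender = [0.9, -0.1, 0.05, -1.2, 0.3]
receiver = [0.0, 0.4, 0.2, 0.1, -0.5]

msg = select_salient(sender, k=2)   # shares only the two most salient dims
receiver = integrate(receiver, msg)
print(msg)
print(receiver)
```

The point of the sketch is the contrast with text-based protocols: the message is a small set of raw latent values rather than a generated sentence, so nothing is lost to (or distorted by) verbalization.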
The paper introduces MIRAGE, a framework for evaluating how well LLMs simulate human behavior in murder mystery games. MIRAGE assesses role-playing proficiency with four evaluation methods: TII, CIC, ICI, and SCI. Experiments show that even GPT-4 struggles with the complexities of the MIRAGE scenarios.
MBZUAI Provost Timothy Baldwin predicts that 2025 will be a breakout year for agentic AI, with 33% of enterprise software applications including agentic AI capabilities by 2028. MBZUAI doctoral students Wafa Alghallabi and Omkar Thawakar have launched Lawa.AI, an AI agent being tested on the university's website to provide faster answers and deeper understanding. Lawa.AI grew out of a research project on multimodal efficiency and LLMs, and aims to bridge the gap between people and information in higher education and government. Why it matters: This highlights the UAE's focus on translating AI research into practical applications and the growing importance of agentic AI across sectors.
MBZUAI introduces Agent-X, a benchmark for evaluating multi-step reasoning in vision-centric agents across real-world, multimodal settings. Agent-X includes 828 tasks with diverse visual contexts and spans six environments, requiring tool use and stepwise decision-making. Experiments show that current LMMs struggle with multi-step vision tasks, achieving less than 50% success and highlighting room for improvement in LMM reasoning and tool use.
This article previews a talk by Gül Varol from École des Ponts ParisTech on bridging natural language and 3D human motion. The talk will cover text-to-motion synthesis using generative models and text-to-motion retrieval models, drawing on the ACTOR, TEMOS, TMR, TEACH, and SINC papers. Varol's research interests include video representation learning, human motion synthesis, and sign languages. Why it matters: Research in this area could enable more intuitive human-computer interaction and new applications in areas like virtual reality and robotics.