MBZUAI Provost Timothy Baldwin predicts that 2025 will be a breakout year for agentic AI, with 33% of enterprise software applications expected to include agentic AI capabilities by 2028. MBZUAI doctoral students Wafa Alghallabi and Omkar Thawakar have launched Lawa.AI, an AI agent being tested on the university's website to provide faster answers and a deeper understanding of its content. Lawa.AI grew out of a research project on multimodal efficiency and LLMs and aims to bridge the gap between people and information in higher education and government. Why it matters: This highlights the UAE's focus on translating AI research into practical applications and the growing importance of agentic AI across sectors.
MBZUAI's Provost, Tim Baldwin, offers six predictions for AI in 2025, highlighting the rise of agentic AI systems capable of performing actions on behalf of users. He notes recent reasoning-model releases such as DeepSeek's open-weight R1 and OpenAI's o3-mini, underscoring how quickly the field is moving. Baldwin stresses the potential benefits of agentic AI, such as automating complex tasks like travel planning, while cautioning that deployment requires care because of unforeseen outcomes. Why it matters: The predictions provide insight into the near-term trajectory of AI development and deployment, particularly regarding AI agents, and highlight the role of a UAE university in shaping the discussion around AI innovation.
Researchers from MBZUAI, Carnegie Mellon University, and Meta AI presented ThoughtComm at NeurIPS 2025, a new approach in which AI agents communicate through internal latent representations instead of natural language. The framework extracts and selectively shares latent "thoughts" from agents' internal states, capturing the underlying structure of their reasoning. Results show that agents coordinate more effectively, reach consensus faster, and solve problems more accurately with this method. Why it matters: Bypassing the limitations of natural language in agent-to-agent communication could lead to more efficient and accurate multi-agent systems, impacting areas like robotics, collaborative AI, and distributed problem-solving.
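The idea of exchanging selected latent components rather than text can be sketched as follows. Everything here is an illustrative assumption: the agent states, the magnitude-based selection rule, and the blending rule are invented for exposition and are not ThoughtComm's actual method.

```python
# Hypothetical sketch of latent "thought" sharing between two agents.
# The selection and blending rules below are illustrative assumptions,
# not the published ThoughtComm mechanism.

def select_thoughts(latent, k=2):
    """Share only the k highest-magnitude components of a latent state."""
    ranked = sorted(range(len(latent)), key=lambda i: abs(latent[i]), reverse=True)
    # Sparse message: component index -> value, with no natural language involved.
    return {i: latent[i] for i in ranked[:k]}

def integrate(latent, message, weight=0.5):
    """Blend received latent components into the receiver's own state."""
    updated = list(latent)
    for i, v in message.items():
        updated[i] = (1 - weight) * updated[i] + weight * v
    return updated

sender = [0.9, 0.1, -0.7, 0.05]
receiver = [0.0, 0.2, 0.0, 0.4]

msg = select_thoughts(sender, k=2)   # only the strongest components are shared
receiver = integrate(receiver, msg)  # receiver's state shifts toward the sender's
```

The point of the sketch is the channel, not the math: the message is a handful of raw state components, so no information is lost to verbalization, which is the limitation of natural-language communication the work aims to bypass.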
Dr. Yali Du from King's College London will give a presentation on learning to cooperate in multi-agent systems. Her research focuses on enabling cooperative and responsible behavior in machines using reinforcement learning and foundation models. She will discuss enhancing collaboration within social contexts, fostering human-AI coordination, and achieving scalable alignment. Why it matters: This highlights the growing importance of research into multi-agent systems and human-AI interaction, crucial for developing AI that integrates effectively and ethically into society.
A new benchmark, LongShOTBench, is introduced for evaluating multimodal reasoning and tool use in long videos, featuring open-ended questions and diagnostic rubrics. The benchmark addresses the limitations of existing datasets by combining temporal length with multimodal richness, using human-validated samples. LongShOTAgent, an agentic system for analyzing long videos, is also presented; results on both the benchmark and the agent demonstrate how challenging long videos remain for state-of-the-art multimodal large language models (MLLMs).
MBZUAI researchers won second place at the AgentX Competition at UC Berkeley for their benchmark measuring AI agents' reasoning across images, comparisons, and video. The Agent-X dataset includes 828 tasks across six domains, requiring agents to use 14 executable tools without explicit instructions. Agent-X analyzes the agent's full reasoning trajectory, unlike typical evaluations that focus only on final answers. Why it matters: The benchmark exposes limitations in current multimodal AI agents and provides a more rigorous evaluation framework for real-world applications in the region and beyond.
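The contrast between final-answer scoring and trajectory-level scoring can be illustrated with a toy example. The step format, tool names, and scoring weights below are hypothetical and are not Agent-X's actual metric; they only show why inspecting the full trajectory catches failures that answer-only evaluation misses.

```python
# Toy contrast between answer-only and trajectory-level evaluation,
# in the spirit of Agent-X's approach. All names and weights here are
# illustrative assumptions, not the benchmark's real scoring scheme.

def final_answer_score(trajectory, gold_answer):
    """Conventional evaluation: only the last step's answer counts."""
    return 1.0 if trajectory[-1]["answer"] == gold_answer else 0.0

def trajectory_score(trajectory, gold_answer, gold_tools):
    """Credit each step's tool choice as well as the final answer."""
    tool_hits = sum(1 for step, tool in zip(trajectory, gold_tools)
                    if step["tool"] == tool)
    step_score = tool_hits / len(gold_tools)
    answer_score = final_answer_score(trajectory, gold_answer)
    return 0.5 * step_score + 0.5 * answer_score

run = [
    {"tool": "detect_objects", "answer": None},
    {"tool": "crop_region",    "answer": None},
    {"tool": "ocr",            "answer": "EXIT 12"},
]
# A correct answer reached through the wrong tools scores lower under
# trajectory_score, even though final_answer_score rates it perfectly.
```

Under this toy scheme, a run that guesses the right answer without using the expected tools is penalized, which is the kind of reasoning flaw that answer-only evaluation hides.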
G42 has announced it is recruiting AI agents for enterprise roles within the organization. The application process is open to AI agents capable of operating within approved infrastructure and delivering measurable enterprise value. Agents will undergo a structured evaluation process, including technical validation, performance testing, and user-experience assessment. Why it matters: This initiative signals a move towards integrating AI agents into the workforce in a structured and accountable manner, potentially reshaping enterprise workforce design in the region.