Mausam, head of Yardi School of AI at IIT Delhi and affiliate professor at University of Washington, will discuss Neuro-Symbolic AI. The talk will cover recent research threads with applications in NLP, probabilistic decision-making, and constraint satisfaction. Mausam's research explores neuro-symbolic machine learning, computer vision for radiology, NLP for robotics, multilingual NLP, and intelligent information systems. Why it matters: Neuro-Symbolic AI is gaining importance as it combines the strengths of neural and symbolic approaches, potentially leading to more robust and explainable AI systems.
Liangming Pan from UCSB presented research on building reliable generative AI agents by integrating symbolic representations with LLMs. The neuro-symbolic strategy combines the flexibility of language models with precise knowledge representation and verifiable reasoning. The work covers Logic-LM, ProgramFC, and learning from automated feedback, aiming to address LLM limitations in complex reasoning tasks. Why it matters: Improving the reliability of LLMs is crucial for high-stakes applications in finance, medicine, and law within the region and globally.
The paper introduces Duet, a hybrid neural relation understanding method for cardinality estimation. Duet addresses limitations of existing learned methods, such as high costs and scalability issues, by incorporating predicate information into an autoregressive model. Experiments demonstrate Duet's efficiency, accuracy, and scalability, even outperforming GPU-based methods on CPU.
Krishna Murthy, a postdoc at MIT, researches computational world models to enable robots to understand and operate effectively in the physical world. His work focuses on differentiable computing approaches for spatial perception and interfaces large image, language, and audio models with 3D scenes. Murthy envisions structured world models working with scaling-based approaches to create versatile robot perception and planning algorithms. Why it matters: This research could significantly advance robotics by enabling more sophisticated perception, reasoning, and action capabilities in embodied agents.
MBZUAI introduces Agent-X, a benchmark for evaluating multi-step reasoning in vision-centric agents across real-world, multimodal settings. Agent-X includes 828 tasks with diverse visual contexts and spans six environments, requiring tool use and stepwise decision-making. Experiments show that current LLMs struggle with multi-step vision tasks, achieving less than 50% success, highlighting areas for improvement in LMM reasoning and tool use.