
Results for "large reasoning models"

Empowering Large Language Models with Reliable Reasoning

MBZUAI

Liangming Pan from UCSB presented research on building reliable generative AI agents by integrating symbolic representations with LLMs. The neuro-symbolic strategy combines the flexibility of language models with precise knowledge representation and verifiable reasoning. The work covers Logic-LM, ProgramFC, and learning from automated feedback, aiming to address LLM limitations in complex reasoning tasks. Why it matters: Improving the reliability of LLMs is crucial for high-stakes applications in finance, medicine, and law within the region and globally.
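The neuro-symbolic idea can be illustrated with a minimal sketch. It assumes a pipeline in which a language model translates a problem into symbolic facts and rules, and a deterministic solver then performs the inference, so every conclusion comes with a checkable derivation. The names `formulate`, `forward_chain` and `solve` are hypothetical. The LLM step is stubbed out with a hard-coded formulation, and the solver is a simple forward-chaining engine over propositional Horn rules. This is not the Logic-LM implementation.

```python
from typing import Dict, List, Set, Tuple

# A rule is (premises, conclusion): if all premises hold, the conclusion holds.
Rule = Tuple[Tuple[str, ...], str]


def formulate(problem: str) -> Tuple[Set[str], List[Rule], str]:
    """Hypothetical stand-in for the LLM "problem formulation" step.

    A real system would prompt a language model to translate `problem`
    into facts, rules and a query; here the output is hard-coded.
    """
    facts = {"square(abcd)"}
    rules: List[Rule] = [
        (("square(abcd)",), "rectangle(abcd)"),
        (("rectangle(abcd)",), "polygon(abcd)"),
    ]
    query = "polygon(abcd)"
    return facts, rules, query


def forward_chain(facts: Set[str], rules: List[Rule]) -> Dict[str, List[str]]:
    """Derive every reachable fact and record the premises used for each one."""
    proof: Dict[str, List[str]] = {f: [] for f in facts}  # given facts need no premises
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in proof and all(p in proof for p in premises):
                proof[conclusion] = list(premises)  # one verifiable derivation step
                changed = True
    return proof


def solve(problem: str) -> Tuple[bool, Dict[str, List[str]]]:
    """Formulate symbolically, then let the solver decide the query."""
    facts, rules, query = formulate(problem)
    proof = forward_chain(facts, rules)
    return query in proof, proof
```

Calling `solve("Every square is a rectangle. Every rectangle is a polygon. ABCD is a square. Is ABCD a polygon?")` returns `True` together with a proof map in which `polygon(abcd)` is derived from `rectangle(abcd)`. The design choice is that the language model only translates. The correctness of the conclusion rests on the solver, whose derivation can be inspected.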

CoVR-R: Reason-Aware Composed Video Retrieval

arXiv · Mar 20

The paper presents an approach to composed video retrieval (CoVR) that uses large multimodal models to infer the causal and temporal consequences implied by an edit, then aligns these reasoned queries to candidate videos without task-specific fine-tuning. It also introduces CoVR-Reason, a new benchmark for evaluating reasoning in CoVR.
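Why reasoning about an edit can change retrieval results is easiest to see in a toy sketch. It assumes that each candidate video is represented by a caption, that a bag-of-words vector over a tiny vocabulary stands in for a learned encoder, and that a small consequence table stands in for the large multimodal model. `VOCAB`, `CONSEQUENCES`, `embed`, `reason_about_edit` and `rank_videos` are all hypothetical names, not the paper's method or code. Ranking is plain cosine similarity, so no task-specific training is involved.

```python
import math
from typing import Dict, List, Tuple

# Toy vocabulary; a real system would use a learned text/video encoder.
VOCAB = ["glass", "falls", "shattered", "floor", "intact", "table", "water", "spilled"]

# Hypothetical consequence table standing in for a large multimodal model
# asked what follows causally or temporally from an edit.
CONSEQUENCES = {"falls": "shattered floor water spilled"}


def embed(text: str) -> List[float]:
    """Toy bag-of-words embedding over VOCAB (hypothetical encoder)."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def reason_about_edit(source_caption: str, edit: str) -> str:
    """Expand the query with consequences implied by the edit (stubbed reasoning)."""
    implied = [CONSEQUENCES[w] for w in edit.lower().split() if w in CONSEQUENCES]
    return " ".join([source_caption, edit, *implied])


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_videos(query: str, videos: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank candidate videos (via their captions) by similarity to the query."""
    q = embed(query)
    scored = [(vid, cosine(q, embed(caption))) for vid, caption in videos.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

With candidates `{"vid_a": "glass on table intact", "vid_b": "glass shattered on floor water spilled"}`, the literal query `"glass on table glass falls"` ranks `vid_a` first, because it shares more surface words. The reasoned query from `reason_about_edit("glass on table", "glass falls")` ranks `vid_b` first, because it mentions the implied outcome.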

Reasoning with interactive guidance

MBZUAI

Niket Tandon from the Allen Institute for AI presented a talk at MBZUAI on enabling large language models to focus on human needs and continuously learn from interactions. He proposed a memory architecture inspired by the theory of recursive reminding to guide models in avoiding past errors. The talk addressed whom to ask, what to ask, when to ask, and how to apply the guidance obtained. Why it matters: The research explores how to align LLMs with human feedback, a key challenge for practical and ethical AI deployment.
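A memory that helps a model avoid repeating past errors can be sketched generically. The sketch assumes the memory stores past questions with the human feedback they received, retrieves feedback for a new question when it is similar enough to an old one, and prepends that guidance to the prompt. `FeedbackMemory`, `build_prompt`, Jaccard token overlap and the `threshold` value are all illustrative assumptions, not the architecture presented in the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens (a deliberately simple similarity basis)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two token sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0


@dataclass
class FeedbackMemory:
    """Hypothetical store of (past question, feedback) pairs."""
    threshold: float = 0.5
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, query: str, feedback: str) -> None:
        self.entries.append((query, feedback))

    def lookup(self, query: str) -> Optional[str]:
        """Return feedback from the most similar past question, if similar enough."""
        best, best_score = None, 0.0
        q = tokens(query)
        for past_query, feedback in self.entries:
            score = jaccard(q, tokens(past_query))
            if score > best_score:
                best, best_score = feedback, score
        return best if best_score >= self.threshold else None


def build_prompt(memory: FeedbackMemory, query: str) -> str:
    """Prepend retrieved guidance so the model can avoid a previously corrected error."""
    guidance = memory.lookup(query)
    if guidance is None:
        return query
    return f"Guidance from past feedback: {guidance}\nQuestion: {query}"
```

After `memory.add("what word sounds like good", "'sounds like' asks for a homophone, not a synonym")`, the question `"what word sounds like sea"` retrieves that guidance, while an unrelated question such as `"capital of france"` is passed through unchanged. The threshold controls the trade-off between reusing feedback and applying it where it does not belong.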

SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models

arXiv · May 29

MBZUAI researchers introduce SocialMaze, a new benchmark for evaluating social reasoning capabilities in large language models (LLMs). SocialMaze includes six diverse tasks spanning social reasoning games, daily-life interactions, and digital community platforms, emphasizing deep reasoning, dynamic interaction, and information uncertainty. Experiments show that LLMs vary in how well they handle dynamic interactions and degrade under information uncertainty, but that they can be improved by fine-tuning on curated reasoning examples.

K2 Think V2: a fully sovereign reasoning model

MBZUAI

MBZUAI's Institute of Foundation Models (IFM) has released K2 Think V2, a 70-billion-parameter open-source general reasoning model built on K2 V2 Instruct. The model performs strongly on complex reasoning benchmarks such as AIME 2025 and GPQA-Diamond, and features a low hallucination rate and long-context reasoning capabilities. K2 Think V2 is fully sovereign and open, from pre-training through post-training, using IFM-curated data and a Guru dataset. Why it matters: This release helps close the gap between community-owned, reproducible AI and proprietary models, particularly in reasoning and long-context understanding for Arabic NLP tasks.