This paper introduces ProgramFC, a fact-checking model that decomposes complex claims into simpler sub-tasks using a library of functions. The model uses LLMs to generate reasoning programs and executes them by delegating sub-tasks, enhancing explainability and data efficiency. Experiments on fact-checking datasets demonstrate ProgramFC's superior performance compared to baseline methods, with publicly available code and data.
MBZUAI's Institute of Foundation Models (IFM) has released K2 Think V2, a 70 billion parameter open-source general reasoning model built on K2 V2 Instruct. The model excels in complex reasoning benchmarks like AIME2025 and GPQA-Diamond, and features a low hallucination rate with long context reasoning capabilities. K2 Think V2 is fully sovereign and open, from pre-training through post-training, using IFM-curated data and a Guru dataset. Why it matters: This release contributes to closing the gap between community-owned reproducible AI and proprietary models, particularly in reasoning and long-context understanding for Arabic NLP tasks.
MBZUAI researchers at the Institute of Foundation Models (IFM) investigated the role of reinforcement learning (RL) in improving reasoning abilities of language models. Their study found that RL acts as an 'elicitor' for reasoning in domains frequently encountered during pre-training (e.g., math, coding), while genuinely teaching new reasoning skills in underrepresented domains (e.g., logic, simulations). To support their analysis, they created a new dataset called GURU containing 92,000 examples across six domains. Why it matters: This research clarifies the impact of reinforcement learning on language model reasoning, paving the way for developing models with more generalizable reasoning abilities across diverse domains, an important direction for more capable AI systems.
Liangming Pan from UCSB presented research on building reliable generative AI agents by integrating symbolic representations with LLMs. The neuro-symbolic strategy combines the flexibility of language models with precise knowledge representation and verifiable reasoning. The work covers Logic-LM, ProgramFC, and learning from automated feedback, aiming to address LLM limitations in complex reasoning tasks. Why it matters: Improving the reliability of LLMs is crucial for high-stakes applications in finance, medicine, and law within the region and globally.
A new approach to composed video retrieval (CoVR) is presented, which leverages large multimodal models to infer causal and temporal consequences implied by an edit. The method aligns reasoned queries to candidate videos without task-specific finetuning. A new benchmark, CoVR-Reason, is introduced to evaluate reasoning in CoVR.
Niket Tandon from the Allen Institute for AI presented a talk at MBZUAI on enabling large language models to focus on human needs and continuously learn from interactions. He proposed a memory architecture inspired by the theory of recursive reminding to guide models in avoiding past errors. The talk addressed who to ask, what to ask, when to ask and how to apply the obtained guidance. Why it matters: The research explores how to align LLMs with human feedback, a key challenge for practical and ethical AI deployment.
MBZUAI researchers have developed K2 Think, an open-source AI reasoning system for interpretable energy decisions. K2 Think uses long chain-of-thought supervised fine-tuning and reinforcement learning to improve accuracy on multi-step reasoning in complex energy problems. The system breaks down challenges into smaller, auditable steps and uses test-time scaling for real-time adaptation. Why it matters: The open-source nature of K2 Think promotes transparency, trust, and compliance in critical energy environments while allowing secure deployment on sovereign infrastructure.