Skip to content
GCC AI Research

Search

Results for "Chain-of-Thought Distillation"

State-of-the-Art Arabic Language Modeling with Sparse MoE Fine-Tuning and Chain-of-Thought Distillation

arXiv ·

Arabic-DeepSeek-R1 is an application-driven, open-source Arabic Large Language Model (LLM) that has achieved a new state-of-the-art (SOTA) across the Open Arabic LLM Leaderboard (OALL). The model utilizes a sparse Mixture-of-Experts (MoE) backbone and a four-phase Chain-of-Thought (CoT) distillation scheme, which incorporates Arabic-specific linguistic verification and regional ethical norms. It records the highest average score on the OALL suite and outperforms proprietary frontier systems like GPT-5.1 on a majority of benchmarks evaluating comprehensive Arabic language-specific tasks. Why it matters: This work offers a validated and cost-effective framework for developing high-performing, culturally-grounded AI for under-represented languages, addressing the digital equity gap.

Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR

arXiv ·

A new method is proposed to reduce the verbosity of LLMs in step-by-step reasoning by retaining moderately easy problems during Reinforcement Learning with Verifiable Rewards (RLVR) training. This approach acts as an implicit length regularizer, preventing the model from excessively increasing output length on harder problems. Experiments using Qwen3-4B-Thinking-2507 show the model achieves baseline accuracy with nearly twice shorter solutions.

Knowledge distillation and the greening of LLMs

MBZUAI ·

Researchers from MBZUAI, University of British Columbia, and Monash University have created LaMini-LM, a collection of small language models distilled from ChatGPT. LaMini-LM is trained on a dataset of 2.58M instructions and can be deployed on consumer laptops and mobile devices. The smaller models perform almost as well as larger counterparts while addressing security concerns. Why it matters: This work enables the deployment of LLMs in resource-constrained environments and enhances data security by reducing reliance on cloud-based LLMs.

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

arXiv ·

Researchers at MBZUAI introduce "Interactive Video Reasoning," a new paradigm enabling models to actively "think with videos" by performing iterative visual actions to gather and refine evidence. They developed Video CoM, which reasons through a Chain of Manipulations (CoM), and constructed Video CoM Instruct, an 18K instruction tuning dataset for multi-step manipulation reasoning. The model is further optimized via reinforcement learning with reasoning aware Group Relative Policy Optimization (GRPO), achieving strong results across nine video reasoning benchmarks.

Fact-Checking Complex Claims with Program-Guided Reasoning

arXiv ·

This paper introduces ProgramFC, a fact-checking model that decomposes complex claims into simpler sub-tasks using a library of functions. The model uses LLMs to generate reasoning programs and executes them by delegating sub-tasks, enhancing explainability and data efficiency. Experiments on fact-checking datasets demonstrate ProgramFC's superior performance compared to baseline methods, with publicly available code and data.