Skip to content
GCC AI Research

Search

Results for "linguistic reasoning"

Empowering Large Language Models with Reliable Reasoning

MBZUAI ·

Liangming Pan from UCSB presented research on building reliable generative AI agents by integrating symbolic representations with LLMs. The neuro-symbolic strategy combines the flexibility of language models with precise knowledge representation and verifiable reasoning. The work covers Logic-LM, ProgramFC, and learning from automated feedback, aiming to address LLM limitations in complex reasoning tasks. Why it matters: Improving the reliability of LLMs is crucial for high-stakes applications in finance, medicine, and law within the region and globally.

Can LLMs reason? New benchmark puts models to the test

MBZUAI ·

MBZUAI researchers created a new benchmark dataset called TextGames to evaluate the reasoning abilities of LLMs. The dataset uses simple, text-based games requiring skills like pattern recognition and logical thinking. LLMs struggled with the hardest questions, suggesting limitations in their reasoning capabilities despite advancements in language understanding. Why it matters: This research highlights the need for specialized reasoning models and benchmarks that go beyond memorization to truly test AI's problem-solving abilities.

What reinforcement learning can teach language models about reasoning

MBZUAI ·

MBZUAI researchers at the Institute of Foundation Models (IFM) investigated the role of reinforcement learning (RL) in improving reasoning abilities of language models. Their study found that RL acts as an 'elicitor' for reasoning in domains frequently encountered during pre-training (e.g., math, coding), while genuinely teaching new reasoning skills in underrepresented domains (e.g., logic, simulations). To support their analysis, they created a new dataset called GURU containing 92,000 examples across six domains. Why it matters: This research clarifies the impact of reinforcement learning on language model reasoning, paving the way for developing models with more generalizable reasoning abilities across diverse domains, an important direction for more capable AI systems.

ALPS: A Diagnostic Challenge Set for Arabic Linguistic & Pragmatic Reasoning

arXiv ·

The paper introduces ALPS (Arabic Linguistic & Pragmatic Suite), a diagnostic challenge set for evaluating deep semantics and pragmatics in Arabic NLP. The dataset contains 531 expert-curated questions across 15 tasks and 47 subtasks, designed to test morpho-syntactic dependencies and compositional semantics. Evaluation of 23 models, including commercial, open-source, and Arabic-native models, reveals that models struggle with fundamental morpho-syntactic dependencies, especially those reliant on diacritics. Why it matters: ALPS provides a valuable benchmark for evaluating the linguistic competence of Arabic NLP models, highlighting areas where current models fall short despite achieving high fluency.

Neural Models with Symbolic Representations for Perceptuo-Reasoning Tasks

MBZUAI ·

Mausam, head of Yardi School of AI at IIT Delhi and affiliate professor at University of Washington, will discuss Neuro-Symbolic AI. The talk will cover recent research threads with applications in NLP, probabilistic decision-making, and constraint satisfaction. Mausam's research explores neuro-symbolic machine learning, computer vision for radiology, NLP for robotics, multilingual NLP, and intelligent information systems. Why it matters: Neuro-Symbolic AI is gaining importance as it combines the strengths of neural and symbolic approaches, potentially leading to more robust and explainable AI systems.

AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP

arXiv ·

This paper benchmarks reasoning-focused LLMs, especially DeepSeek models, on fifteen Arabic NLP tasks. The study uses zero-shot, few-shot, and fine-tuning strategies. Key findings include that three in-context examples improve F1 scores by over 13 points on classification tasks, DeepSeek outperforms GPT-4-mini by 12 F1 points on complex inference tasks in the zero-shot setting, and LoRA fine-tuning yields up to an additional 8 points in F1 and BLEU. Why it matters: The systematic evaluation provides insights into the performance of LLMs on Arabic NLP, highlighting the effectiveness of different strategies for improving performance and contributing to the development of more capable Arabic language models.

Fact-Checking Complex Claims with Program-Guided Reasoning

arXiv ·

This paper introduces ProgramFC, a fact-checking model that decomposes complex claims into simpler sub-tasks using a library of functions. The model uses LLMs to generate reasoning programs and executes them by delegating sub-tasks, enhancing explainability and data efficiency. Experiments on fact-checking datasets demonstrate ProgramFC's superior performance compared to baseline methods, with publicly available code and data.