Middle East AI

This Week arXiv

Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models

arXiv · · Significant research

Summary

This paper investigates the intrinsic self-correction capabilities of LLMs, identifying model confidence as a key latent factor. Researchers developed an "If-or-Else" (IoE) prompting framework to guide LLMs in assessing their own confidence and improving self-correction accuracy. Experiments demonstrate that the IoE-based prompt enhances the accuracy of self-corrected responses, with code available on GitHub.

Keywords

LLM · self-correction · confidence · prompting · IoE framework

Get the weekly digest

Top AI stories from the GCC region, every week.

Related

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

arXiv ·

Researchers from the National Center for AI in Saudi Arabia investigated the sensitivity of Large Language Model (LLM) leaderboards to minor benchmark perturbations. They found that small changes, like choice order, can shift rankings by up to 8 positions. The study recommends hybrid scoring and warns against over-reliance on simple benchmark evaluations, providing code for further research.

LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models

arXiv ·

The paper introduces LLMEffiChecker, a tool to test the computational efficiency robustness of LLMs by identifying vulnerabilities that can significantly degrade performance. LLMEffiChecker uses both white-box (gradient-guided perturbation) and black-box (causal inference-based perturbation) methods to delay the generation of the end-of-sequence token. Experiments on nine public LLMs demonstrate that LLMEffiChecker can substantially increase response latency and energy consumption with minimal input perturbations.

LLM Post-Training: A Deep Dive into Reasoning Large Language Models

arXiv ·

A new survey paper provides a deep dive into post-training methodologies for Large Language Models (LLMs), analyzing their role in refining LLMs beyond pretraining. It addresses key challenges such as catastrophic forgetting, reward hacking, and inference-time trade-offs, and highlights emerging directions in model alignment, scalable adaptation, and inference-time reasoning. The paper also provides a public repository to continually track developments in this fast-evolving field.

UnsafeChain: Enhancing Reasoning Model Safety via Hard Cases

arXiv ·

Researchers introduce UnsafeChain, a new safety alignment dataset designed to improve the safety of large reasoning models (LRMs) by focusing on 'hard prompts' that elicit harmful outputs. The dataset identifies and corrects unsafe completions into safe responses, exposing models to unsafe behaviors and guiding their correction. Fine-tuning LRMs on UnsafeChain demonstrates enhanced safety and preservation of general reasoning ability compared to existing datasets like SafeChain and STAR-1.