LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models

arXiv · October 7, 2022 · Significant research

Summary

The paper introduces LLMEffiChecker, a tool for testing how robust the computational efficiency of LLMs is by finding inputs that significantly degrade it. LLMEffiChecker perturbs the input so that the model delays generating the end-of-sequence (EOS) token, using both a white-box method (gradient-guided perturbation) and a black-box method (causal inference-based perturbation). Because autoregressive decoding continues until EOS is emitted or a length limit is reached, a delayed EOS means more decoding steps per request. Experiments on nine public LLMs show that minimal input perturbations can substantially increase response latency and energy consumption.
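The idea of delaying EOS can be illustrated with a toy sketch. The code below is not the paper's gradient-guided or causal-inference method. It is a plain exhaustive black-box search over single-token substitutions against a made-up decoder, and every name in it (`EOS_SUPPRESSION`, `output_length`, `best_single_substitution`) and every numeric weight is a hypothetical chosen for illustration.

```python
# Toy illustration only: a stand-in "model" whose EOS score rises with each
# decoding step and is lowered by certain input tokens. All weights are invented.
EOS_SUPPRESSION = {"the": 0.0, "cat": 0.5, "sat": 0.5, "and": 3.0, "very": 2.0}
MAX_NEW_TOKENS = 50  # hypothetical cap on generated tokens


def output_length(prompt, max_new_tokens=MAX_NEW_TOKENS):
    """Return how many decoding steps the toy model runs before emitting EOS."""
    suppression = sum(EOS_SUPPRESSION.get(tok, 0.0) for tok in prompt)
    for step in range(1, max_new_tokens + 1):
        eos_score = 0.5 * step - suppression  # EOS becomes likelier every step
        if eos_score > 1.0:
            return step
    return max_new_tokens  # EOS never fired; decoding hit the length cap


def best_single_substitution(prompt, vocab):
    """Black-box search: try every one-token substitution and keep the one
    that yields the longest output. Only output_length() is queried."""
    best_prompt, best_len = list(prompt), output_length(prompt)
    for i in range(len(prompt)):
        for candidate in vocab:
            trial = prompt[:i] + [candidate] + prompt[i + 1:]
            trial_len = output_length(trial)
            if trial_len > best_len:
                best_prompt, best_len = trial, trial_len
    return best_prompt, best_len


if __name__ == "__main__":
    seed = ["the", "cat", "sat"]
    perturbed, length = best_single_substitution(seed, list(EOS_SUPPRESSION))
    print("baseline steps:", output_length(seed))
    print("perturbed prompt:", perturbed, "steps:", length)
```

In this sketch a single substituted token is enough to lengthen decoding, which mirrors the kind of small-perturbation, large-cost effect the summary describes. A real test would query an actual LLM and measure latency or energy rather than a step count.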

Keywords

LLM · efficiency · robustness · perturbation · energy consumption


Related

When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

arXiv · Feb 1

Researchers from the National Center for AI in Saudi Arabia investigated how sensitive Large Language Model (LLM) leaderboards are to minor benchmark perturbations. They found that small changes, such as reordering the answer choices, can shift rankings by up to 8 positions. The study recommends hybrid scoring, warns against over-reliance on simple benchmark evaluations, and provides code for further research.

Confidence Matters: Revisiting Intrinsic Self-Correction Capabilities of Large Language Models

arXiv · Feb 19

This paper investigates the intrinsic self-correction capabilities of LLMs, identifying model confidence as a key latent factor. Researchers developed an "If-or-Else" (IoE) prompting framework to guide LLMs in assessing their own confidence and improving self-correction accuracy. Experiments demonstrate that the IoE-based prompt enhances the accuracy of self-corrected responses, with code available on GitHub.
