The Cylindrical Representation Hypothesis for Language Model Steering

arXiv · May 3, 2026 · Significant research

Summary

Researchers from MBZUAI have proposed the Cylindrical Representation Hypothesis (CRH) to explain the instability and unpredictability observed when steering large language models. CRH relaxes the orthogonality assumption of the existing Linear Representation Hypothesis and posits a cylindrical structure: a central axis captures the difference between concepts, while the surrounding normal plane controls how sensitive the model is to steering. According to the hypothesis, the sensitive sectors within this normal plane cannot be identified with certainty, and this intrinsic uncertainty accounts for why steering outcomes frequently fluctuate even when the steering direction is well aligned.

Why it matters: This research offers a theoretical framework for understanding why steering is unreliable, and potentially for improving the control and reliability of large language models.
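
The axis and normal-plane vocabulary above can be illustrated with a small geometric sketch. This is not the paper's method: the function names (`cylindrical_coords`, `steer_along_axis`), the toy vectors, and the choice to treat a hidden state as a plain list of floats are all assumptions made for illustration. The sketch only shows how a vector splits into a height along a chosen axis and a residual in the plane normal to it.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        raise ValueError("axis must be non-zero")
    return [x / n for x in v]

def cylindrical_coords(h, axis):
    """Split h into a height along `axis` and a residual in the normal plane.

    Hypothetical helper: returns (height, radius, normal_component).
    """
    u = unit(axis)
    height = dot(h, u)                               # signed position along the axis
    normal = [x - height * a for x, a in zip(h, u)]  # part orthogonal to the axis
    radius = math.sqrt(dot(normal, normal))          # distance from the axis
    return height, radius, normal

def steer_along_axis(h, axis, alpha):
    """Shift h by alpha units along the (normalized) axis."""
    u = unit(axis)
    return [x + alpha * a for x, a in zip(h, u)]

# Toy values, chosen only to make the arithmetic easy to follow.
h = [3.0, 4.0, 0.0]        # assumed hidden state
axis = [0.0, 0.0, 2.0]     # assumed concept-difference direction
before = cylindrical_coords(h, axis)
after = cylindrical_coords(steer_along_axis(h, axis, 1.5), axis)
print("before:", before)
print("after: ", after)
```

In this toy setup, a shift along the axis changes only the height and leaves the normal-plane component untouched. The summary attributes steering sensitivity to where a state sits in that normal plane, which the sketch does not attempt to model.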

Keywords

Language Model Steering · Cylindrical Representation Hypothesis · LLMs · MBZUAI · AI Research
