Middle East AI

Domain Adaptation

YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

arXiv · LLM RL

The paper introduces Yet another Policy Optimization (YaPO), a reference-free method that learns sparse steering vectors in the latent space of a Sparse Autoencoder (SAE) and uses them to steer LLMs. Because it optimizes sparse codes rather than dense activation-space vectors, YaPO produces steering directions that are disentangled, interpretable, and efficient. In the reported experiments, YaPO converges faster, performs better, trains more stably, and preserves general knowledge better than dense steering baselines.
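As a rough illustration of what steering in an SAE latent space can look like, here is a minimal sketch in plain Python. It is not the paper's implementation: the function `steer`, the random weights `W_enc` and `W_dec`, the tiny dimensions, and the hand-set steering vector `v` are all assumptions made for the example. In YaPO the sparse vector is learned by optimization, which this sketch does not cover.

```python
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def relu(x):
    return [max(0.0, v) for v in x]

def steer(h, W_enc, W_dec, v):
    """Shift activation h using a steering vector v that lives in SAE latent space.

    Hypothetical helper: encodes h, adds v to the sparse code, and applies only
    the decoded difference, so the SAE reconstruction error is left untouched.
    """
    a = relu(matvec(W_enc, h))                           # sparse code of h
    a_steered = relu([ai + vi for ai, vi in zip(a, v)])  # steer in latent space
    base = matvec(W_dec, a)
    moved = matvec(W_dec, a_steered)
    return [hi + (mi - bi) for hi, mi, bi in zip(h, moved, base)]

# Random stand-ins for a trained SAE (assumption: a real SAE would be loaded).
random.seed(0)
d_model, d_sae = 4, 8
W_enc = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_sae)]
W_dec = [[random.gauss(0, 1) for _ in range(d_sae)] for _ in range(d_model)]
h = [random.gauss(0, 1) for _ in range(d_model)]

# A sparse steering vector: only one latent feature is nonzero.
v = [0.0] * d_sae
v[3] = 2.0

h_steered = steer(h, W_enc, W_dec, v)
```

Because only one latent feature is nonzero in `v`, the change to `h` is a scaled copy of a single decoder column, which is what makes a sparse steering direction easier to inspect than a dense one. A zero `v` leaves the activation unchanged.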