Jayneel Parekh - Learning to Steer: Input-dependent Steering for Multimodal LLMs
- Date: September 22, 2025, 1 pm
- Room: SCAI
Steering has emerged as a practical approach to enable post-hoc
guidance of LLMs towards enforcing a specific behavior. However, it remains
largely underexplored for multimodal LLMs (MLLMs); furthermore, existing
steering techniques, such as mean steering, rely on a single steering vector,
applied independently of the input query. This paradigm faces limitations when
the desired behavior depends on the example at hand. For example, a safe
answer may consist of abstaining when asked about an illegal
activity, or of pointing to external resources or an expert consultation
when asked for medical advice. In this paper, we investigate fine-grained
steering that uses an input-specific linear shift. This shift is computed
using contrastive input-specific prompting. However, the input-specific
prompts required for this approach are not known at test time. Therefore, we
propose to train a small auxiliary module to predict the input-specific
steering vector. Our approach, dubbed L2S (Learn-to-Steer), reduces
hallucinations and enforces safety in MLLMs, outperforming
static baselines.
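
To make the contrast between a single static vector and an input-dependent one concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the 2-dimensional "hidden states", the helper names (`steer`, `train_linear_predictor`), and the choice of a linear map trained by gradient descent as the auxiliary module are all assumptions made for illustration. The abstract does not specify the module's architecture or which layer is steered.

```python
# Toy sketch: static mean steering vs. input-dependent steering.
# All vectors, names, and the linear predictor are illustrative assumptions.

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def steer(h, v, alpha=1.0):
    """Apply a linear shift to a hidden state: h' = h + alpha * v."""
    return [x + alpha * y for x, y in zip(h, v)]

def mean_vec(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def train_linear_predictor(inputs, targets, lr=0.1, epochs=500):
    """Fit W so that W @ h approximates the input-specific steering vector."""
    d_out, d_in = len(targets[0]), len(inputs[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(epochs):
        for h, v in zip(inputs, targets):
            err = sub(matvec(W, h), v)  # gradient of 0.5 * squared error
            for i in range(d_out):
                for j in range(d_in):
                    W[i][j] -= lr * err[i] * h[j]
    return W

# Hypothetical hidden states for two training inputs.
inputs = [[1.0, 0.0], [0.0, 1.0]]
# Hidden states under the "desired" and "undesired" contrastive prompts
# written for each input (only available at training time).
pos_states = [[1.0, 1.0], [1.0, 1.0]]
neg_states = [[1.0, 0.0], [0.0, 1.0]]

# Input-specific steering vectors: one contrastive difference per input.
targets = [sub(p, n) for p, n in zip(pos_states, neg_states)]

# Static baseline: a single mean vector shared by every input.
mean_vector = mean_vec(targets)

# Learned alternative: predict the vector from the input representation,
# so no contrastive prompt is needed at test time.
W = train_linear_predictor(inputs, targets)
predicted = [matvec(W, h) for h in inputs]

steered_static = [steer(h, mean_vector) for h in inputs]
steered_learned = [steer(h, v) for h, v in zip(inputs, predicted)]
```

In this toy setup the two inputs need different shifts, so the mean vector is a compromise that matches neither, while the learned predictor recovers each input's own shift. Real hidden states are high-dimensional and the relationship need not be linear, so this only illustrates the idea.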