Composable Interventions for Language Models
Kolbeinsson, A; O'Brien, K; Huang, T; Gao, S; Liu, S; Schwarz, JR; Vaidya, A; Mahmood, F; Zitnik, M; Chen, T; Hartvigsen, T
Date: 2025
Type: Conference paper
Publisher: International Conference on Learning Representations
Abstract
Test-time interventions for language models can enhance factual accuracy,
mitigate harmful outputs, and improve model efficiency without costly
retraining. But despite a flood of new methods, different types of
interventions are largely developing independently. In practice, multiple
interventions must be applied sequentially to the same model, yet we lack
standardized ways to study how interventions interact. We fill this gap by
introducing composable interventions, a framework to study the effects of using
multiple interventions on the same language models, featuring new metrics and a
unified codebase. Using our framework, we conduct extensive experiments and
compose popular methods from three emerging intervention categories --
Knowledge Editing, Model Compression, and Machine Unlearning. Our results from
310 different compositions uncover meaningful interactions: compression hinders
editing and unlearning, composing interventions hinges on their order of
application, and popular general-purpose metrics are inadequate for assessing
composability. Taken together, our findings showcase clear gaps in
composability, suggesting a need for new multi-objective interventions. All of
our code is public:
https://github.com/hartvigsen-group/composable-interventions.
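The following is a minimal, illustrative sketch of the composition idea described in the abstract: apply two test-time interventions to the same model in both orders and compare the outcome. It is not taken from the composable-interventions codebase; every name here (Model, edit_knowledge, compress, compose, the toy scores) is a hypothetical placeholder standing in for real editing, compression, and evaluation routines.

```python
# Hypothetical sketch of sequential intervention composition and an
# order-sensitivity check. Model state and interventions are toy stand-ins,
# not the repository's actual API.
from typing import Callable, Dict, List

Model = Dict[str, float]                 # stand-in for model state / metrics
Intervention = Callable[[Model], Model]  # an intervention maps model -> model

def edit_knowledge(model: Model) -> Model:
    """Toy knowledge edit: raises an 'edit_success' score."""
    out = dict(model)
    out["edit_success"] = min(1.0, out.get("edit_success", 0.0) + 0.8)
    return out

def compress(model: Model) -> Model:
    """Toy compression: halves model size but degrades prior edits."""
    out = dict(model)
    out["size"] = out.get("size", 1.0) * 0.5
    out["edit_success"] = out.get("edit_success", 0.0) * 0.6
    return out

def compose(model: Model, interventions: List[Intervention]) -> Model:
    """Apply interventions sequentially to the same model."""
    for step in interventions:
        model = step(model)
    return model

base: Model = {"size": 1.0, "edit_success": 0.0}

edit_then_compress = compose(dict(base), [edit_knowledge, compress])
compress_then_edit = compose(dict(base), [compress, edit_knowledge])

# Simple order-sensitivity gap: how much the outcome depends on ordering.
gap = abs(edit_then_compress["edit_success"] - compress_then_edit["edit_success"])
print(f"edit -> compress edit success: {edit_then_compress['edit_success']:.2f}")
print(f"compress -> edit edit success: {compress_then_edit['edit_success']:.2f}")
print(f"order-sensitivity gap:         {gap:.2f}")
```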