dc.contributor.author | Kolbeinsson, A | |
dc.contributor.author | O'Brien, K | |
dc.contributor.author | Huang, T | |
dc.contributor.author | Gao, S | |
dc.contributor.author | Liu, S | |
dc.contributor.author | Schwarz, JR | |
dc.contributor.author | Vaidya, A | |
dc.contributor.author | Mahmood, F | |
dc.contributor.author | Zitnik, M | |
dc.contributor.author | Chen, T | |
dc.contributor.author | Hartvigsen, T | |
dc.date.accessioned | 2025-03-11T15:23:36Z | |
dc.date.issued | 2025 | |
dc.date.updated | 2025-02-28T20:41:31Z | |
dc.description.abstract | Test-time interventions for language models can enhance factual accuracy,
mitigate harmful outputs, and improve model efficiency without costly
retraining. But despite a flood of new methods, different types of
interventions are largely developing independently. In practice, multiple
interventions must be applied sequentially to the same model, yet we lack
standardized ways to study how interventions interact. We fill this gap by
introducing composable interventions, a framework to study the effects of using
multiple interventions on the same language models, featuring new metrics and a
unified codebase. Using our framework, we conduct extensive experiments and
compose popular methods from three emerging intervention categories:
Knowledge Editing, Model Compression, and Machine Unlearning. Our results from
310 different compositions uncover meaningful interactions: compression hinders
editing and unlearning, composing interventions hinges on their order of
application, and popular general-purpose metrics are inadequate for assessing
composability. Taken together, our findings showcase clear gaps in
composability, suggesting a need for new multi-objective interventions. All of
our code is public:
https://github.com/hartvigsen-group/composable-interventions. | en_GB |
dc.identifier.citation | ICLR 2025 - The Thirteenth International Conference on Learning Representations, 24 - 28 April 2025, Singapore. Awaiting full citation and link | en_GB |
dc.identifier.uri | http://hdl.handle.net/10871/140593 | |
dc.identifier | ORCID: 0000-0002-7740-8843 (Huang, Tianjin) | |
dc.language.iso | en | en_GB |
dc.publisher | International Conference on Learning Representations | en_GB |
dc.relation.url | https://iclr.cc/Conferences/2025 | en_GB |
dc.relation.url | https://iclr.cc/virtual/2025/papers.html | en_GB |
dc.relation.url | https://iclr.cc/virtual/2025/poster/28014 | en_GB |
dc.rights.embargoreason | Under embargo until close of conference | en_GB |
dc.rights | © 2025 The author(s) | en_GB |
dc.title | Composable Interventions for Language Models | en_GB |
dc.type | Conference paper | en_GB |
dc.date.available | 2025-03-11T15:23:36Z | |
dc.description | This is the final version. | en_GB |
dc.rights.uri | http://www.rioxx.net/licenses/all-rights-reserved | en_GB |
rioxxterms.version | VoR | en_GB |
rioxxterms.licenseref.startdate | 2025-03-11 | |
rioxxterms.type | Conference Paper/Proceeding/Abstract | en_GB |
refterms.dateFCD | 2025-03-11T15:20:59Z | |
refterms.versionFCD | VoR | |
refterms.panel | B | en_GB |