AI Control

• Applied to Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller by Henry Cai 1d ago
• Applied to AI Safety Strategies Landscape by Charbel-Raphaël 1mo ago
• Applied to AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt by DanielFilan 2mo ago
• Applied to How useful is "AI Control" as a framing on AI X-Risk? by elifland 3mo ago
• Applied to How to safely use an optimizer by Simon Fischer 3mo ago
• Applied to Protocol evaluations: good analogies vs control by Charbel-Raphaël 4mo ago
• Applied to Auditing LMs with counterfactual search: a tool for control and ELK by Jacob Pfau 4mo ago
• Applied to Critiques of the AI control agenda by Jozdien 4mo ago

• Created by Charbel-Raphaël 5mo ago