AI Control
• Applied to Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller by Henry Cai 1d ago
• Applied to AI Safety Strategies Landscape by Charbel-Raphaël 1mo ago
• Applied to AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt by DanielFilan 2mo ago
• Applied to How useful is "AI Control" as a framing on AI X-Risk? by elifland 3mo ago
• Applied to How to safely use an optimizer by Simon Fischer 3mo ago
• Applied to Protocol evaluations: good analogies vs control by Charbel-Raphaël 4mo ago
• Applied to Auditing LMs with counterfactual search: a tool for control and ELK by Jacob Pfau 4mo ago
• Applied to Critiques of the AI control agenda by Jozdien 4mo ago
• Created by Charbel-Raphaël at 5mo