Scott Alexander examines the Eliciting Latent Knowledge (ELK) problem in AI alignment and various proposed solutions.
Longer summary
Scott Alexander discusses the Eliciting Latent Knowledge (ELK) problem in AI alignment: how to train an AI to truthfully report what it knows. He explains the difficulty of distinguishing an AI that genuinely reports the truth from one that simply tells humans what they want to hear. The post covers several strategies proposed by the Alignment Research Center (ARC) to address this problem, including training on scenarios where humans are fooled, applying complexity penalties, and testing the AI with different types of predictors. Scott also mentions the ELK prize contest and some criticisms of the approach from other AI safety researchers.