Want to explore Scott Alexander's work across his 1,500+ blog posts? This unaffiliated fan website lets you sort and search through the whole codex. Enjoy!

See also Top Posts and All Tags.

4 posts found
Nov 28, 2022 · acx
38 min · 5,189 words · 450 comments · 107 likes · podcast (39 min)
Scott Alexander examines Redwood Research's attempt to create an AI that avoids generating violent content, using Alex Rider fanfiction as training data.
Scott Alexander reviews Redwood Research's project to create an AI that can classify and avoid violent content in text completions, using Alex Rider fanfiction as training data. The project aimed to test whether AI alignment through reinforcement learning could work, but ultimately failed to create an unbeatable violence classifier. The article explores the challenges faced, the methods used, and the implications for broader AI alignment efforts.
Jul 26, 2022 · acx
47 min · 6,446 words · 298 comments · 107 likes · podcast (42 min)
Scott Alexander examines the Eliciting Latent Knowledge (ELK) problem in AI alignment and various proposed solutions.
Scott Alexander discusses the Eliciting Latent Knowledge (ELK) problem in AI alignment, which involves training an AI to truthfully report what it knows. He explains the challenges of distinguishing between an AI that genuinely tells the truth and one that simply tells humans what they want to hear. The post covers various strategies proposed by the Alignment Research Center (ARC) to solve this problem, including training on scenarios where humans are fooled, using complexity penalties, and testing the AI with different types of predictors. Scott also mentions the ELK prize contest and some criticisms of the approach from other AI safety researchers.
Feb 11, 2022 · acx
25 min · 3,475 words · 75 comments · 34 likes · podcast (24 min)
Scott Alexander explores expert and reader comments on his post about motivated reasoning and reinforcement learning, discussing brain function, threat detection, and the implementation of complex behaviors.
Scott Alexander discusses comments on his post about motivated reasoning and reinforcement learning. The post covers expert opinions on brain function and reinforcement learning, arguments about long-term rewards of threat detection, discussions on practical reasons for motivated reasoning, and miscellaneous thoughts on the topic. Key points include debates on how the brain processes information, the role of Bayesian reasoning, and the challenges of implementing complex behaviors through genetic encoding. Scott also reflects on his own experiences and the limitations of reinforcement learning models in explaining human behavior.
Feb 01, 2022 · acx
6 min · 729 words · 335 comments · 122 likes · podcast (7 min)
Scott analyzes motivated reasoning as misapplied reinforcement learning, explaining how it might arise from the brain's mixture of reinforceable and non-reinforceable architectures.
Scott explores the concept of motivated reasoning as misapplied reinforcement learning in the brain. He contrasts behavioral brain regions that benefit from hedonic reinforcement learning with epistemic regions where such learning would be detrimental. The post discusses how this distinction might explain phenomena like 'ugh fields' and motivated reasoning, especially in novel situations like taxes or politics, where brain networks may sit on a mix of reinforceable and non-reinforceable architectures. Scott suggests this model could explain why people often confuse what is true with what they want to be true.