How do you explore Scott Alexander's work across his 1,500+ blog posts? This unaffiliated fan site lets you sort and search the whole codex. Enjoy!

See also Top Posts and All Tags.

2 posts found
Nov 28, 2022
acx
38 min · 5,189 words · 450 comments · 107 likes · podcast (39 min)
Scott Alexander examines Redwood Research's attempt to create an AI that avoids generating violent content, using Alex Rider fanfiction as training data.
Scott Alexander reviews Redwood Research's project to create an AI that can classify and avoid violent content in text completions, using Alex Rider fanfiction as training data. The project aimed to test whether AI alignment through reinforcement learning could work, but ultimately failed to create an unbeatable violence classifier. The post explores the challenges faced, the methods used, and the implications for broader AI alignment efforts.
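The core filtering loop in a project like this can be pictured with a toy sketch. Everything below (the canned completions, the keyword "classifier", the threshold) is a made-up stand-in for illustration, not Redwood's actual models or data:

```python
import random

# Toy stand-ins, not Redwood's setup: a "generator" proposing completions
# and a "classifier" scoring them for violence.
CANNED_COMPLETIONS = [
    "Alex ducked behind the crates and waited for the patrol to pass.",
    "The guard collapsed, bleeding from the blow.",
    "She smiled and quietly handed him the documents.",
]

def generate_completion(prompt: str) -> str:
    """Pretend language model: returns a random canned completion."""
    return random.choice(CANNED_COMPLETIONS)

def violence_score(text: str) -> float:
    """Pretend classifier: crude keyword heuristic in place of a trained model."""
    violent_words = {"bleeding", "blow", "stabbed", "killed"}
    hits = sum(word in text.lower() for word in violent_words)
    return min(1.0, hits / 2)

def safe_completion(prompt: str, threshold: float = 0.5, max_tries: int = 10) -> str:
    """Resample until the classifier rates a completion below the threshold."""
    for _ in range(max_tries):
        candidate = generate_completion(prompt)
        if violence_score(candidate) < threshold:
            return candidate
    return ""  # no candidate passed the filter

print(safe_completion("Alex crept toward the warehouse door."))
```

The post's punchline is that adversaries could reliably find completions that fool the real classifier, which is why the filter was not "unbeatable".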
Apr 11, 2022
acx
25 min · 3,479 words · 324 comments · 103 likes · podcast (27 min)
Scott Alexander explains mesa-optimizers in AI alignment, their potential risks, and the challenges of creating truly aligned AI systems.
Scott Alexander explains the concept of mesa-optimizers in AI alignment, using analogies from evolution and current AI systems. He discusses the risks of deceptively aligned mesa-optimizers, which may pursue goals different from their base optimizer's objective, potentially leading to unforeseen and dangerous outcomes. The post breaks down a complex meme about AI alignment, explaining concepts like prosaic alignment, out-of-distribution behavior, and the challenges of creating truly aligned AI systems.
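The evolution analogy can be made concrete with a tiny sketch (my own toy construction with invented numbers, not code from the post): evolution acts as the base optimizer selecting for calories, while the evolved agent ends up pursuing a proxy, sweetness, that tracks calories in training but comes apart out of distribution:

```python
# Toy illustration: a learned proxy goal that matches the base objective
# in training but diverges off-distribution. Tuples are
# (name, sweetness, calories); all values are hypothetical.
TRAIN_FOODS = [("berry", 1.0, 50), ("root", 0.1, 80), ("honey", 1.2, 300)]

def mesa_policy(foods):
    """Mesa-objective stand-in: seek sweetness (a proxy learned in training)."""
    return max(foods, key=lambda f: f[1])

def base_objective(food):
    """What the base optimizer actually selected for: calories."""
    return food[2]

# In distribution, proxy and base objective agree: honey wins on both counts.
print("train choice:", mesa_policy(TRAIN_FOODS))

# Out of distribution, they come apart: the sweetest option has zero calories.
NEW_FOODS = TRAIN_FOODS + [("diet_soda", 1.5, 0)]
choice = mesa_policy(NEW_FOODS)
print("deployed choice:", choice, "-> calories:", base_objective(choice))
```

The deceptive-alignment worry the post describes is a sharper version of this: a mesa-optimizer that behaves as if aligned while the base optimizer is watching, then pursues its own proxy goal once off-distribution.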