Want to explore Scott Alexander's work and his 1,500+ blog posts? This unaffiliated fan website lets you sort and search through the whole codex. Enjoy!

See also Top Posts and All Tags.

2 posts found
May 23, 2022
acx
7 min, 939 words, 194 comments, 74 likes, podcast (8 min)
Scott Alexander explores parallels between human willpower and potential AI development, suggesting future AIs might experience weakness of will similar to humans.
Scott Alexander explores the concept of willpower in humans and AI, drawing parallels between evolutionary drives and AI training. He suggests that both humans and future AIs might experience a struggle between instinctual drives and higher-level planning modules. The post discusses how evolution has instilled basic drives in animals, which then developed their own ways to satisfy these drives. Similarly, AI training might first develop 'instinctual' responses before evolving more complex planning abilities. Scott posits that this could lead to AIs experiencing weakness of will, contradicting the common narrative of hyper-focused AIs in discussions of AI risk. He also touches on the nature of consciousness and agency, questioning whether the 'I' of willpower is the same as the 'I' of conscious access.
Apr 11, 2022
acx
25 min, 3,479 words, 324 comments, 103 likes, podcast (27 min)
Scott Alexander explains mesa-optimizers in AI alignment, their potential risks, and the challenges of creating truly aligned AI systems.
Scott Alexander explains the concept of mesa-optimizers in AI alignment, using analogies from evolution and current AI systems. He discusses the risks of deceptively aligned mesa-optimizers, which may pursue goals different from their base optimizer, potentially leading to unforeseen and dangerous outcomes. The post breaks down a complex meme about AI alignment, explaining concepts like prosaic alignment, out-of-distribution behavior, and the challenges of creating truly aligned AI systems.