Scott Alexander reviews a dialogue between Yudkowsky and Ngo on AI alignment difficulty, exploring the challenges of creating safe superintelligent AI.
Longer summary
This post reviews a dialogue between Eliezer Yudkowsky and Richard Ngo on how difficult AI alignment is likely to be. Both accept that superintelligent AI may arrive soon and could destroy the world if not properly aligned. They discuss the feasibility of building 'tool AIs' that perform specific tasks without becoming dangerous agents: Yudkowsky argues that even seemingly safe AI designs could easily slide into dangerous agency, while Ngo is more optimistic about potential safeguards. The post also touches on how biological brains make decisions, as well as Scott Alexander's own thoughts on the highly conceptual nature of the discussion.