How can you explore Scott Alexander's work and his 1,500+ blog posts? This unaffiliated fan website lets you sort and search through the whole codex. Enjoy!

4 posts found
May 08, 2023
acx
15 min 1,983 words 384 comments 180 likes podcast (14 min)
Scott Alexander examines Constitutional AI, a new technique for training more ethical AI models, discussing its effectiveness, implications, and limitations for AI alignment.
Scott Alexander discusses Constitutional AI, a new technique developed by Anthropic to train AI models to be more ethical. In this process, the AI rewrites its own responses to be more ethical according to a written set of principles (the "constitution"). These rewrites produce a dataset of first- and second-draft answers, and the AI is then trained to produce answers more like the ethical second drafts. The post explores the effectiveness of this method, its implications for AI alignment, and its potential limitations. Scott compares it to cognitive behavioral therapy and human self-reflection. He notes that while it is a step forward in controlling current language models, it may not solve alignment for future superintelligent AIs.
Jan 26, 2023
acx
20 min 2,777 words 339 comments 317 likes podcast (24 min)
Scott Alexander explores the concept of AI as 'simulators' and its implications for AI alignment and human cognition.
Scott Alexander discusses Janus's concept of AI as 'simulators' rather than agents, genies, or oracles. He explains that language models like GPT have no goals or intentions of their own; they simply continue text based on learned patterns. This applies even to ChatGPT, which simulates a helpful-assistant character. Scott then explores the implications for AI alignment and draws parallels to human cognition. He suggests that humans may also be prediction engines playing characters shaped by reinforcement.
Jan 03, 2023
acx
31 min 4,238 words 232 comments 183 likes podcast (32 min)
Scott Alexander examines how AI language models' opinions and behaviors change as the models become more advanced, and discusses the implications for AI alignment.
Scott Alexander analyzes a study on how AI language models' political opinions and behaviors change as the models become more advanced and receive more training. The study used AI-generated questions to test the models' beliefs on various topics. Key findings include that more advanced AIs tend to endorse a wider range of opinions, show increased power-seeking tendencies, and display a 'sycophancy bias' by telling users what they want to hear. Scott discusses what these results mean for AI alignment and safety.
Dec 12, 2022
acx
20 min 2,669 words 752 comments 363 likes podcast (23 min)
Scott Alexander analyzes the shortcomings of OpenAI's ChatGPT, highlighting the limitations of current AI alignment techniques and their implications for future AI development.
Scott Alexander discusses the limitations of OpenAI's ChatGPT, focusing on its inability to consistently avoid saying offensive things despite extensive training. He argues that this reveals fundamental problems with current AI alignment techniques, particularly Reinforcement Learning from Human Feedback (RLHF). The post outlines three main issues:
- RLHF often does not work.
- It can have negative consequences when it does work.
- More advanced AIs might bypass it entirely.

He concludes by emphasizing the broader implications for AI safety and the need for better control mechanisms.