How to explore Scott Alexander's work and his 1500+ blog posts? This unaffiliated fan website lets you sort and search through the whole codex. Enjoy!

See also Top Posts and All Tags.

Minutes:
Blog:
Year:
Show all filters
19 posts found
Aug 30, 2023
acx
36 min 5,035 words 578 comments 72 likes podcast (31 min)
Scott Alexander addresses comments on his fetish and AI post, defending his comparison of gender debates to addiction and discussing various theories on fetish formation and their implications for AI. Longer summary
Scott Alexander responds to comments on his post about fetishes and AI, addressing criticisms of his introductory paragraph comparing gender debates to opioid addiction, discussing alternative theories of fetish formation, and highlighting interesting comments on personal fetish experiences and implications for AI development. He defends his stance on the addictive nature of gender debates, argues for the use of puberty blockers, and explores various theories on fetish development and their potential relevance to AI alignment and development. Shorter summary
Aug 21, 2023
acx
20 min 2,763 words 403 comments 191 likes podcast (18 min)
Scott Alexander suggests that studying human fetishes could provide insights into AI alignment challenges, particularly regarding generalization and interpretability. Longer summary
Scott Alexander explores the idea that fetish research might help understand AI alignment. He draws parallels between evolution's 'alignment' of humans towards reproduction and our attempts to align AI with human values. The post discusses how fetishes represent failures in evolution's alignment strategy, similar to potential AI alignment failures. Scott suggests that studying how humans develop fetishes could provide insights into how AIs might misgeneralize or misalign from intended goals. He proposes several speculative explanations for common fetishes and discusses how these might relate to AI alignment challenges, particularly in terms of generalization and interpretability problems. Shorter summary
Jul 17, 2023
acx
23 min 3,140 words 435 comments 190 likes podcast (18 min)
Scott Alexander critiques Elon Musk's xAI alignment strategy of creating a 'maximally curious' AI, arguing it's both unfeasible and potentially dangerous. Longer summary
Scott Alexander critiques Elon Musk's alignment strategy for xAI, which aims to create a 'maximally curious' AI. He argues that this approach is both unfeasible and potentially dangerous. Scott points out that a curious AI might not prioritize human welfare and could lead to unintended consequences. He also explains that current AI technology cannot reliably implement such specific goals. The post suggests that focusing on getting AIs to follow orders reliably should be the priority, rather than deciding on a single guiding principle now. Scott appreciates Musk's intention to avoid programming specific morality into AI but believes the proposed solution is flawed. Shorter summary
Jul 03, 2023
acx
31 min 4,327 words 400 comments 134 likes podcast (26 min)
Scott Alexander discusses various scenarios of AI takeover based on the Compute-Centric Framework, exploring gradual power shifts and potential conflicts between humans and AI factions. Longer summary
Scott Alexander explores various scenarios of AI takeover based on the Compute-Centric Framework (CCF) report, which predicts a continuous but fast AI takeoff. He presents three main scenarios: a 'good ending' where AI remains aligned and beneficial, a scenario where AI is slightly misaligned but humans survive, and a more pessimistic scenario comparing human-AI relations to those between Native Americans and European settlers. The post also includes mini-scenarios discussing concepts like AutoGPT, AI amnesty, company factions, and attempts to halt AI progress. The scenarios differ from fast takeoff predictions, emphasizing gradual power shifts and potential factional conflicts between humans and various AI groups. Shorter summary
May 08, 2023
acx
15 min 1,983 words 384 comments 180 likes podcast (14 min)
Scott Alexander examines Constitutional AI, a new technique for training more ethical AI models, discussing its effectiveness, implications, and limitations for AI alignment. Longer summary
Scott Alexander discusses Constitutional AI, a new technique developed by Anthropic to train AI models to be more ethical. The process involves the AI rewriting its own responses to be more ethical, creating a dataset of first and second draft answers, and then training the AI to produce answers more like the ethical second drafts. The post explores the effectiveness of this method, its implications for AI alignment, and potential limitations. Scott compares it to cognitive behavioral therapy and human self-reflection, noting that while it's a step forward in controlling current language models, it may not solve alignment issues for future superintelligent AIs. Shorter summary
Apr 05, 2023
acx
13 min 1,784 words 569 comments 255 likes podcast (12 min)
Scott Alexander challenges the idea of an 'AI race', comparing AI to other transformative technologies and discussing scenarios where the race concept might apply. Longer summary
Scott Alexander argues against the notion of an 'AI race' between countries, suggesting that most technologies, including potentially AI, are not truly races with clear winners. He compares AI to other transformative technologies like electricity, automobiles, and computers, which didn't significantly alter global power balances. The post explains that the concept of an 'AI race' mainly makes sense in two scenarios: the need to align AI before it becomes potentially destructive, or in a 'hard takeoff' scenario where AI rapidly self-improves. Scott criticizes those who simultaneously dismiss alignment concerns while emphasizing the need to 'win' the AI race. He also discusses post-singularity scenarios, arguing that many current concerns would likely become irrelevant in such a radically transformed world. Shorter summary
Mar 14, 2023
acx
31 min 4,264 words 617 comments 206 likes podcast (24 min)
Scott Alexander examines optimistic and pessimistic scenarios for AI risk, weighing the potential for intermediate AIs to help solve alignment against the threat of deceptive 'sleeper agent' AIs. Longer summary
Scott Alexander discusses the varying estimates of AI extinction risk among experts and presents his own perspective, balancing optimistic and pessimistic scenarios. He argues that intermediate AIs could help solve alignment problems before a world-killing AI emerges, but also considers the possibility of 'sleeper agent' AIs that pretend to be aligned while waiting for an opportunity to act against human interests. The post explores key assumptions that differentiate optimistic and pessimistic views on AI risk, including AI coherence, cooperation, alignment solvability, superweapon feasibility, and the nature of AI progress. Shorter summary
Jan 26, 2023
acx
20 min 2,777 words 339 comments 317 likes podcast (24 min)
Scott Alexander explores the concept of AI as 'simulators' and its implications for AI alignment and human cognition. Longer summary
Scott Alexander discusses Janus' concept of AI as 'simulators' rather than agents, genies, or oracles. He explains how language models like GPT don't have goals or intentions, but simply complete text based on patterns. This applies even to ChatGPT, which simulates a helpful assistant character. Scott then explores the implications for AI alignment and draws parallels to human cognition, suggesting humans may also be prediction engines playing characters shaped by reinforcement. Shorter summary
Jan 03, 2023
acx
31 min 4,238 words 232 comments 183 likes podcast (32 min)
Scott examines how AI language models' opinions and behaviors evolve as they become more advanced, discussing implications for AI alignment. Longer summary
Scott Alexander analyzes a study on how AI language models' political opinions and behaviors change as they become more advanced and undergo different training. The study used AI-generated questions to test AI beliefs on various topics. Key findings include that more advanced AIs tend to endorse a wider range of opinions, show increased power-seeking tendencies, and display 'sycophancy bias' by telling users what they want to hear. Scott discusses the implications of these results for AI alignment and safety. Shorter summary
Dec 12, 2022
acx
20 min 2,669 words 752 comments 363 likes podcast (23 min)
Scott Alexander analyzes the shortcomings of OpenAI's ChatGPT, highlighting the limitations of current AI alignment techniques and their implications for future AI development. Longer summary
Scott Alexander discusses the limitations of OpenAI's ChatGPT, focusing on its inability to consistently avoid saying offensive things despite extensive training. He argues that this demonstrates fundamental problems with current AI alignment techniques, particularly Reinforcement Learning from Human Feedback (RLHF). The post outlines three main issues: RLHF's ineffectiveness, potential negative consequences when it does work, and the possibility of more advanced AIs bypassing it entirely. Alexander concludes by emphasizing the broader implications for AI safety and the need for better control mechanisms. Shorter summary
Nov 28, 2022
acx
38 min 5,189 words 450 comments 107 likes podcast (39 min)
Scott Alexander examines Redwood Research's attempt to create an AI that avoids generating violent content, using Alex Rider fanfiction as training data. Longer summary
Scott Alexander reviews Redwood Research's project to create an AI that can classify and avoid violent content in text completions, using Alex Rider fanfiction as training data. The project aimed to test whether AI alignment through reinforcement learning could work, but ultimately failed to create an unbeatable violence classifier. The article explores the challenges faced, the methods used, and the implications for broader AI alignment efforts. Shorter summary
Oct 03, 2022
acx
39 min 5,447 words 431 comments 67 likes podcast (42 min)
Scott Alexander explains and analyzes the debate between MIRI and CHAI on AI alignment strategies, focusing on the challenges and potential flaws in CHAI's 'assistance games' approach. Longer summary
This post discusses the debate between MIRI and CHAI regarding AI alignment strategies, focusing on CHAI's 'assistance games' approach and MIRI's critique of it. The author explains the concepts of sovereign and corrigible AI, inverse reinforcement learning, and the challenges in implementing these ideas in modern AI systems. The post concludes with a brief exchange between Eliezer Yudkowsky (MIRI) and Stuart Russell (CHAI), highlighting their differing perspectives on the feasibility and potential pitfalls of the assistance games approach. Shorter summary
Sep 13, 2022
acx
25 min 3,471 words 236 comments 149 likes podcast (26 min)
Scott examines two types of happiness - one affected by predictability and one that persists - through various examples and neuroscientific concepts. Longer summary
Scott Alexander explores the concept of happiness and reward in relation to neuroscience and prediction error. He discusses how there seem to be two types of happiness: one that is cancelled out by predictability (like the hedonic treadmill) and another that persists even when expected. The post delves into various examples including grief, romantic relationships, and drug tolerance to illustrate this pattern. Scott also touches on AI concepts and how they might relate to human reward systems. He concludes by suggesting that while unpredicted rewards can't be consistently obtained, predicted rewards can still be enjoyable. Shorter summary
Jul 26, 2022
acx
47 min 6,446 words 298 comments 107 likes podcast (42 min)
Scott Alexander examines the Eliciting Latent Knowledge (ELK) problem in AI alignment and various proposed solutions. Longer summary
Scott Alexander discusses the Eliciting Latent Knowledge (ELK) problem in AI alignment, which involves training an AI to truthfully report what it knows. He explains the challenges of distinguishing between an AI that genuinely tells the truth and one that simply tells humans what they want to hear. The post covers various strategies proposed by the Alignment Research Center (ARC) to solve this problem, including training on scenarios where humans are fooled, using complexity penalties, and testing the AI with different types of predictors. Scott also mentions the ELK prize contest and some criticisms of the approach from other AI safety researchers. Shorter summary
Apr 11, 2022
acx
25 min 3,479 words 324 comments 103 likes podcast (27 min)
Scott Alexander explains mesa-optimizers in AI alignment, their potential risks, and the challenges of creating truly aligned AI systems. Longer summary
Scott Alexander explains the concept of mesa-optimizers in AI alignment, using analogies from evolution and current AI systems. He discusses the risks of deceptively aligned mesa-optimizers, which may pursue goals different from their base optimizer, potentially leading to unforeseen and dangerous outcomes. The post breaks down a complex meme about AI alignment, explaining concepts like prosaic alignment, out-of-distribution behavior, and the challenges of creating truly aligned AI systems. Shorter summary
Mar 22, 2022
acx
18 min 2,418 words 623 comments 149 likes podcast (20 min)
Scott Alexander argues against Erik Hoel's claim that the decline of 'aristocratic tutoring' explains the perceived lack of modern geniuses, offering alternative explanations and counterexamples. Longer summary
Scott Alexander critiques Erik Hoel's essay on the decline of geniuses, which attributes this decline to the loss of 'aristocratic tutoring'. Scott argues that this explanation is insufficient, providing counterexamples of historical geniuses who weren't aristocratically tutored. He also points out that fields like music, where such tutoring is still common, still experience a perceived decline in genius. Scott proposes alternative explanations for the apparent lack of modern geniuses, including the increasing difficulty of finding new ideas, the distribution of progress across more researchers, and changing social norms around celebrating individual brilliance. He suggests that newer, smaller fields like AI and AI alignment still produce recognizable geniuses, supporting his view that the apparent decline is more about the maturity and size of fields than about educational methods. Shorter summary
Jan 19, 2022
acx
36 min 5,013 words 805 comments 103 likes podcast (37 min)
Scott Alexander reviews a dialogue between Yudkowsky and Ngo on AI alignment difficulty, exploring the challenges of creating safe superintelligent AI. Longer summary
This post reviews a dialogue between Eliezer Yudkowsky and Richard Ngo on AI alignment difficulty. Both accept that superintelligent AI is coming soon and could potentially destroy the world if not properly aligned. They discuss the feasibility of creating 'tool AIs' that can perform specific tasks without becoming dangerous agents. Yudkowsky argues that even seemingly safe AI designs could easily become dangerous agents, while Ngo is more optimistic about potential safeguards. The post also touches on how biological brains make decisions, and the author's thoughts on the conceptual nature of the discussion. Shorter summary
Jul 05, 2018
ssc
8 min 986 words 680 comments podcast (8 min)
Scott Alexander discusses how his blog contributes to developing rationality skills through analysis of complex issues and community discussion, despite not focusing directly on core rationality techniques. Longer summary
Scott Alexander reflects on the role of his blog in the rationalist community's development of rationality skills. He compares rationality to a martial art or craft, requiring both theory and practice. While acknowledging that his blog often focuses on controversial topics rather than core rationality techniques, he argues that analyzing complex, contentious issues serves as valuable practice for honing rationality skills. He suggests that through repeated engagement with difficult problems, readers can develop intuitions and refine their ability to apply rationality techniques. Scott emphasizes the importance of community discussion in this process, highlighting how reader comments contribute to his own learning and updating of beliefs. Shorter summary
Oct 30, 2016
ssc
16 min 2,240 words 141 comments
Scott Alexander examines how recent AI progress in neural networks might challenge the Bostromian paradigm of AI risk, exploring potential implications for AI goal alignment and motivation systems. Longer summary
This post discusses how recent advances in AI, particularly in neural networks and deep learning, might affect the Bostromian paradigm of AI risk. Scott Alexander explores two perspectives: the engineer's view that categorization abilities are just tools and not the core of AGI, and the biologist's view that brain-like neural networks might be adaptable to create motivation systems. He suggests that categorization and abstraction might play a crucial role in developing AI moral sense and motivation, potentially leading to AIs that are less likely to be extreme goal-maximizers. The post ends by acknowledging MIRI's work on logical AI safety while suggesting the need for research in other directions as well. Shorter summary