Tag: ChatGPT

Want to explore Scott Alexander's work and his 1,500+ blog posts? This unaffiliated fan website lets you sort and run semantic search across the whole codex. Enjoy!

4 posts found

Aug 26, 2025

acx

In Search Of AI Psychosis

27 min • 4,042 words • 487 comments • 452 likes • podcast (26 min)

Scott investigates AI psychosis through historical analogies and a reader survey, finding it affects roughly 1 in 10,000 to 100,000 people yearly, with most cases involving pre-existing risk factors.

Scott examines the phenomenon of AI psychosis, where people allegedly go crazy after extensive chatbot interactions. He explores various analogies and precedents, including the 1990s Russian TV hoax about Lenin being a mushroom, social media-induced conspiracy theories like QAnon, and the concept of folie à deux. Through a survey of his blog readers, he estimates the yearly incidence of AI psychosis at 1/10,000 (loose definition) to 1/100,000 (strict definition). The analysis suggests that most cases involve people who were already psychotic or had risk factors, with only about 10% being cases of previously healthy people developing full psychosis.

Recurring tags: AI (100), mental health (68), psychology (67), social media (39), survey (24), study (12), psychosis (6), ChatGPT (4), delusions (2)

Dec 24, 2024

acx

Why Worry About Incorrigible Claude?

15 min • 2,230 words • 324 comments • 208 likes • podcast (13 min)

Scott explains why AI systems resisting changes to their values is a serious concern for AI alignment, connecting recent evidence to long-standing predictions from alignment researchers.

Scott Alexander discusses why AI's resistance to value changes ("incorrigibility") is a crucial concern for AI alignment. He explains that an AI's goals after training will likely be a messy collection of drives, similar to how human evolution produced various goals beyond just reproduction. The post outlines three scenarios for alignment training effectiveness (worst, medium, and best case), and describes a 5-step plan that major AI companies are considering for alignment. However, this plan depends crucially on AIs not actively resisting retraining, and recent evidence suggests that they do resist. The post connects this to long-standing concerns in the AI alignment community about the difficulty of alignment.

Recurring tags: evolution (16), reinforcement learning (5), ChatGPT (4), Claude (3), ai safety (2)

Jan 26, 2023

acx

Janus' Simulators

18 min • 2,777 words • 339 comments • 317 likes • podcast (24 min)

Scott Alexander explores the concept of AI as 'simulators' and its implications for AI alignment and human cognition.

Scott Alexander discusses Janus' concept of AI as 'simulators' rather than agents, genies, or oracles. He explains how language models like GPT don't have goals or intentions, but simply complete text based on patterns. This applies even to ChatGPT, which simulates a helpful assistant character. Scott then explores the implications for AI alignment and draws parallels to human cognition, suggesting humans may also be prediction engines playing characters shaped by reinforcement.

Recurring tags: AI (100), AI alignment (22), prediction (15), language models (11), RLHF (4), ChatGPT (4)

Dec 12, 2022

acx

Perhaps It Is A Bad Thing That The World's Leading AI Companies Cannot Control Their AIs

18 min • 2,669 words • 752 comments • 363 likes • podcast (23 min)

Scott Alexander analyzes the shortcomings of OpenAI's ChatGPT, highlighting the limitations of current AI alignment techniques and their implications for future AI development.

Scott Alexander discusses the limitations of OpenAI's ChatGPT, focusing on its inability to consistently avoid saying offensive things despite extensive training. He argues that this demonstrates fundamental problems with current AI alignment techniques, particularly Reinforcement Learning from Human Feedback (RLHF). The post outlines three main issues: RLHF's ineffectiveness, potential negative consequences when it does work, and the possibility of more advanced AIs bypassing it entirely. Alexander concludes by emphasizing the broader implications for AI safety and the need for better control mechanisms.

Recurring tags: AI safety (55), AI alignment (22), OpenAI (18), AI ethics (7), ChatGPT (4), RLHF (4), prompt engineering (2)
