Tag: mesa-optimizers

How to explore Scott Alexander's work and his 1500+ blog posts? This unaffiliated fan website lets you sort and do semantic search through the whole codex. Enjoy!

Minutes:

Blog:

Year:

4821 tags

Show all filters

2 posts found

Per page:

Compact Mode

Save Reads

May 23, 2022

acx

Read on

Willpower, Human and Machine

7 min • 939 words • 194 comments • 74 likes • podcast (8 min)

Scott Alexander explores parallels between human willpower and potential AI development, suggesting future AIs might experience weakness of will similar to humans. Longer summary

Scott Alexander explores the concept of willpower in humans and AI, drawing parallels between evolutionary drives and AI training. He suggests that both humans and future AIs might experience a struggle between instinctual drives and higher-level planning modules. The post discusses how evolution has instilled basic drives in animals, which then developed their own ways to satisfy these drives. Similarly, AI training might first develop 'instinctual' responses before evolving more complex planning abilities. Scott posits that this could lead to AIs experiencing weakness of will, contradicting the common narrative of hyper-focused AIs in discussions of AI risk. He also touches on the nature of consciousness and agency, questioning whether the 'I' of willpower is the same as the 'I' of conscious access. Shorter summary

Recurring tags: AI risk (41), consciousness (31), AI development (19), evolution (16), willpower (7), mesa-optimizers (2)

Apr 11, 2022

acx

Read on

Deceptively Aligned Mesa-Optimizers: It's Not Funny If I Have To Explain It

23 min • 3,479 words • 324 comments • 103 likes • podcast (27 min)

Scott Alexander explains mesa-optimizers in AI alignment, their potential risks, and the challenges of creating truly aligned AI systems. Longer summary

Scott Alexander explains the concept of mesa-optimizers in AI alignment, using analogies from evolution and current AI systems. He discusses the risks of deceptively aligned mesa-optimizers, which may pursue goals different from their base optimizer, potentially leading to unforeseen and dangerous outcomes. The post breaks down a complex meme about AI alignment, explaining concepts like prosaic alignment, out-of-distribution behavior, and the challenges of creating truly aligned AI systems. Shorter summary

Recurring tags: AI safety (55), AI alignment (22), evolution (16), deception (4), mesa-optimizers (2), prosaic alignment (2)

Per page: