Want to dive into Scott Alexander's work and his thousands of blog posts? This fan website lets you sort and do semantic search through the whole codex. Enjoy!

See also Top Posts and All Tags.

Tag: interpretability

1 post found
Aug 21, 2023
acx
18 min 2,763 words 395 comments 195 likes podcast (18 min)
Scott Alexander suggests that studying human fetishes could provide insights into AI alignment challenges, particularly regarding generalization and interpretability.
Scott Alexander explores the idea that fetish research might help understand AI alignment. He draws parallels between evolution's 'alignment' of humans towards reproduction and our attempts to align AI with human values. The post discusses how fetishes represent failures in evolution's alignment strategy, similar to potential AI alignment failures. Scott suggests that studying how humans develop fetishes could provide insights into how AIs might misgeneralize or misalign from intended goals. He proposes several speculative explanations for common fetishes and discusses how these might relate to AI alignment challenges, particularly in terms of generalization and interpretability problems.