Tag: interpretability

Want to explore Scott Alexander's work and his 1500+ blog posts? This unaffiliated fan website lets you sort and semantically search the whole codex. Enjoy!

1 post found

Aug 21, 2023

acx

What Can Fetish Research Tell Us About AI?

18 min • 2,763 words • 403 comments • 191 likes • podcast (18 min)

Scott Alexander suggests that studying human fetishes could provide insights into AI alignment challenges, particularly regarding generalization and interpretability.

Scott Alexander explores the idea that fetish research might help understand AI alignment. He draws parallels between evolution's 'alignment' of humans towards reproduction and our attempts to align AI with human values. The post discusses how fetishes represent failures in evolution's alignment strategy, similar to potential AI alignment failures. Scott suggests that studying how humans develop fetishes could provide insights into how AIs might misgeneralize or misalign from intended goals. He proposes several speculative explanations for common fetishes and discusses how these might relate to AI alignment challenges, particularly in terms of generalization and interpretability problems.

Recurring tags: autism (28), evolutionary psychology (27), AI alignment (22), fetishes (4)
