Want to dive into Scott Alexander's work and his thousands of blog posts? This fan website lets you sort and do semantic search through the whole codex. Enjoy!

See also Top Posts and All Tags.

Tag: AI interpretability

Nov 27, 2023
acx
23 min 3,513 words 234 comments 288 likes podcast (24 min)
Scott Alexander discusses recent breakthroughs in AI interpretability, explaining how researchers are beginning to understand the internal workings of neural networks.
Scott Alexander explores recent advances in AI interpretability, focusing on Anthropic's 'Towards Monosemanticity' paper. He explains how neural networks function, introduces the concept of superposition, in which a small number of neurons represents a larger number of concepts, and describes how researchers have managed to interpret an AI's internal workings by projecting its real neurons into simulated neurons. The post discusses what this research implies for understanding both artificial and biological neural systems, as well as its potential impact on AI safety and alignment.