The Omnigenic Model As Metaphor For Life
The collective intellect is change-blind. Knowledge gained seems so natural that we forget what it was like not to have it. Piaget says children gain long-term memory at age 4 and don’t learn abstract thought until ten; do you remember what it was like not to have abstract thought? We underestimate our intellectual progress because every sliver of knowledge acquired gets backpropagated unboundedly into the past.
For decades, people talked about “the gene for height”, “the gene for intelligence”, etc. Was the gene for intelligence on chromosome 6? Was it on the X chromosome? What happens if your baby doesn’t have the gene for intelligence? Can they still succeed?
Meanwhile, the responsible experts were saying traits might be determined by a two-digit number of genes. Human Genome Project leader Francis Collins estimated that there were “about twelve genes” for diabetes, and “all of them will be discovered in the next two years”. Quanta Magazine reminds us of a 1999 study which claimed that “perhaps more than fifteen genes” might contribute to autism. By the early 2000s, the American Psychological Association was a little more cautious, saying intelligence might be linked to “dozens – if not hundreds” of genes.
The most recent estimate for how many genes are involved in complex traits like height or intelligence is approximately “all of them” – by the latest count, about twenty thousand. From this side of the veil, it all seems so obvious. It’s hard to remember back a mere twenty or thirty years ago, when people earnestly awaited “the gene for depression”. It’s hard to remember the studies powered to find genes that increased height by an inch or two. It’s hard to remember all the crappy p-hacked results announcing that okay, we found the gene for extraversion, here it is! It’s hard to remember all the editorials in The Guardian about how since nobody had found the gene for IQ yet, genes don’t matter, science is fake, and Galileo was a witch.
And even remembering those times, they seem incomprehensible. Like, really? Only a few visionaries considered the hypothesis that the most complex and subtle of human traits might depend on more than one protein? Only the boldest revolutionaries dared to ask whether maybe cystic fibrosis was not the best model for the entirety of human experience?
This side of the veil, instead of looking for the “gene for intelligence”, we try to find “polygenic scores”. Given a person’s entire genome, what function best predicts their intelligence? The most recent such effort uses over a thousand genes and is able to predict 10% of variability in educational attainment. This isn’t much, but it’s a heck of a lot better than anyone was able to do under the old “dozen genes” model, and it’s getting better every year in the way healthy paradigms are supposed to.
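To make “what function best predicts their intelligence” concrete: the simplest polygenic score is just a weighted sum of how many copies of each variant a person carries. Here is a toy sketch of that idea, not anybody’s actual pipeline. Everything in it is made up for illustration: the function names, the simulated genomes (uniform 0/1/2 allele counts), the effect sizes, and the assumption that the trait is half genetic. Real scores estimate their weights from a separate study; this sketch cheats and uses the true simulated weights.

```python
import random
import statistics


def polygenic_score(genotype, weights):
    """Weighted sum of allele counts (0, 1, or 2 copies per variant)."""
    return sum(w * g for w, g in zip(weights, genotype))


def r_squared(xs, ys):
    """Squared Pearson correlation: share of variance in ys predicted by xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def simulate(n_people=1000, n_variants=500, n_known=50, seed=0):
    """Compare a score built from a few known variants with one using all of them.

    All numbers are hypothetical. Every variant gets a small random effect,
    and noise is sized so that genes account for roughly half the variance.
    """
    rng = random.Random(seed)
    true_weights = [rng.gauss(0, 1) for _ in range(n_variants)]
    genomes = [[rng.choice((0, 1, 2)) for _ in range(n_variants)]
               for _ in range(n_people)]
    genetic = [polygenic_score(g, true_weights) for g in genomes]
    noise_sd = statistics.pstdev(genetic)
    traits = [v + rng.gauss(0, noise_sd) for v in genetic]
    # The "old paradigm" score only knows about the first n_known variants.
    partial = [polygenic_score(g[:n_known], true_weights[:n_known])
               for g in genomes]
    return r_squared(partial, traits), r_squared(genetic, traits)


if __name__ == "__main__":
    few, everything = simulate()
    print(f"score from 50 variants:  R^2 = {few:.3f}")
    print(f"score from all variants: R^2 = {everything:.3f}")
```

In this toy world no single variant matters much, so a score built from a handful of them predicts only a small slice of the trait, and the prediction improves as more variants are added.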
Genetics is interesting as an example of a science that overcame a diseased paradigm. For years, basically all candidate gene studies were fake. “How come we can’t find genes for anything?” was never as popular as “where’s my flying car?” as a symbol of how science never advances in the way we optimistically feel like it should. But it could have been.
And now it works. What lessons can we draw from this, for domains that still seem disappointing and intractable?
Turn-of-the-millennium behavioral genetics was intractable because it was more polycausal than anyone expected. Everything interesting was an excruciating interaction of a thousand different things. You had to know all those things to predict anything at all, so nobody predicted anything and all apparent predictions were fake.
Modern genetics is healthy and functional because it turns out that although genetics isn’t easy, it is simple. Yes, there are three billion base pairs in the human genome. But each of those base pairs is a nice, clean, discrete unit with one of four values. In a way, saying “everything has three billion possible causes” is a mercy; it’s placing an upper bound on how terrible genetics can be. The “secret” of genetics was that there was no “secret”. You just had to drop the optimistic assumption that there was any shortcut other than measuring all three billion different things, and get busy doing the measuring. The field was maximally perverse, but with enough advances in sequencing and computing, even the maximum possible level of perversity turned out to be within the limits of modern computing.
(this is an oversimplification: if it were really maximally perverse, chaos theory would be involved somehow. Maybe a better claim is that it hits the maximum perversity bound in one specific dimension)
One possible lesson here is that the sciences where progress is hard are the ones that have what seem like an unfair number of tiny interacting causes that determine everything. We should go from trying to discover “the” cause, to trying to find which factors we need to create the best polycausal model. And we should go from seeking a flash of genius that helps sweep away the complexity, to figuring out how to manage complexity that cannot be swept away.
Late-90s/early-00s psychiatry was a lot like late-90s/early-00s genetics. The public was talking about “the cause” of depression: serotonin. And the responsible experts were saying oh no, depression might be caused by as many as several different things.
Now the biopsychosocial model has caught on and everyone agrees that depression is complicated. I don’t know if we’re still at the “dozens of things” stage or the “hundreds of things” stage, but I don’t think anyone seriously thinks it’s fewer than a dozen. The structure of depression seems different from the structure of genetic traits in that one cause can still have a large effect; multiple sclerosis might explain less than 1% of the variance in depressedness, but there will be a small sample of depressives whose condition is almost entirely because of multiple sclerosis. But overall, I think the analogy to genetics is a good one.
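The multiple sclerosis point is just arithmetic, and a quick sketch may help. A yes/no cause present in a fraction p of people, shifting the outcome by d, contributes variance p(1 - p)d². The numbers below are invented for illustration (a cause affecting 1 in 1,000 people and moving them three standard deviations); they are not real epidemiology, and the sketch assumes the rare cause is independent of everything else.

```python
def variance_share_of_rare_cause(prevalence, effect_sd, other_variance=1.0):
    """Share of total variance due to a binary cause with the given prevalence.

    Assumes the cause is independent of all other influences, whose combined
    variance is other_variance. All inputs here are hypothetical.
    """
    cause_variance = prevalence * (1.0 - prevalence) * effect_sd ** 2
    return cause_variance / (cause_variance + other_variance)


# Hypothetical: 1 in 1,000 people affected, each shifted by 3 standard deviations.
share = variance_share_of_rare_cause(prevalence=0.001, effect_sd=3.0)
print(f"{share:.2%} of population variance")
```

Under these made-up numbers the cause accounts for a bit under 1% of population variance, even though for the rare person who has it, a three-standard-deviation shift swamps everything else.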
If this is true, what can psychiatry (and maybe other low-rate-of-progress sciences) learn from genetics?
One possible lesson is: there are more causes than you think. Stop looking for “a cause” or “the ten causes” and start figuring out ways to deal with very numerous causes.
There are a bunch of studies that are basically like this one linking depression to zinc deficiency. They are good as far as they go, but it’s hard to really know what to do with them. It’s like finding one gene for intelligence. Okay, that explains 0.1% of the variability, now what?
We might imagine trying to combine all these findings into a polycausal score. Take millions of people, measure a hundred different variables – everything from their blood zinc levels, to the serotonin metabolites in their spinal fluid, to whether their mother loved them as a child – then do statistics on them and see how much of the variance in depression we can predict based on the inputs. “Do statistics on them” is a heck of a black box; genes are kind of pristine and causally unidirectional, but all of these psychological factors probably influence each other in a hundred different ways. In practice I think this would end up as a horribly expensive boondoggle that didn’t work at all. But in theory I think this is what a principled attempt to understand depression would look like.
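For what it’s worth, the least ambitious version of “do statistics on them” is ordinary least squares: fit weights on half the people, then see how much variance the score predicts in the other half. The sketch below is a toy under loud assumptions: the twenty factors are simulated, independent of each other, and strictly additive, which is exactly what real psychological factors are not. The function names and all numbers are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no perfectly collinear factors).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, ys):
    """Least-squares weights (intercept first) via the normal equations."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


def variance_explained(weights, rows, ys):
    """R^2 of the fitted score on the given (ideally held-out) people."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(weights, r)) ** 2 for r, y in zip(rows, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def simulate_polycausal(n_people=2000, n_factors=20, seed=1):
    """Held-out R^2 for a one-factor model versus the full polycausal score."""
    rng = random.Random(seed)
    effects = [rng.gauss(0, 1) for _ in range(n_factors)]
    rows = [[rng.gauss(0, 1) for _ in range(n_factors)] for _ in range(n_people)]
    noise_sd = sum(e * e for e in effects) ** 0.5  # signal and noise variance equal
    ys = [sum(e * v for e, v in zip(effects, r)) + rng.gauss(0, noise_sd)
          for r in rows]
    half = n_people // 2
    train, test = rows[:half], rows[half:]
    one = fit_ols([r[:1] for r in train], ys[:half])
    full = fit_ols(train, ys[:half])
    return (variance_explained(one, [r[:1] for r in test], ys[half:]),
            variance_explained(full, test, ys[half:]))


if __name__ == "__main__":
    single, everything = simulate_polycausal()
    print(f"one factor:  held-out R^2 = {single:.3f}")
    print(f"all factors: held-out R^2 = {everything:.3f}")
```

The clean result here comes entirely from the independence assumption. Once the factors start causing each other, the fitted weights stop meaning what you would like them to mean, which is the black-box problem in miniature.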
(“understand depression” might be the wrong term here; it conflates being able to predict a construct with knowing what real-world phenomenon the construct refers to. We are much better at finding genes for intelligence than at understanding exactly what intelligence is, and whether it’s just a convenient statistical construct or a specific brain parameter. By analogy, we can imagine a Martian anthropologist who correctly groups “having a big house”, “driving a sports car”, and “wearing designer clothes” into a construct called “wealth”, and is able to accurately predict wealth from a model including variables like occupation, ethnicity, and educational attainment – but who doesn’t understand that wealth = having lots of money. I think it’s still unclear to what degree intelligence and depression have a simple real-world wealth-equals-lots-of-money style correspondence – though see here and here.)
A more useful lesson might be skepticism about personalized medicine. Personalized medicine – the idea that I can read your genome and your blood test results and whatever and tell you what antidepressant (or supplement, or form of therapy) is right for you – has been a big idea over the past decade. And so far it’s mostly failed. A massively polycausal model would explain why. The average personalized medicine company gives you recommendations based on at most a few things – zinc levels, gut flora balance, etc. If there are dozens or hundreds of things, then you need the full massively polycausal model – which as mentioned before is computationally intractable at least without a lot more work.
(you can still have some personalized medicine. We don’t have to know the causes of depression to treat it. You might be depressed because your grandfather died, but Prozac can still make you feel better. So it’s possible that there’s a simple personalized monocausal way to check who eg responds better to Prozac vs. Lexapro, though the latest evidence isn’t really bullish about this. But this seems different from a true personalized medicine where we determine the root cause of your depression and fix it in a principled way.)
Even if we can’t get much out of this, I think it can be helpful just to ask which factors and sciences are oligocausal vs. massively polycausal. For example, what percent of variability in firm success are economists able to determine? Does most of the variability come from a few big things, like talented CEOs? Or does most of it come from a million tiny unmeasurable causes, like “how often does Lisa in Marketing get her reports in on time”?
Maybe this is really stupid – I’m neither a geneticist nor a statistician – but I imagine an alien society where science is centered around polycausal scores. Instead of publishing a paper claiming that lead causes crime, they publish a paper giving the latest polycausal score for predicting crime, and demonstrating that they can make it much more accurate by including lead as a variable. I don’t think you can do this in real life – you would need bigger Big Data than anybody wants to deal with. But like falsifiability and compressibility, I think it’s a useful thought experiment to keep in mind when imagining what science should be like.