Jan 06, 2020

A Very Unlikely Chess Game

Scott Alexander plays chess against GPT-2, an AI language model, and discusses the broader implications of AI's ability to perform diverse tasks without specific training.

Scott Alexander describes a chess game he played against GPT-2, an AI language model not designed for chess. Despite neither player performing well, GPT-2 managed to play a decent game without any understanding of chess or spatial concepts. The post then discusses the work of Gwern Branwen and Shawn Presser in training GPT-2 to play chess, showing its ability to learn opening theory and play reasonably well for several moves. Scott reflects on the implications of an AI designed for text prediction being able to perform tasks like writing poetry, composing music, and playing chess without being specifically designed for them.

Almost 25 years after Kasparov vs. Deep Blue, another seminal man vs. machine matchup:

Neither competitor has much to be proud of here. White has a poor opening. Black screws up and loses his queen for no reason. A few moves later, white screws up and loses his rook for no reason. Better players will no doubt spot other humiliating mistakes. But white does eventually eke out a victory. And black does hold his own through most of the game.

White is me. My excuse is that I only play chess once every couple of years, plus I’m entering moves on an ASCII board I can barely read.

Black is GPT-2. Its excuse is that it’s a text prediction program with no concept of chess. As far as it knows, it’s trying to predict short alphanumeric strings like “e2e4” or “Nb7”. Nobody told it this represents a board game. It doesn’t even have a concept of 2D space that it could use to understand such a claim. But it still captured my rook! Embarrassing!

Backing up: last year, I wrote GPT-2 As Step Toward General Intelligence, where I argued that the program wasn’t just an essay generator, it was also kind of a general pattern-recognition program with text-based input and output channels. Figure out how to reduce a problem to text, and you can make it do all kinds of unexpected things.

Friend-of-the-blog Gwern Branwen has been testing the limits of this idea. First he taught GPT-2 to write poetry. Some of it was pretty good:

Fair is the Lake, and bright the wood,
With many a flower-full glamour hung:
Fair are the banks; and soft the flood
With golden laughter of our tongue.

For his next trick, he found a corpus of music in “ABC notation”, a way of representing musical scores as text. He fed it to GPT-2 and got it to write folk songs for him. I’m a fan:
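For readers who haven't seen the format, an ABC tune is just a few header lines followed by the melody as letters and bar lines. The snippet below is an illustrative example of the notation, not one of GPT-2's outputs:

```
X:1
T:A Sample Tune
M:6/8
L:1/8
K:D
A|dAF DFA|ded cBA|dAF DFA|BdB A2:|
```

The `X`, `T`, `M`, `L`, and `K` lines give the tune's index, title, meter, default note length, and key; everything after that is playable score. Since the whole thing is plain text, it fits GPT-2's input and output channels directly.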

Last month, I asked him if he thought GPT-2 could play chess. I wondered if he could train it on a corpus of chess games written in standard notation (where, for example, e2e4 means “move the pawn at square e2 to square e4”). There are literally millions of games written up like this. GPT-2 would learn to predict the next string of text, which would correspond to the next move in the chess game. Then you would prompt it with a chessboard up to a certain point, and it would predict how the chess masters who had produced its training data would continue the game – ie make its next move using the same heuristics they would.
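The scheme above can be sketched with a toy stand-in for GPT-2: a bigram counter that, like a language model, emits the continuation it has most often seen after the current prompt. Everything here (the tiny corpus, the function names) is a hypothetical illustration, not the actual training code:

```python
# Toy sketch of "chess as text prediction": games are strings of
# coordinate-notation move tokens, and the "model" predicts the next token.
# A real setup would use GPT-2 conditioned on the whole game history; this
# bigram counter only looks at the last move, purely for illustration.

from collections import Counter, defaultdict

def train_bigrams(games):
    """Count which move-token most often follows each move-token."""
    follows = defaultdict(Counter)
    for game in games:
        tokens = game.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows

def toy_predict(follows, prompt):
    """Predict the next move-token given the game so far (the 'prompt')."""
    last = prompt.split()[-1]
    candidates = follows.get(last)
    if not candidates:
        return None  # never saw this position-as-text before
    return candidates.most_common(1)[0][0]

# A tiny hypothetical "corpus" of games in coordinate notation.
corpus = [
    "e2e4 e7e5 g1f3 b8c6",
    "e2e4 e7e5 f1c4 f8c5",
    "e2e4 c7c5 g1f3 d7d6",
    "d2d4 d7d5 c2c4 e7e6",
]

model = train_bigrams(corpus)
print(toy_predict(model, "e2e4"))  # → e7e5 (the most common reply in this corpus)
```

The point of the sketch is only the shape of the pipeline: games become strings, prediction becomes play. GPT-2 does the same thing with vastly more context and a learned (rather than counted) distribution over continuations.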

Gwern handed the idea to his collaborator Shawn Presser, who had a working GPT-2 chess engine running within a week:

You can play against GPT-2 yourself by following the directions in the last tweet, though it won’t be much of a challenge for anyone better than I am.

This training explains the program’s strengths (good at openings) and weaknesses (bad when play deviates from its expectations). For example, ggreer analyzes why GPT-2 lost its queen in the game above. By coincidence, my amateurish flailing resembled a standard opening called the Indian Game. GPT-2 noticed the pattern and played a standard response to it. But the resemblance wasn’t perfect, so one of GPT-2’s moves which would have worked well in a real Indian Game brought its queen where I could easily capture it. I don’t want to conjecture on how far “mere pattern-matching” can take you – but you will at least need to be a better pattern-matcher than this to get very far.

But this is just what a friend of a friend managed to accomplish in a few days of work. Gwern stresses that there are easy ways to make it much better:

Obviously, training on just moves with the implicit game state having to be built up from scratch from the history every time is very difficult – even MuZero at least gets to see the entire game state at every move when it’s trying to predict legal & good next moves, and depends heavily on having a recurrent state summarizing the game state. Maybe rewriting games to provide (state,action) pairs will make GPT-2 work much better.
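One way to picture Gwern's suggestion: replay each game and emit the board alongside every move, so the model never has to reconstruct the position from the move history. The replayer below is a deliberately simplified sketch (plain piece moves only; no castling, en passant, or promotion), with names invented for illustration:

```python
# Rewrite a game from a bare move list into (state, action) training pairs,
# where the state is the board rendered as text. Simplified: handles only
# plain from-square/to-square moves.

START = [
    "rnbqkbnr",
    "pppppppp",
    "........",
    "........",
    "........",
    "........",
    "PPPPPPPP",
    "RNBQKBNR",
]

def square(name):
    """Map a square name like 'e2' to (row, col), row 0 = Black's back rank."""
    col = ord(name[0]) - ord("a")
    row = 8 - int(name[1])
    return row, col

def to_pairs(moves):
    """Yield (board-as-text, move) pairs for a list of coordinate moves."""
    board = [list(row) for row in START]
    for mv in moves:
        state = "/".join("".join(row) for row in board)
        yield state, mv
        (fr, fc), (tr, tc) = square(mv[:2]), square(mv[2:4])
        board[tr][tc], board[fr][fc] = board[fr][fc], "."

pairs = list(to_pairs(["e2e4", "e7e5"]))
for state, move in pairs:
    print(move, "<-", state)
```

Trained on pairs like these, the model would see the full position at every step, instead of having to infer it from a growing prefix of moves.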

What does this imply? I’m not sure (and maybe it will imply more if someone manages to make it actually good). It was already weird to see something with no auditory qualia learn passable poetic meter. It’s even weirder to see something with no concept of space learn to play chess. Is any of this meaningful? How impressed should we be that the same AI can write poems, compose music, and play chess, without having been designed for any of those tasks? I still don’t know.
