
AlphaGo’s victory broadcast on TV
Im Hun-jung/Yonhap/AP Photo via Getty Images
In March 2016, Google DeepMind’s artificial intelligence system AlphaGo shocked the world. In an impressive five-game series of Go, the ancient Chinese board game, the AI beat the world’s best player, Lee Sedol – a moment that was televised in front of millions and hailed by many as a historic moment in the development of artificial intelligence.
Chris Maddison, now a professor of artificial intelligence at the University of Toronto, was then a master’s student and helped get the project off the ground. It all began when Ilya Sutskever, who later co-founded OpenAI, contacted…
Alex Wilkins: How did the idea for AlphaGo first come about?
Chris Maddison: Ilya (Sutskever) gave me the following argument for why we should work on Go. He said, “Chris, do you think that when an expert player looks at the Go board, they can pick the best move in half a second? If you think they can, that means you can learn a pretty good policy for choosing the best move using a neural net.”
The reason is that half a second is about the time it takes for your visual cortex to do a forward pass (one round of processing), and we already knew from ImageNet (a major AI image-recognition competition) that we’re pretty good at approximating things that only take one forward pass of your visual cortex.
I bought that argument, so I decided to join (Google Brain) as an intern in the summer of 2014.
How did AlphaGo evolve from there?
When I joined, there was another small team at DeepMind that I was going to work with, which was Aja Huang and David Silver, who had started working on Go. It was basically my responsibility to start building neural networks. It was a dream.
There were a bunch of different approaches we tried, and many of the first things we tried failed. In the end I just got frustrated and tried the dumbest, easiest thing, which was to predict the next move that an expert would make in a given board position, training a neural network on a large corpus of expert games. And that turned out to be the approach that really got us going.
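That “dumbest, easiest thing” is supervised next-move prediction: score every legal move from the current position and train on (position, expert move) pairs with a cross-entropy loss. A minimal toy sketch of the idea – using a linear policy and synthetic data, where the sizes, names and the fake “expert” are all illustrative assumptions and nothing here resembles the real AlphaGo network:

```python
# Toy sketch of supervised next-move prediction (not the real AlphaGo net).
# A linear "policy network" maps board features to probabilities over moves,
# trained with cross-entropy on (position, expert move) pairs.
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_MOVES = 16, 9   # tiny illustrative sizes, not a 19x19 Go board

def softmax(z):
    z = z - z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

W = np.zeros((N_MOVES, N_FEATURES))   # the policy's only parameters

def train_step(board, expert_move, lr=0.1):
    """One gradient step on -log p(expert_move | board) for one example."""
    global W
    p = softmax(W @ board)
    grad = np.outer(p, board)         # d(cross-entropy)/dW, softmax part
    grad[expert_move] -= board        # ...minus the one-hot expert label
    W -= lr * grad
    return -np.log(p[expert_move])    # loss before the update

# Synthetic "expert" corpus: the expert always plays argmax(board[:N_MOVES]).
for _ in range(2000):
    board = rng.normal(size=N_FEATURES)
    expert = int(np.argmax(board[:N_MOVES]))
    train_step(board, expert)

# After training, the policy usually agrees with the synthetic expert.
board = rng.normal(size=N_FEATURES)
pred = int(np.argmax(W @ board))
```

The point of the sketch is only the shape of the recipe: no search, no self-play, just imitation of a corpus of expert decisions.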
At the end of the summer, we arranged a small match with DeepMind’s Thore Graepel, who considered himself a decent Go player, and my networks beat him. DeepMind then started to get convinced that this was going to be a real thing and started putting resources into it and building a big team around it.
How hard was it to beat Lee Sedol?
I remember in the summer of 2014 we practically had Lee Sedol’s portrait on the desk next to us. I’m not a Go player, but Aja (Huang) is. Every time I wanted to build a new network, it would get a little better, and I would turn to Aja and I would say, OK, we’re a little better, how close are we to Lee Sedol? And Aja turned to me and said, Chris, you don’t understand. Lee Sedol is a stone from God.
You left the AlphaGo team before the big event. Why?
David (Silver) said they wanted to keep me on and really push this project to the next level, and in retrospect this was maybe one of the dumber decisions I’ve made: I turned him down. I said I think I need to focus on my PhD; I’m an academic at heart. I went back to my PhD and consulted loosely with the project from that point on. I’m a little proud to say that it took a while for them to beat my neural networks. But ultimately, the object that played Lee Sedol was the product of a great engineering effort and a great team.
What was the mood like in Seoul when AlphaGo won?
Being there in Seoul at that moment was hard to express. It was emotional. It was intense. There was a feeling of anxiety. You go in confident, but you never know. It’s like a sports game. Statistically, you’re the better player, but you never know how it’s going to shake out. I remember being in the hotel where we played the games and looking out the window. We were at a high enough level that you could see out onto one of the major city intersections. I realized there was a big screen, much like Times Square, showing our match. And then I looked along the sidewalks, and people were just standing in line and watching the screen. I’d heard numbers like hundreds of millions of people in China saw the first game, but I remember that moment as, my God, we really stopped East Asia in its tracks.
How important has AlphaGo been to AI more generally?
Much has changed on the surface in the world of large language models (LLMs) – they are now quite different from AlphaGo in some ways – but there is an underlying technological thread that hasn’t really changed.
So the first part of the algorithm is to train a neural network to predict the next move. Today’s LLMs begin with what we call pre-training to predict the next word, from a large corpus of human text found mostly on the internet.
For the second step in AlphaGo, we took the information from the human corpus that was compressed into these neural networks and we refined it using reinforcement learning, to adjust the behavior of the system towards the goal of winning games.
When you learn to predict an expert’s next move, the expert is trying to win, but winning is not the only thing that explains the move. Maybe they don’t understand what the best move is, maybe they made a mistake, so you have to adapt the overall system to your true goal, which in the case of AlphaGo was to win.
In large language models, it is the same after pretraining. The networks are not aligned with how we want to use them, and so we do a series of reinforcement learning steps that align the networks with our goals.
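The two-step recipe described here – imitate the human corpus, then refine with reinforcement learning toward the true goal – can be sketched with a REINFORCE-style update on a toy policy. The pretrained preference, the reward function and the sizes below are all illustrative assumptions, not AlphaGo’s or any LLM’s actual training setup:

```python
# Toy sketch of the refinement step: take a "pretrained" policy and nudge
# it toward a reward signal with REINFORCE (a basic policy-gradient method).
import numpy as np

rng = np.random.default_rng(1)
N_MOVES = 4

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# "Pretrained" logits: imitation of the corpus slightly prefers move 0.
logits = np.array([0.5, 0.0, 0.0, 0.0])

def reward(move):
    # The true goal (here standing in for "winning") rewards move 3 instead.
    return 1.0 if move == 3 else 0.0

lr = 0.2
for _ in range(500):
    p = softmax(logits)
    move = rng.choice(N_MOVES, p=p)   # sample an action from the policy
    r = reward(move)
    # REINFORCE: gradient of log p(move) w.r.t. logits is (one-hot - p)
    grad_logp = -p
    grad_logp[move] += 1.0
    logits += lr * r * grad_logp      # reward-weighted update

# After refinement, the policy should favor the rewarded move, not move 0.
best = int(np.argmax(softmax(logits)))
```

The design point mirrors the interview: pre-training supplies a sensible starting policy, and reinforcement learning then shifts its behavior toward the actual objective.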
In some ways, not much has changed.
Does it tell us anything about where we can expect AIs to succeed?
It has consequences for what we choose to focus on. If you’re worried about making progress on important problems, the main bottlenecks to worry about are whether you have enough data to do pre-training, and whether you have reward signals to do post-training. If you don’t have those ingredients, no amount of cleverness – you know, this algorithm versus that algorithm – is going to get you going.
Did you feel any sympathy for Lee Sedol?
Lee Sedol had been this idol in the summer of 2014, this unattainable milestone. To then suddenly be there in person, seeing his struggle, his stress, his anxiety, the realization that this was a much more worthy opponent than he might have thought going in – it was very stressful. You don’t want to put someone in that position. When he lost the match, he apologized to humanity, saying, “This is my fault, not yours.” It was tragic.
There is also a custom in Go to review the match with the opponent. Someone wins or loses, but you judge the match at the end, disconnect from the game and explore variations with each other. Lee Sedol couldn’t do that because AlphaGo wasn’t human, so instead he let his friends come in and review the match, but it’s just not the same. There was something heartbreaking about it.
But I didn’t appreciate all the man-versus-machine narratives surrounding the battle, because a team of humans built AlphaGo. It was the effort of a tribe to build an object that could achieve excellence in a human game. It was ultimately the object that all our blood, sweat and tears went into.
Do you think there is still a place for humans in the world as AI performs more human thinking?
We learn more about the game of Go, and if we think the game is beautiful, which we do, and AIs can teach us more about that beauty, there’s a lot of good in that, too. There is a difference between goal and purpose. The goal of the game of Go is to win, but that is not its only purpose – one purpose is to have fun. Board games are not ruined by the presence of AI; chess is a thriving industry. We still appreciate the intrigue and human achievement of the game.