Friday, March 25, 2016

A way to deal with enormous branching factors

When you play Chess, there are on average something like 30 different moves you can make each turn. In other words, the branching factor is around 30. Go, on the other hand, has a branching factor of 300 or so. The standard Minimax algorithm, traditionally used to play Chess and similar board games, can't handle this branching factor very well. This is one of the two reasons why Go is so much harder for computers to play than Chess, and why it took 19 years between the victory of AI over humans in Chess and the same victory in Go. (The other reason is the difficulty of state evaluation in Go.)

Let's think a bit about that number: 300. Apparently that's enough to stop Minimax (and Xerxes). Ultimately, Go was conquered with the help of Monte Carlo Tree Search (MCTS), which handles higher branching factors better than Minimax. But still, 300 is not a terribly big number.

Now consider this. Chess and Go are both games where you only move a single piece each turn. Many, perhaps most, turn-based games are not like this. In board games such as Risk or Diplomacy you can move a number of units each turn. The same goes for computer strategy games such as Civilization, Advance Wars or XCOM. In trading card games such as Magic and Hearthstone you can play out a large number of cards each turn. Let's call these games multi-action games. (And let's not even get started on a game such as StarCraft, where you move lots of units in tandem but have the added complication of continuous time and space.) One might point out that such games model a number of real-world scenarios with varying degrees of accuracy, so an AI method for playing such games could be useful for non-game applications.

Hero Academy, a multi-action turn-based game with a characteristically huge branching factor.
What's the branching factor of a multi-action game? If you have 6 units to move, and you can move each unit to one of ten possible positions, you have a branching factor of 10 to the power of 6, or one million! Suddenly 300 does not feel like a very big number at all. And this is even simplifying things a bit, assuming that the order in which we move the units doesn't matter.
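To make the combinatorics concrete, here is a quick back-of-envelope check in Python (the unit and move counts are the illustrative numbers from above, not taken from any particular game):

```python
from math import factorial

units = 6            # units that can act this turn
moves_per_unit = 10  # possible moves per unit

# If the order in which units move doesn't matter:
print(moves_per_unit ** units)  # 1000000

# If order does matter, every one of the 6! orderings is a distinct turn:
print(factorial(units) * moves_per_unit ** units)  # 720000000
```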

Obviously, Minimax is incapable of handling a branching factor of a million. A two-ply search would already require examining a trillion positions. Even MCTS runs into very serious problems with this kind of branching factor. The basic problem is that standard MCTS needs to expand every action from the root node once before doing anything else, and that alone might take more time than you have for the turn. (There are some clever ways of using MCTS in this space by considering individual unit actions, but that will have to be a topic for another time.)
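A rough feasibility estimate shows the scale of the problem; the evaluation rate below is an assumption chosen for illustration, not a measured figure:

```python
branching = 10 ** 6          # actions available in one turn
two_ply = branching ** 2     # positions a two-ply minimax search must examine
evals_per_second = 10 ** 7   # generously assumed speed of the forward model

print(f"{two_ply:.0e} positions")                               # 1e+12 positions
print(f"{two_ply / evals_per_second / 3600:.0f} hours needed")  # 28 hours needed
```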

At EvoStar 2016, we will present a paper that takes a fresh look at this problem and arrives at a somewhat surprising solution. The paper is written by Niels Justesen, Tobias Mahlmann and myself; Niels was one of our star master's students at ITU Copenhagen, and Tobias and I supervised his master's thesis. We used a re-implementation of the strategy game Hero Academy as our testbed; Hero Academy is fairly representative of computer strategy games: each turn, a number of action points are available, and any action point can be used to move any character.

The basic idea of the paper is that the branching factor is so huge that we cannot do any tree search. Instead, we will have to concentrate on searching the space of possible action sequences that make up a single turn of the game. A simple state evaluation function will have to suffice when judging the quality of a turn: even though we cannot simulate the actions of the opponent, a good execution of each single turn might be all we need to play well. But even then, the branching factor is too high to search the possibilities of a single turn exhaustively. So we need to search smartly.
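In code, scoring one candidate turn could look like the sketch below. The `apply_action`, `is_legal` and `evaluate` interfaces are hypothetical stand-ins for a forward model and a heuristic evaluator, not the paper's actual API:

```python
import copy

def turn_fitness(state, actions, apply_action, is_legal, evaluate):
    """Apply a candidate action sequence to a copy of the state,
    then score the resulting board with a heuristic evaluator."""
    s = copy.deepcopy(state)
    for action in actions:
        if is_legal(s, action):      # actions may become illegal mid-sequence
            s = apply_action(s, action)
    return evaluate(s)               # e.g. a simple weighted piece counter
```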

Therefore, we turned to evolution. We use an evolutionary algorithm to evolve sequences of actions that could make up a single turn. The fitness function is the quality of the board state at the end of the turn, as measured by some straightforward metrics. Both mutation and crossover are used to provide variation, as is common in evolutionary computation. We call the use of evolution to search actions within a turn Online Evolution.
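A minimal sketch of such a loop, reusing the `turn_fitness` helper above; the population size, mutation rate and action-sampling details are illustrative assumptions, not the exact settings from the paper:

```python
import random

def online_evolution(state, legal_actions, apply_action, is_legal, evaluate,
                     turn_length=5, pop_size=100, generations=50, mut_rate=0.1):
    """Evolve an action sequence for a single turn.
    legal_actions(state) -> list of actions (hypothetical interface)."""
    def fitness(genome):
        return turn_fitness(state, genome, apply_action, is_legal, evaluate)

    def random_genome():
        # Sampling from the start-of-turn state is a simplification;
        # turn_fitness simply skips actions that turn out to be illegal.
        return [random.choice(legal_actions(state)) for _ in range(turn_length)]

    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]              # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = [random.choice(pair) for pair in zip(a, b)]  # uniform crossover
            for i in range(turn_length):                         # point mutation
                if random.random() < mut_rate:
                    child[i] = random.choice(legal_actions(state))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

The best surviving sequence is then executed, action by action, as the agent's turn.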

The crossover scheme for Online Evolution in Hero Academy.
What about the results? Simply put, Online Evolution kicks ass. Evolution outperforms both a standard MCTS and two different greedy search methods by a wide margin. While it would certainly be possible to find MCTS variants that handle this particular problem better, Online Evolution is at the very least highly competitive on this problem.

This is significant because the use of evolution for planning actions is very underexplored. While evolutionary algorithms have been used extensively to play games, it is almost always through evolving a program of some kind to play the game (for example through neuroevolution). The use of evolution in the actual playing phase seems to be almost virgin territory. We took some inspiration from the only other example of this general idea we know of: Rolling Horizon Evolution for the PTSP game. But ultimately, there is no good reason that evolution could not be used to solve all the same problems that MCTS can. In fact, in his recent PhD thesis, Cameron McGuinness showed that MCTS can be used to solve many of the same problems that evolution can. There seems to be some deeper connection here, which is well worth exploring.

Are you still with me? Good! Then you can go read our paper to get all the details!




4 comments:

Marius said...

So you've now moved the computational problem to the fitness function? ;-)

Jean said...

If I understand your algorithm, it sounds like it would make a good AI, but not a great AI. The reason being that if you just score the various boards that can be created in this turn, it will have a worst-case behavior of being suicidal. In other words, it will take actions that lead to an immediately high board score, but which in 2 or more turns could be absolutely dominated by a superior opponent force.

Julian Togelius said...

Haha, in a way... Except we don't actually have to solve the problem! The fitness function we use is in fact really simple: essentially a weighted piece counter. (Details in the paper.)

Julian Togelius said...

Yes, the algorithm *might* make a move that is suicidal given the lack of long-term planning. But the empirical results are quite good, so it seems that it doesn't make such moves in practice - at least not many of them.