ML starter bot tutorial


#1

For this tutorial and the ML starter bot, we will use Python 3.5 and Keras with a TensorFlow backend, but it should be relatively straightforward to implement this in another deep learning framework.

Download the ML starter bot from:

After you've installed NumPy, TensorFlow, and Keras, download @hmate9's replay cache, unzip it, and begin training:

wget https://dl.dropboxusercontent.com/s/uq8peg023a5vnkj/replays.zip?dl=0
unzip replays.zip?dl=0
python train_bot.py ./replays

train_bot.py creates the ML model, loads and processes each replay into the format expected by the model, trains the model, and saves it as model.h5. To simplify the problem of making moves, we predict each tile's move independently. The model's input covers up to four squares away in each direction, for a 9x9 view centered on the tile. For this 9x9 view we use four channels: whether we own the tile (0 or 1), whether an enemy owns the tile (0 or 1), the production (divided by 20), and the strength (divided by 255). The scaling factors keep the inputs in a similar range, which tends to speed up training.
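To make that layout concrete, here is a rough sketch (not the starter bot's exact code) of how one tile's input stack could be assembled. The array names (owner, production, strength, my_id) and the tile_input helper are illustrative assumptions:

import numpy as np

VISIBLE_DISTANCE = 4

def tile_input(owner, production, strength, my_id, y, x):
    # owner, production, strength: (height, width) arrays for one turn
    d = VISIBLE_DISTANCE
    window = 2 * d + 1
    # np.roll wraps around the edges, matching Halite's toroidal map
    o = np.roll(np.roll(owner, d - y, axis=0), d - x, axis=1)[:window, :window]
    p = np.roll(np.roll(production, d - y, axis=0), d - x, axis=1)[:window, :window]
    s = np.roll(np.roll(strength, d - y, axis=0), d - x, axis=1)[:window, :window]
    stack = np.stack([o == my_id,                    # do we own the tile? (0 or 1)
                      (o != my_id) & (o != 0),       # does an enemy own it? (0 or 1)
                      p / 20,                        # production, scaled
                      s / 255])                      # strength, scaled
    return stack.astype(np.float32).flatten()        # 4 * 9 * 9 = 324 values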

Defining the model

from keras.models import Sequential
from keras.layers import Dense

VISIBLE_DISTANCE = 4
input_dim = 4*(2*VISIBLE_DISTANCE+1)*(2*VISIBLE_DISTANCE+1)  # 4 channels x 9 x 9 = 324

model = Sequential([Dense(512, activation='relu', input_dim=input_dim),
                    Dense(512, activation='relu'),
                    Dense(512, activation='relu'),
                    Dense(5, activation='softmax')])  # one output per move (STILL + four directions)
model.compile('nadam', 'categorical_crossentropy', metrics=['accuracy'])

Our model is three 512-unit ReLU layers followed by a softmax output layer. We use an adaptive optimizer (Nesterov Adam) and a categorical crossentropy loss. Other optimizers would likely produce similar results, though they may converge faster or slower. Feel free to experiment with different optimizers, activation functions, layer widths, or numbers of layers. Tweaking these may yield small performance improvements, but probably not much (see the tips below for better suggestions on improving the bot).

You probably shouldn't change the loss function unless you know what you are doing. Categorical crossentropy is the "correct" loss function for a classification problem. For Q-learning, policy gradient techniques, and other types of RL, you will need to change the loss function.

Sampling

Because we output one tile's move with each forward pass, naive sampling of the replays would massively oversample the endgame, where there tend to be hundreds of times more tiles than in the first few turns. With this naive sampling, your bot would never learn to play the early game properly!

One solution is to use sample weighting to scale the loss gradients. This is what I do with my bots, but it is much more computationally expensive. The starter bot uses a simpler, faster solution: it samples tiles stochastically, in inverse proportion to the number of tiles the training target owns that turn:

# stacks[:,0] is the ownership channel, so its per-turn mean is the fraction of
# tiles the training target owns; weighting by its inverse down-weights crowded turns
sampling_rate = 1/stacks[:,0].mean(axis=(1,2))[position_indices[0]]
sampling_rate /= sampling_rate.sum()  # normalize into a probability distribution
# draw 2048 (frame, y, x) positions according to those probabilities
sample_indices = np.transpose(position_indices)[np.random.choice(np.arange(len(sampling_rate)),2048,p=sampling_rate)]

The ML starter bot also truncates the replays to the first 100 turns.

Training

from keras.callbacks import EarlyStopping

model.fit(training_input[indices], training_target[indices], validation_split=0.2,
          callbacks=[EarlyStopping()], batch_size=1024, nb_epoch=100)

This sets aside 20% of the data for validation (see val_loss and val_acc during training). Validation data helps detect and mitigate overfitting, and EarlyStopping stops training once the validation loss stops improving.

Tips for improving your ML bot

  • The easiest way to improve your model is to simply use more data. The starter bot trains on 1.2 million samples. My best bot so far (rank 38 of 864) used something like 30 million samples.
  • Use average pooling to incorporate longer range, lower resolution data into your model (see this description: https://2016.forums.halite.io/t/has-anyone-tried-using-ml-to-copy-others-strategies/557/6?u=brianvanleeuwen )
  • Experiment with different input combinations. @hmate9 uses one channel for tile ownership (+1=my tile, 0=neutral tile, -1=enemy tile) which is probably an improvement.
  • Add ad-hoc corrections to prevent obviously bad moves, e.g. merging two 255-strength tiles (see the sketch after this list).
  • Mimicking a specific bot may work better than mimicking whoever won each replay, because a blend of multiple strategies may be worse than any one of them (or better; it's hard to know without testing). Some bots are likely easier to mimic than others.
  • Train separate models for the early and late game. You could also train models based on map type, e.g. based on average production and strength levels or player count and map size.
  • Use reinforcement learning to improve your weights. I think this will be difficult with any architecture that generates one tile's move per forward pass. My RL implementations have been unstable in training except at extremely low learning rates and the benefit has been negligible.
  • Use a conv net architecture to output all your moves in one forward pass. This is what I'm currently working on. I suspect that RL techniques will be much more effective with a bot like this.
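
As an example of the ad-hoc corrections mentioned above, here is a rough sketch of a move wrapper that vetoes merging two near-full friendly tiles. It assumes the Python starter kit's hlt API (Move, STILL, getSite); adjust the names if your kit differs:

from hlt import Move, STILL

def safe_move(game_map, location, direction):
    if direction == STILL:
        return Move(location, STILL)
    source = game_map.getSite(location)
    target = game_map.getSite(location, direction)
    # don't merge two friendly tiles if the sum would blow past the 255 cap
    if target.owner == source.owner and source.strength + target.strength > 255:
        return Move(location, STILL)
    return Move(location, direction)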

#2

Thanks for this tutorial @brianvanleeuwen!

A couple more tips of my own for further improving:

  • A simple way to train on more data is to augment your existing training data. For example, you could flip the map vertically and horizontally and add those versions to your training set too, giving you 4x as much data (see the sketch after this list).

  • Experiment with convolutional layers. They are very easy to add in Keras and are well suited to extracting useful information from a 2D environment (the Halite board).
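
Here is a rough sketch of what that flip augmentation could look like on the starter bot's flattened 4x9x9 tile inputs. The shapes and the move encoding (0=STILL, 1=NORTH, 2=EAST, 3=SOUTH, 4=WEST) are assumptions, so check them against your own pipeline:

import numpy as np

def augment_with_flips(inputs, targets):
    # inputs: (n, 324) flattened 4x9x9 stacks; targets: (n, 5) one-hot moves
    stacks = inputs.reshape(-1, 4, 9, 9)

    # vertical flip (north/south swap): reverse the rows, swap the NORTH and SOUTH columns
    v_in = stacks[:, :, ::-1, :].reshape(len(stacks), -1)
    v_tg = targets[:, [0, 3, 2, 1, 4]]

    # horizontal flip (east/west swap): reverse the columns, swap the EAST and WEST columns
    h_in = stacks[:, :, :, ::-1].reshape(len(stacks), -1)
    h_tg = targets[:, [0, 1, 4, 3, 2]]

    # both flips (180 degree rotation): both pairs swap
    b_in = stacks[:, :, ::-1, ::-1].reshape(len(stacks), -1)
    b_tg = targets[:, [0, 3, 4, 1, 2]]

    return (np.concatenate([inputs, v_in, h_in, b_in]),
            np.concatenate([targets, v_tg, h_tg, b_tg]))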


#3

Very cool @brianvanleeuwen! Interestingly I arrived at a similar solution although there are some differences in my approach:

  • I was able to learn bot behavior without the need for sampling. Although I had initially specified sample weights (to put higher weight on early-game behavior), I found that by reducing the training batch size, using a higher-capacity model, and randomizing the samples, my bot worked reasonably well (even when training on a single replay). Of course, for supervised learning, the more data the better. I've trained my bot on a reasonably large data set (hence my current rank of 95), but nowhere near the number of samples you've used.
  • I've been able to use Q-learning to train a bot from scratch while playing against nmalaguti's OverkillBot. Like you, I've only had success with low learning rates, although I suspect that is due to a trickier credit assignment problem (multi-agent vs. single-agent) and to finding the right reward structure. There is more work to do here, but I'm hopeful that with more parameter tuning, better initialization, the right reward structure/shaping, and training for lots of epochs, a Q-learning bot can do quite well.

Also, I think one needs to be careful when using convolutional layers. We need to make sure that we capture the right symmetries inherent in the game. I'm still unclear on which spatial invariances (of the kind CNNs exploit in image classification) apply to Halite. It would be interesting to visualize the filters learned by a CNN model.

@hmate9: Regarding processing of the training data, my suggestion would be to train a 3-class classifier instead (i.e. NORTH, WEST, STILL). For EAST, the transformed label would be WEST, with the input mirrored about the vertical axis. Similarly, for SOUTH the input would be mirrored about the horizontal axis with a transformed label of NORTH. I'm unsure whether the 4x training data scheme is the right one.


#4

Surely a randomly initialized set of weights wouldn't take a game from OverkillBot in a million rollouts. You must have something other than win/loss built into your rewards?


#5

I was thinking about this as well, and I imagine a better strategy would be to create a score based on some combination of peak strength, production, and territory. I would use a positive value for each move alive, then subtract the max number of moves if you lose; if you win, I would add 2 * max moves - remaining moves. So, at the extremes, if you lost after one move on a 30x30 map your fitness would be -299, and if you won after one move it would be 1 + (299 * 2). This favors short games in which you win, but the 2x is arbitrary.

I've always wondered about the exact relationship between those three factors and the likelihood of winning. Just from watching a lot of games, the winning player usually has a dominant position in all three, but this would be interesting to study further. If you could do that, and add a factor to capture positional strength (like the strength of pieces near edges), you could evaluate the board after each candidate move and choose accordingly. However, evaluating edge strength would be hard to capture without a complex model.


#6

You're right. I am using a one-step reward structure of w1 * delta_production + w2 * delta_strength. I chose not to include territory in the reward function since I felt that the delta_production already captured it.
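
In code, the per-turn reward is roughly the following (the weights and the frame layout here are only illustrative, not my actual implementation):

import numpy as np

def one_step_reward(prev_owner, prev_production, prev_strength,
                    curr_owner, curr_production, curr_strength,
                    my_id, w1=1.0, w2=0.01):
    # totals over the tiles I own before and after the move
    dp = (curr_production[curr_owner == my_id].sum()
          - prev_production[prev_owner == my_id].sum())
    ds = (curr_strength[curr_owner == my_id].sum()
          - prev_strength[prev_owner == my_id].sum())
    return w1 * dp + w2 * ds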


#7

That is an interesting suggestion. I've been considering different ways to take advantage of the symmetry of the environment. There are actually eight point symmetries, not four (source: my PhD dissertation was on symmetry :slight_smile: ). Additionally, there are symmetries with translation components, but our tile-by-tile approach should already account for this (as would the conv net approach).

Fully incorporating symmetry properly can be complicated. @hmate9's suggestion of data augmentation by rotating and mirroring is a simple but effective way to use the symmetry of Halite for ML, but incorporating the symmetry into the design and architecture would likely achieve better results with less training.

I like @kedarbellare's suggestion too, but it could be even simpler. Instead of training a 3-class classifier, just train a binary classifier (NORTH=1, STILL=0) and merge it with three other copies of itself (four rotated copies of the original input would need to be used). This partially accounts for the symmetry, and you could use data augmentation for the remaining mirror-symmetry component. It just depends on whether you think the added complexity is worth the potential benefits.
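
For concreteness, here is a rough sketch of how move selection could look with that binary model. It assumes the starter bot's 4x9x9 view with north at row 0, a two-output softmax of [P(STILL), P(move up)], and the 0-4 move encoding; treat all of those as assumptions:

import numpy as np

def predict_move(model, stack):                 # stack: (4, 9, 9) view, north at row 0
    # rotate the view so each compass direction takes a turn pointing "up"
    rotations = [np.rot90(stack, k, axes=(1, 2)) for k in range(4)]
    directions = [1, 2, 3, 4]                   # NORTH, EAST, SOUTH, WEST
    batch = np.stack([r.reshape(-1) for r in rotations])
    p_move = model.predict(batch)[:, 1]         # P(move "up") for each rotation
    if p_move.max() < 0.5:
        return 0                                # STILL
    return directions[int(p_move.argmax())]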


#8

Interesting idea about only having two outputs. If I have the time I'll definitely try it out and see how it performs.


#9

@brianvanleeuwen: I tried your tutorial bot. It gets to 80% accuracy, but I end up with a static bot that stays still the whole game. I suspect that 80% of the in-game moves are STILL and that this is all the network learns.

Did you run into this problem yourself?


#10

When I tested it, it seemed to work, but there is another user who is having this issue as well. It is possible (although it seems unlikely) that the random initialization is producing different outcomes. I'll take a look at it tonight. I might just change it to np.random.choice on the softmax output rather than argmax; at least then it should always move some tiles.

If someone fixes this before I do, please send a PR.
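
The change I have in mind is roughly this (a sketch, not the final fix):

import numpy as np

def choose_move(output):
    # output: one row of model.predict(), a softmax over the five moves
    probs = output / output.sum()      # guard against floating point drift
    return np.random.choice(5, p=probs)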


#11

I added stratified sampling. Please test if this fixes your static bot issue:

Edit: never mind, it didn't work.


#12

Thanks. I tried your edit but I still run into the same problem, and the accuracy is now much lower, peaking at around 40%.
It is worth noting that early stopping kicks in very early, after only 5 to 10 epochs; I might try disabling it.

In any case, I can think of several ways to fix this. One would be to train two separate models: one to predict still/move and another to predict the direction when there is a move. Another would be to add some kind of regularisation or weighting to give less importance to "still" moves in the loss function. I'll probably try the second one as it is simpler to implement.

However, I am still curious as to why my training gets stuck in this local optimum but yours doesn't.


#13

I'm having some trouble importing keras in the bot file. The engine sees output such as "I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally" as invalid and eliminates the bot. Any tips?


#14

@Camsbury:

You could try intercepting stdout the same way the script intercepts stderr. Something along these lines:

import os, sys

_stdout, _stderr = sys.stdout, sys.stderr
with open(os.devnull, 'w') as sys.stderr:
    with open(os.devnull, 'w') as sys.stdout:
        from keras.models import load_model
        model = load_model('../halite-ml-starter-bot/model3.h5')
sys.stdout, sys.stderr = _stdout, _stderr  # restore so the bot can still send moves

#15

Thanks, but it is still creating the error...


#16

Are you sure the message is sent when importing the libraries? Or could it be when loading the model?


#17

Yeah it is in the import... I tested again with only the import to check.


#18

Mean accuracy is really not the right metric for judging progress. If your samples are 80% STILL, then a model that always outputs STILL achieves 80% accuracy. Furthermore, there are typically far fewer early-game samples, so they have little impact on mean training accuracy, yet early-game performance matters far more in the actual game.

It turns out that my sampling scheme in the ML starter bot resulted in a lot of duplicate samples, which is very bad for training. I've made some changes that should fix this and I'm testing them now. I've also changed it to only use samples from one bot, e.g. 'erdman v10', which seems to fix the stationary bot issue. Now the training accuracy is around 90% (although, again, accuracy isn't a great metric for the quality of an ML bot).

Thanks, good suggestion. It was probably a bit too aggressive at stopping training. With EarlyStopping(patience=0), the test that I'm currently running would have stopped after three epochs with val_loss=0.5390:

Train on 3195291 samples, validate on 798823 samples
Epoch 1/100
3195291/3195291 [==============================] - 326s - loss: 0.5489 - acc: 0.9140 - val_loss: 0.5388 - val_acc: 0.9132
Epoch 2/100
3195291/3195291 [==============================] - 339s - loss: 0.5414 - acc: 0.9130 - val_loss: 0.5373 - val_acc: 0.9125
Epoch 3/100
3195291/3195291 [==============================] - 339s - loss: 0.5383 - acc: 0.9121 - val_loss: 0.5390 - val_acc: 0.9120
Epoch 4/100
3195291/3195291 [==============================] - 336s - loss: 0.5355 - acc: 0.9101 - val_loss: 0.5327 - val_acc: 0.9114
Epoch 5/100
3195291/3195291 [==============================] - 339s - loss: 0.5323 - acc: 0.9083 - val_loss: 0.5279 - val_acc: 0.9051
Epoch 6/100
3195291/3195291 [==============================] - 339s - loss: 0.5287 - acc: 0.9075 - val_loss: 0.5311 - val_acc: 0.9074
Epoch 7/100
3195291/3195291 [==============================] - 349s - loss: 0.5268 - acc: 0.9061 - val_loss: 0.5281 - val_acc: 0.9074
Epoch 8/100
3195291/3195291 [==============================] - 342s - loss: 0.5229 - acc: 0.9051 - val_loss: 0.5285 - val_acc: 0.9040
Epoch 9/100
3195291/3195291 [==============================] - 344s - loss: 0.5210 - acc: 0.9043 - val_loss: 0.5272 - val_acc: 0.9012
Epoch 10/100
3195291/3195291 [==============================] - 335s - loss: 0.5179 - acc: 0.9030 - val_loss: 0.5256 - val_acc: 0.9070
Epoch 11/100
3195291/3195291 [==============================] - 339s - loss: 0.5145 - acc: 0.9027 - val_loss: 0.5246 - val_acc: 0.9015
Epoch 12/100
3195291/3195291 [==============================] - 346s - loss: 0.5121 - acc: 0.9008 - val_loss: 0.5235 - val_acc: 0.9018
Epoch 13/100
3195291/3195291 [==============================] - 340s - loss: 0.5095 - acc: 0.9001 - val_loss: 0.5220 - val_acc: 0.9004

After 14 epochs, it is down to val_loss=0.5216 and still going down :slight_smile:

I'm bumping up the stopping patience. I'm also adding the ModelCheckpoint callback, which saves the model with the lowest val_loss, so that we can train for more epochs to see if there is a later improvement without losing the best copy so far.
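
In Keras 1.x that looks roughly like this (the patience value and filename are illustrative, not the exact settings of the run above):

from keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [EarlyStopping(patience=10),
             ModelCheckpoint('model.h5', save_best_only=True, verbose=1)]
model.fit(training_input[indices], training_target[indices], validation_split=0.2,
          callbacks=callbacks, batch_size=1024, nb_epoch=100)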


#19

For now I'm using another environment with CPU TensorFlow just to run the model after it has been trained, and I'm not getting the error on import. A bit annoying, but I'm glad there is something that works.


#20

The ML starter bot is fixed now. I have it running as my submission right now (brianvanleeuwen v36): https://2016.halite.io/user.php?userID=1669