For this tutorial and the ML starter bot, we will use python 3.5 and keras with a tensorflow backend, but it should be relatively straightforward to implement this in some other deep learning framework.
Download the ML starter bot from:
After you've installed numpy, tensorflow, and keras, download @hmate9's replay cache, unzip, and begin training:
wget https://dl.dropboxusercontent.com/s/uq8peg023a5vnkj/replays.zip?dl=0 unzip replays.zip?dl=0 python train_bot.py ./replays
train_bot.py creates the ML model, loads and processes each replay into the format expected by the model, trains the model, and saves it as model.h5. To simplify the problem of making moves, we will make each tile's move independently. As input, the model will take input from up to four squares away in each direction for a total of a 9x9 view centered on the tile. For this 9x9, we will have four channels: whether we own the tile (0 or 1), whether an enemy owns the tile (0 or 1), the production (divided by 20), and the strength (divided by 255). The scaling factors are intended to keep our input in a similar range, which tends to speed up training.
Defining the model
VISIBLE_DISTANCE = 4 input_dim=4*(2*VISIBLE_DISTANCE+1)*(2*VISIBLE_DISTANCE+1) model = Sequential([Dense(512, activation='relu', input_dim=input_dim), Dense(512, activation='relu'), Dense(512, activation='relu'), Dense(5, activation='softmax')]) model.compile('nadam','categorical_crossentropy', metrics=['accuracy'])
Our model is three 512 relu layers and a softmax output layer. We are using an adaptive optimizer (Nesterov Adam) and a categorical crossentropy loss. Using other optimizers would likely produce similar results, although may have better or worse convergence and/or training speed. Feel free to experiment with using different optimizers, activation functions, layer widths, or number of layers. There may be small improvements to performance that can be made by tweaking these, but probably not much (see tips below for better suggestions on improving the bot).
You probably shouldn't change the loss function unless you know what you are doing. Categorical crossentropy is the "correct" loss function for a classification problem. For Q-learning, policy gradient techniques, and other types of RL, you will need to change the loss function.
Because we are outputting one tile's move with each forward pass, a naive sampling of the replays would massively oversample the endgame where there tends to be hundreds of times more tiles than the first few turns. With this naive sampling, your bot would never learn to play the early game properly!
One solution is to use sample weighting to scale the loss gradients. This is what I do with my bots, but it is much more computationally intensive. The starter bot uses a simpler, faster solution that samples the tiles stochastically in inverse proportion to the number of tiles the training target has that turn:
sampling_rate = 1/stacks[:,0].mean(axis=(1,2))[position_indices] sampling_rate /= sampling_rate.sum() sample_indices = np.transpose(position_indices)[np.random.choice(np.arange(len(sampling_rate)),2048,p=sampling_rate)]
The ML starter bot also truncates the replays to the first 100 turns.
model.fit(training_input[indices],training_target[indices],validation_split=0.2, callbacks=[EarlyStopping()], batch_size=1024, nb_epoch=100)
This puts aside 20% of the data for validation (see val_loss and val_acc during training). Using validation data is a strategy to mitigate overfitting. EarlyStopping stops training after the validation loss stops improving.
Tips for improving your ML bot
- The easiest way to improve your model is to simply use more data. The starter bot trains on 1.2 million samples. My best bot so far (rank 38 of 864) used something like 30 million samples.
- Use average pooling to incorporate longer range, lower resolution data into your model (see this description: https://2016.forums.halite.io/t/has-anyone-tried-using-ml-to-copy-others-strategies/557/6?u=brianvanleeuwen )
- Experiment with different input combinations. @hmate9 uses one channel for tile ownership (+1=my tile, 0=neutral tile, -1=enemy tile) which is probably an improvement.
- Add ad-hoc corrections to prevent obviously bad moves, e.g. merging two 255 tiles.
- Mimicking a specific bot may work better than mimicking the winner because the combination of multiple strategies might be worse than the components (or it could be better, difficult to know without testing). Some bots are likely more mimick-able than others.
- Train separate models for the early and late game. You could also train models based on map type, e.g. based on average production and strength levels or player count and map size.
- Use reinforcement learning to improve your weights. I think this will be difficult with any architecture that generates one tile's move per forward pass. My RL implementations have been unstable in training except at extremely low learning rates and the benefit has been negligible.
- Use a conv net architecture to output all your moves in one forward pass. This is what I'm currently working on. I suspect that RL techniques will be much more effective with a bot like this.