
ML starter bot tutorial


Awesome, I'm training your new version now. What was your rationale for switching to Leaky Relu?


@brianvanleeuwen I get a STILL accuracy of 77% and a MOVE accuracy of 34%, which suggests it still has a problem. This probably means the most important change is to train it on a single bot. I'll try again that way and see if it works.


I'm also working on a Q-learning bot, but I'm struggling with the variable map size. Do you work on the whole map at once, for every cell, or do you mimic @brianvanleeuwen and train each cell with a fixed "vision" range?


I have been trying a different representation of the board with some success. I use [my_strength, enemy_strength, open_strength, production], where my_strength is strength * (owner == myID).

If you have two separate fields for owner and strength, the model has to learn the relationship between strength and owner to determine the appropriate move. I feel making it explicit in your training set is more efficient, and it is easier to debug that you have the correct input and are doing the correct transformations. For each position, I create a relative frame where the square the model is deciding for sits at the center.
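Since Halite maps wrap around, a centered relative frame can be built with np.roll. This is just a minimal sketch of the idea; the function name is mine, not from the post:

```python
import numpy as np

def centered_frame(board, row, col):
    """Shift a wrapping (toroidal) board so that (row, col) lands
    at the center of the frame."""
    h, w = board.shape[:2]
    return np.roll(np.roll(board, h // 2 - row, axis=0),
                   w // 2 - col, axis=1)
```

Because the roll wraps, every cell gets a full-size frame even when it sits on the map edge.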

I also trained two separate models, one to determine if a move should take place, and another to determine what the move should be. That seems to work much better.

Since the number of training examples can be very large, creating a generator really helped me manage my training. For the validation set, I set the first N inputs aside and had the generator skip them to ensure the model never sees them. The N inputs span a number of games, so there isn't a bias toward validating against just the opening moves of a game.
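A minimal sketch of that hold-out scheme (the names are mine; this variant also reshuffles the remaining indices each epoch, which the post notes its own generator did not do):

```python
import numpy as np

def training_generator(X, y, batch_size, n_validation):
    """Yield batches forever, skipping the first n_validation samples,
    which are held aside as the validation set and never trained on."""
    train_idx = np.arange(n_validation, len(X))
    while True:
        np.random.shuffle(train_idx)  # reshuffle the training order each epoch
        for start in range(0, len(train_idx), batch_size):
            batch = train_idx[start:start + batch_size]
            yield X[batch], y[batch]
```

The first n_validation rows of X and y can then be passed directly as validation_data.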

I've posted the gist here. For some reason, though, it doesn't do as well as when I had saved the games down to disk, so I'm still checking for bugs. The saved-down training set that worked better was able to shuffle the training data, but I feel that my samples per epoch is large enough to offset that.

I was able to create a bot that uses one model to determine if a move should occur and another to determine what the move should be. It has some weird behavior: it performs very well in the beginning but begins acting strangely toward the end, as the bot commands more squares. I think it's because the training examples for interior squares all look essentially the same (the bot owns all the squares to the left and right of the square being evaluated), so it is unable to pick a move. I'm still not sure how to handle that, but I'm considering expanding the visible distance, pooling, or special-casing those situations so the square just heads to the closest exit.


That is surprising. When I run the starter bot code I get:

STILL accuracy: 0.925299246221
MOVE accuracy: 0.656212805589

Are you using a different set of replays?

I was concerned that standard relu activation was zeroing out and causing training issues. It turns out that this was not the issue, but I didn't bother changing the activation back. In my experience, the exact choice of hidden layer activation doesn't really affect training performance that much in a reasonably sized network.

I've had issues with standard relu previously on other projects: when a pre-trained network is exposed to new, previously unseen data, it can produce extremely large gradient updates and instabilities that mess up your weights. Leaky ReLU ensures there is always at least some gradient on every node, so it can't get into that problematic state.
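In numpy terms the difference is just that the negative side keeps a small slope; the 0.01 slope below is an arbitrary illustration, not necessarily what the starter bot uses:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Unlike standard relu, the gradient is alpha rather than zero for
    # negative inputs, so every node always receives some update signal.
    return np.where(x > 0, 1.0, alpha)
```

With standard relu the gradient for negative inputs is exactly zero, which is how a node can get stuck "dead" after one bad update.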


@brianvanleeuwen, I've added a load_model call in the train_bot.py file and I'm curious as to why, given the same set of training examples, the training/validation accuracy appears to stay flat at about 0.400 (by the way, this is using commit ae0aceb8d017da50d500629fdcb7a0233b4b2dab). I would have expected the accuracy to be about the same as it was when the model was saved.

(Top curve (in acc/val_acc) is training from a fresh model, bottom curve is loading the saved model and training)

Otherwise, excellent starter package! Thanks a lot!

Two things I'd love to see added:
- A simple tb = TensorBoard(log_dir='./logs/' + now.strftime('%Y%m%d %H%M')) (with now = datetime.now()) added to the callbacks of the model.fit call, giving beginners (like me) an easy way to visualize training through TensorBoard
- Some comments as to what is done at a given point (tensor creation steps, tensor shape details, etc.)


I tried exactly this representation a while back, based on the same reasoning :slight_smile: . I think it doesn't work well because after a piece moves, the square it was on has zero strength, but it still matters who owns it. Zero-strength neutral tiles also pop up when there is combat between players. The difference between a zero-strength neutral, enemy, or allied tile is important, but they all look the same in the [my_strength, enemy_strength, open_strength, production] representation, unfortunately.
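To make the ambiguity concrete, here is a toy encoder for that representation (the function name and owner IDs are made up for illustration). All three zero-strength tiles encode to the identical vector, so the owner information vanishes:

```python
import numpy as np

NEUTRAL, MY_ID, ENEMY_ID = 0, 1, 2  # hypothetical owner ids

def encode(owner, strength, production):
    """[my_strength, enemy_strength, open_strength, production]"""
    return np.array([
        strength if owner == MY_ID else 0,
        strength if owner == ENEMY_ID else 0,
        strength if owner == NEUTRAL else 0,
        production,
    ])

# Zero-strength tiles look identical regardless of who owns them:
frames = [encode(o, 0, 3) for o in (NEUTRAL, MY_ID, ENEMY_ID)]
```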


I don't know why the saved model accuracy stays flat in your training. If you save the fresh model (purple line in your plot) and re-load it, I would expect that it would still be near 70-80% accuracy. That doesn't really make sense for it to change unless the training set has changed.

If you add the TensorBoard callback and send a PR, I'll merge it in.


@tomzx I've added the same functionality and observe the same odd behaviour, with a loss that stagnates at ~9.


I'm training each cell independently (with fixed vision range). The update rule in this case becomes:

  • For terminal case: Q(s_j(t), a_j(t)) ~ (r_t - \sum_{i != j} Q(s_i(t), a_i(t)))
  • For non-terminal case: Q(s_j(t), a_j(t)) ~ (r_t + \sum_i \max_a Q(s_i(t + 1), a) - \sum_{i != j} Q(s_i(t), a_i(t)))
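Numerically, the two targets can be sketched like this (a minimal sketch of my reading of the update; function and argument names are mine, and there is no discount factor, matching the formulas above):

```python
import numpy as np

def per_cell_targets(q_sa, q_next, reward, terminal):
    """q_sa[i]     = Q(s_i(t), a_i(t)) for each owned cell i
    q_next[i, a]   = Q(s_i(t+1), a)    (ignored when terminal)
    Returns the regression target for Q(s_j(t), a_j(t)) for every j."""
    future = 0.0 if terminal else q_next.max(axis=1).sum()
    # \sum_{i != j} Q(s_i(t), a_i(t)) is the total minus cell j's own term
    others = q_sa.sum() - q_sa
    return reward + future - others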

Unlike with supervised learning, I've had trouble stabilizing Q-learning when training over a large number of iterations. I've been playing around with the discount factor, rewards, and learning rate, and hope to get a good model that way.

Has anyone else had success with Q-learning?


Just thought I'd turn that LaTeX into pretty formulas.

I have no idea what it means, but it looks pretty now!


I am using the set you pointed to in the first message.


Sorry, I have no idea what the issue you are having might be. I cloned the repo into a new directory, downloaded the replays, and trained from scratch. The result was very close to my previous test:

STILL accuracy: 0.926704096636
MOVE accuracy: 0.644883366344


@brianvanleeuwen, you've said in a previous reply that you've used pooling:

I use a 35x35 input centered on the tile, but only the nearest 5x5 is input at full resolution. I have 3x3-tile average pooling in a 15x15 and 5x5-tile average pooling in a 35x35. With 4 input channels, this merges and flattens to 396 dimensions.

Do you use pooling at the input level or the model level? I imagine you could do it at the model level, but that would mean your input would have to be the full 35x35, which, on my machine at least, would make managing the training set very annoying. I'm thinking about adding average pooling at the input level. Any experience as to which factors pooling helps with the most? I think you need some kind of pooling for a reasonable end-game, especially on a large map.

Here's the pooling function I wrote if anyone wants to use it.

def average_pool(stack, pool_size):
    height, width = stack.shape[:2]  # stack is a 2-D numpy array
    avgs = []
    for r in range(height // pool_size):
        for c in range(width // pool_size):
            r_start = r * pool_size
            c_start = c * pool_size
            avgs.append(np.mean(stack[r_start:r_start + pool_size,
                                      c_start:c_start + pool_size]))
    return avgs
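A vectorized equivalent using a reshape trick (my own variant, not from the thread): it trims edges that don't divide evenly by the pool size and returns a 2-D grid of block means rather than a flat list.

```python
import numpy as np

def average_pool_2d(stack, pool_size):
    """2-D average pooling without explicit Python loops."""
    h, w = stack.shape[:2]
    h2, w2 = (h // pool_size) * pool_size, (w // pool_size) * pool_size
    blocks = stack[:h2, :w2].reshape(h2 // pool_size, pool_size,
                                     w2 // pool_size, pool_size)
    return blocks.mean(axis=(1, 3))
```

The reshape groups each pool_size x pool_size block along axes 1 and 3, so averaging over those axes gives one value per block.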


@brianvanleeuwen: OK I found the problem. I was using python 2.7. Switching to python 3 fixes the accuracy.

There is probably some behaviour that differs between Python 2 and 3 in the code (integer vs. true division being the usual suspect), which I naively assumed was not the case. I ought to be more careful when switching between Python 2 and 3 from now on.


I've handled this by shifting everything by one, so that a zero tile is represented with a 1. For example:

str_array = np.array(str_array)
str_array -= (str_array == 255) # Avoid overflow
str_array += 1 # Shift everything by one

After which I can have a separate strength channel for each player. Seems to be working well in my case.
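Building on that shift, the per-player channels might look like this (the function name and the clamp-then-shift approach are my own illustration of the idea):

```python
import numpy as np

def strength_channels(strengths, owners, player_ids):
    """One strength channel per player, with everything shifted by one
    so a present-but-zero-strength tile (value 1) is distinguishable
    from a tile that does not belong to that player (value 0)."""
    s = np.asarray(strengths, dtype=np.int32)
    s = np.minimum(s, 254) + 1  # clamp 255 first so the shift can't overflow
    return np.stack([np.where(owners == p, s, 0) for p in player_ids])
```

Widening to int32 before the shift sidesteps the uint8 overflow the original snippet guards against.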


Both, actually. For training, I pool and flatten to 396-dimensional vectors in preprocessing with numpy and train with N x 396 shaped input. I store my training data on disk in this shape and load pieces into memory as needed. For MyBot.py, I add AveragePooling, Flatten, and Merge keras layers to the trained model. I don't recommend this setup; it is pointless to do it in two different ways, imo.

Btw, I didn't do any kind of testing to optimize the pooling. I now suspect that the 15x15 with 3x3 pools is not useful when you already have the 35x35 with 5x5 pools. It might be better to ditch the 3x3 pools and expand 5x5 pools to 50x50 to see all of the largest maps.


I'll definitely try that. I can't see any other way to have a rational end game without hard-coded logic, but this should help.

My bot is also a bit unstable, sometimes moving back and forth in a loop between two pieces. Not sure if it's just how I trained the bot or a bug in the implementation, but I'm not too surprised. A recurrent net would probably prevent that.


My bot is currently purely an ML bot, which learned from the moves of the best player and the opponents he lost to. At this time it's ranked 44. I thought that this choice of training set would help extract the best current intuition while also modeling the best opponents (including the top player when he loses).

Inside my bot there is a quite simple neural net implemented in tensorflow. The input, of shape Nx19x19x3, is processed by two convolutional layers followed by two fully connected layers with a softmax at the end. There is no pooling after the convolutions, as I thought it would remove too much information in such a precise environment.

Unfortunately it shows some serious problems from time to time, like getting stuck at the beginning on large maps when nothing really changes within a cell's view range. It also frequently merges really strong pieces, or dies right before a win because of timeouts. The timeouts are quite a serious problem for me; they mostly stem from the fact that I process each piece independently, which is a serious waste of computation given the convolutional layers. I'm not going to paper over the rest of my issues with smart post-processing, as I consider this a pretraining step before I try to improve my bot with other techniques.


KalraA (rank 22) is the highest-ranked ML bot that I know of.