
Has anyone tried using ML to copy others' strategies?


#1

I wrote a script to download hlt files from winning games. I then tried to train a neural net to predict the next move of a particular piece, but I haven't been too successful. The best I got imitating a top-ranked player was a prediction accuracy in the mid-80s (percent), but that's only because most of the moves are STILL. For non-STILL moves, I think the best I got was around 40%. And when I use the learned strategy in actual games, it doesn't seem intelligent, which makes me doubt it learned anything useful.

I believe it can be done, but only with a meaningful representation of the board. Here's what I tried (a rough sketch of the pipeline follows the list).

  1. Since board sizes vary, I limited it to an NxN window of the board (e.g. 10x10, though I tried other sizes as well as larger ones that wrap around the board)
  2. I put the owners, productions and strengths side by side so each frame was really Nx3N
  3. I tried predicting the movement of one piece, so I centered the board around that piece
  4. I standardized the owners by using -1 for squares I controlled, 0 for unoccupied squares and 1 for enemy owned squares
  5. I standardized the productions and strengths separately (tried MinMaxScaler and StandardScaler)
  6. I used MLPClassifier with various parameters and layer sizes, none really performed significantly better than the rest
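
Roughly, the pipeline looks like this (a minimal sketch; the window size, hidden-layer sizes, scaling and frame-array layout here are placeholders rather than exactly what I ran):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

N = 10  # window size from step 1

def centered_window(grid, y, x, n=N):
    """n x n window centered on (y, x), wrapping around the map edges."""
    h, w = grid.shape
    rows = np.arange(y - n // 2, y + n - n // 2) % h
    cols = np.arange(x - n // 2, x + n - n // 2) % w
    return grid[np.ix_(rows, cols)]

def piece_features(owners, production, strength, y, x, my_id):
    # step 4: signed ownership (-1 = mine, 0 = unoccupied, 1 = enemy)
    own = centered_window(owners, y, x)
    own = np.where(own == my_id, -1, np.where(own == 0, 0, 1))
    # step 5: simple fixed scaling stands in here for the MinMaxScaler/StandardScaler variants
    prod = centered_window(production, y, x) / 20.0
    stre = centered_window(strength, y, x) / 255.0
    # step 2: owners, productions, strengths side by side -> N x 3N, then flattened
    return np.hstack([own, prod, stre]).ravel()

# step 6: train an MLP on (features, move) pairs sampled from the replays
clf = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=200)
# X = np.array([piece_features(...) for each sampled piece])
# y = the move each sampled piece actually made (STILL/N/E/S/W as class labels)
# clf.fit(X, y)
```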

Perhaps an RNN would be better suited since many players have a strategy that involves subsequent moves. Anybody try this or have any ideas?


#2

Your approach sounds reasonable. I don't think an RNN architecture will help.

I use separate channels for owner==me and owner==enemy, but your signed ownership channel should probably work just as well or maybe even better.

Your production and strength standardization might be an issue, depending on the implementation details. My bot just divides strength by 255 and production by 20.

My current network is three dense 512 relu layers followed by an output layer, but as long as your network is reasonably sized, the training procedure is probably more important than the exact network architecture.
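
In sklearn terms (since that's what you mentioned using), a network of roughly that shape would be something like this; the exact settings are of course up to you:

```python
from sklearn.neural_network import MLPClassifier

# roughly the shape described above; the output layer is inferred from the labels
clf = MLPClassifier(hidden_layer_sizes=(512, 512, 512), activation="relu",
                    early_stopping=True,        # built-in held-out validation
                    validation_fraction=0.1)
```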

How many samples do you train with? Do you validate and use early stopping? What is your sampling scheme?


#3

If you are training on every move of every piece, then your training procedure is putting way too much weight on the end game, because there are fewer pieces in the early game. Consequently, it would result in poor early-game performance. It is much more important to make good moves in the early game than in the late game.

I weight my samples by something like 1/(number of pieces owned). Alternatively, you could randomly select one sample per turn to keep the turns evenly weighted overall. I also truncate the replays to avoid training on irrelevant late game samples.
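
The "one sample per turn" variant is easy to implement; something like the sketch below works (the frame layout and truncation point here are just how I happen to store and trim my replays):

```python
import random

def one_sample_per_turn(frames, my_id, max_frames=150):
    """Keep one randomly chosen owned piece per frame so every turn gets equal weight.

    frames: list of 2D lists of {"owner": ...} squares; max_frames truncates the late game.
    Returns (frame_index, y, x) triples to build training samples from.
    """
    picks = []
    for t, frame in enumerate(frames[:max_frames]):
        owned = [(t, y, x)
                 for y, row in enumerate(frame)
                 for x, sq in enumerate(row)
                 if sq["owner"] == my_id]
        if owned:
            picks.append(random.choice(owned))
    return picks
```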


#4

That's good advice to divide strength and production by 255 and 20, respectively. Definitely the logical thing to do.

Good point about the end game. I just tried again, limiting my sampling to the first 50 frames of a game. I have a lot of training data, as one game probably contains at least 100k positions. I was using 100 games with 15 frames per game and 10 positions per frame (15k samples), although across my many runs I also switched it up and used significantly more. Training didn't seem to improve on larger data sets, so I didn't push further.

I was training on @djma v3, so it might work better on a different player whose strategy is more easily learned by a neural net. I'll try training on your games as well since you use neural nets. How many adjacent squares do you consider?

You can see how I'm training in the gist below. I read from a folder layout of games/[user]/[hlt zip files].


#5

Hey, I'm working on something like this as well. Have not gotten great results yet but here are a few thoughts:
- train a deep convolutional network - hopefully this would work better due to translational invariance
- separate the prediction of "which piece to move" from the prediction of "which way to move it", an idea I got from "Predicting Chess Moves Using Convolutional Neural Networks" (Oshri and Khandwala)
- since the game map wraps around, it is easy to convert all maps to the same size (50x50) by simply using wraparound padding (quick sketch below)
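
For that last point, something like this works (numpy array layout assumed; any per-channel map can be extended the same way):

```python
import numpy as np

def to_fixed_size(grid, target=50):
    """Extend a toroidal map to target x target by tiling it (wraparound padding)."""
    h, w = grid.shape
    reps = (-(-target // h), -(-target // w))  # ceil division: enough copies to cover target
    return np.tile(grid, reps)[:target, :target]
```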

will keep you posted!


#6

I train on millions of samples. 15k is probably not enough.

djma v3 is probably a good target.

I use a 35x35 input centered on the tile, but only the nearest 5x5 is input at full resolution. I apply 3x3-tile average pooling over a 15x15 region and 5x5-tile average pooling over the full 35x35. With 4 input channels, this merges and flattens to 396 dimensions.
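
In case it helps, here is roughly how that is assembled per channel: 5x5 at full resolution (25 values), a 15x15 region pooled in 3x3 blocks (25 values), and the 35x35 pooled in 5x5 blocks (49 values), so 99 per channel and 99 x 4 = 396. The helpers below are illustrative, not my exact code:

```python
import numpy as np

def window(grid, y, x, n):
    """n x n window centered on (y, x), wrapping around the toroidal map."""
    h, w = grid.shape
    rows = np.arange(y - n // 2, y + n - n // 2) % h
    cols = np.arange(x - n // 2, x + n - n // 2) % w
    return grid[np.ix_(rows, cols)]

def avg_pool(win, block):
    """Average-pool a square window into (size/block) x (size/block) tiles."""
    n = win.shape[0] // block
    return win.reshape(n, block, n, block).mean(axis=(1, 3))

def multires_features(channel, y, x):
    return np.concatenate([
        window(channel, y, x, 5).ravel(),                # 25 values at full resolution
        avg_pool(window(channel, y, x, 15), 3).ravel(),  # 5x5 = 25 pooled values
        avg_pool(window(channel, y, x, 35), 5).ravel(),  # 7x7 = 49 pooled values
    ])  # 99 per channel; 4 channels -> 396 input dimensions
```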


#7

Any reason you're centering on a single tile rather than using a "pixel-map" type approach (other than that it works and the pixel-map approach doesn't seem to :wink:)?


#8

to make it actually "local" (independent of your global x,y position)

what you want to learn is: there's a fat block to the east and a smaller one to the north, so go north, whatever your x,y coords are


#9

Right, so that the net knows which piece I'm referencing, because any one pixel map can contain dozens of my pieces that move in various directions.


#10

It sounds like you used neural nets to devise a strategy, while I'm using them to decipher someone else's. If so, how did you train your neural net? Did you use an evolutionary algorithm to mutate and breed various nets depending on how well they played each other? I'd imagine that takes a long time to train, although it may result in some novel strategies.


#11

I train it to mimic the winner in replays involving the best bots. Then I apply RL techniques to improve the weights. Unfortunately, the RL techniques appear to be quite unstable with my current architecture. To avoid simply randomizing the weights via highly correlated gradient updates, I need to set the learning rate so low that it is difficult to know whether any progress is even being made :confused:

I think the RL instability is caused by moving each tile independently. I'm planning to build a new bot that generates all my moves in a single forward pass. Hopefully, this will stabilize my RL training step and also make it easier to have cooperation between tiles. Even without RL improvements, I expect that this will reduce the number of terrible moves (like merging two 255 strength pieces...) that my bot makes.


#12

Sounds a lot like the ANNs in AlphaGo. What flavour of RL are you using?

Thanks for the info! Looking forward to your tutorial.


#13

Policy gradient
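
For anyone curious, a bare-bones policy-gradient (REINFORCE) update looks something like the sketch below; the linear-softmax policy, feature size and reward handling are stand-ins for illustration, not my actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 396, 5                   # e.g. STILL/N/E/S/W
W = 0.01 * rng.standard_normal((n_features, n_actions))

def policy(x):
    """Softmax over a linear scoring of the features."""
    logits = x @ W
    p = np.exp(logits - logits.max())
    return p / p.sum()

def reinforce_update(episode, lr=1e-4):
    """episode: list of (features, action_taken, return) tuples from one game.

    Performs gradient ascent on expected return using the REINFORCE estimator.
    """
    global W
    grad = np.zeros_like(W)
    for x, a, G in episode:
        p = policy(x)
        dlogp = np.outer(x, -p)                  # d log pi(a|x) / dW for a softmax-linear policy
        dlogp[:, a] += x
        grad += G * dlogp
    W += lr * grad / len(episode)
```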