Do you do any postprocessing of your moves? ML bots sometimes make obviously bad moves that could be corrected with a few prewritten rules. @Sydriax has been a vocal advocate of this.
That would be cheating! That's like going through your kid's homework and fixing all the mistakes.
Edit: just to be clear, I don't actually think there is anything wrong with mixing in some hand-crafted logic into an ML bot. I might add post-processing at some point, but for now I'm interested in seeing how much performance I can get purely through learning.
I'm extremely interested in how you pulled this off, though you may consider that secret sauce. Are you using reinforcement? Do you regress to end-of-game placement, or something else?
I want to see someone build an ML bot better than mine, so I don't mind sharing any details that could help. I would just post all my code, but the creators requested that I don't.
it seems we have a couple of people succesfully building 'copycat' bots learning from experts (and a few of those using RL on their bots).
is anyone working on evaluation methods ? MCTS?
Does anybody achieve higher ranking than @brianvanleeuwen using ML based strategy so far?
I'm interested in playing against other ML bots, not sure how to do that though... still need to look into submitting to the leaderboard.
I'm using policy gradient reinforcement learning with a convolutional neural network to process the environment.
How do the RL practitioners feel about A3C?
I have done next to no RL work but tried it and policy-gradient for a side project. A3C seemed to learn a lot faster.
My problem with RL learning has been that most RL algorithms have only one agent. Here, we have multiple agents that have to be moved simultaneously.
My knowledge of RL stops at only one agent, but don't know how to train for multi-agent.
Does anyone have any good reading regarding multi-agent RL?
Are you using policy gradient? Since we often train on a mini-batch of data anyway, you can effectively have a mini-batch of agents. So for example, my mini-batches have the shape [num_frames*num_agents, num_units_per_agent].
Could you post some starter code along the lines of the alt-python3-halite-starter, or the random bot equivalent?
@hmate9: Just want to point out that you do not have to make it this way. Your algorithm can take as input the whole map and output a list of directions for the whole map (even neutral and enemy tiles).
I tried that, but achieved very bad results with that.
Starting to see some improvement. Turns out that handling the edges is a lot more important than I originally thought. Still some weird quirks (moving back and forth rapidly) and lots of wasted resources in the late game -- perhaps still a little too conservative overall.
Finally beat the starter package bots:
numpy, scipy, scikit-learn, pillow, h5py, tensorflow, and keras are all now on the game servers.
MCTS will probably be unsuccessful due to the sheer size of the search space. It's vaguely possible with sufficient compute when you only have 10^500 possibilities total (see Go) but when there are many times more than that per move, I think it will likely not be terribly effective.
My best guess for how a really successful ML bot would work is it will do one of two things:
1. Online Evolution (see Justesen, Mahlmann, and Togelius, 2016) of move sets coupled with a learned position evaluator (this can be done since the data available is so large: we have millions of positions on AWS to learn from, and the winner from each is known; if you add in reflections and translations you've got many more).
2. Something weird that hasn't been seen before.
The problem is that as far as I know, the vast majority of RL has to do with controlling a single agent, so that's what's been done so far on Halite. Making pieces cooperate is harder.
I've created a ML starter bot:
Thanks for the Online Evolution ref. I'm reading it now!
I've messed a little with learned position evaluation, but so far, simple linear combinations of (strength, position, territory) seem to perform as well as any of my learned evaluators.
As for RL, one could argue that although there are many units to move, all the units are controlled by a single omniscient entity, so rather than a multiagent learning problem, it's simply a very high-dimensional monoagent learning problem... the question becomes how to reduce that dimension to something manageable.
I've had decent success using RL and a linear reward system to train 'tactical' bots - i.e. they locally don't do anything stupid. However my attempts to extend them to 'strategic' bots (feeding each unit strategic features and using deeper architectures) have all failed (training doesn't converge).
If I can free up some time I'll try two ways of making tacticalRL bot more strategic:
- keeping the simple learned tactical network and using hierarchical RL to train a 'strategic' agent (see e.g Barto & Mahadevan 2003)
- training unit clumps rather than training single units (I think this is also what @jstaker7 is doing)
I agree that you could see it as a single high-dimensional monoagent learning problem, but I think you may actually make your life more difficult if you do that, since the outputs need to be very precise and interdependent and there is fairly little room for error (i.e. accidentally combining 255s is a very bad thing to do).
If you're interested in pursuing something along the approach of generating move sets directly from the input board, I'd be intrigued to see if people can do anything with HyperNEAT (Stanley, D'Ambrosia, and Gauci, 2009) -- I think its ability to scale networks without losing accuracy could potentially interesting when the game map can be of many sizes.
Finally, if you're interested in the ML approach from the perspective of just trying to win, and a little cheating is okay (according to @brianvanleeuwen it isn't) you might try ML combined with some pre-and-post moveset processing. For example, there are some incredibly fast heuristics that you can run which will prune the set of moves for each piece from 5 to ~3 and will be right ~99% of the time. Choosing the highest output within a smaller set will improve skill of the bot appreciably, I think. Additionally, after running your algorithm you could then add your own traffic control to prevent strength from being lost to the 255 cap. Heuristic-aided ML could end up being a very capable strategy.
The ML-starter-bot looks great. I'm running it on a lot of games I have saved. I did something similar with MLPClassifier from scikitlearn, but my training always capped out at ~85%. I just ran 50 games of erdman_v10 on your train_bot and got an accuracy of 82%. Training input is 77,824 which may be low, but I'm running a much larger train set of 1k games from another player and I'm stalling out at 15% as well.
I think I need a more meaningful representation of the board. I was also thinking about trying to use a high L1 to ensure that most weights are 0 as it's less realistic that the strategy you're trying to reconstruct considers more than a few neighboring squares. Or perhaps other regularization techniques will help.
Board representation is a flat 363 but that can be changed to any other board representation.