Ed2: Why this is moderately interesting: it does not train by emulating the best human players, e.g. @nmalaguti or @erdman, but rather by playing many thousands of games against versions of itself and thus slowly levelling up. So it's a different sort of unsupervised learning.
[Ed: changed the constants to k in the Rebonato form to make it more obvious what's going on, plus numerous minor edits]
First of all, many thanks again to the creators, and to my fellow competitors for their generosity, in particular @nmalaguti (your avatar makes me miss my dog), @erdman, and @KalraA, who was super helpful on the Discord. Also thanks to non-competing friends James Tromans and Diego Colombo for general sanity checks and advice.
The below details a different approach from either a) careful behaviour design or b) all-out unsupervised CNN-based learning. My bot was briefly #49 this morning (but pogoes around a lot; my hypothesis on this is below). I've had great fun trying out all sorts of languages and techniques on this competition (none of which I had really messed around with before), and this post is, I think, genuinely different. Lots of concrete info in here, but not so much that it spoils your fun. In sum, I grew my bots via evolution and policy gradients. Some of the things I did:
a) Speeding up halite.exe
The very first thing I did was recompile the halite source for my CPU using gcc -O3 -march=native. My machine has 2 old Xeons (2 x 10 cores). This step made halite.exe run approx 3x faster than the ready-to-run binary off the site. YMMV, but I highly recommend this step; it's basically free and can't hurt. I also used Node's cluster module to run the new fast halite.exe in parallel, which was about 4x faster than the Python parallel scheduler (no idea why). So in sum, old halite + parallel Python is 12x slower than recompiled halite + Node.js + cluster (for me).
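To make the pattern concrete, here's a minimal sketch in Python (illustrative only — I actually used Node's cluster module, which was faster for me; the binary path, bot commands, and map size are placeholders, and the -q/-s/-d flags should be checked against your halite environment's usage text):

```python
# Sketch: run many headless halite games in parallel, roughly one per core.
# Placeholders throughout -- adapt the binary path, bot commands and flags
# to your own setup.
import subprocess
from concurrent.futures import ProcessPoolExecutor

HALITE = "./halite"                      # the recompiled -O3 -march=native binary
BOTS = ["python3 MyBot.py", "python3 OtherBot.py"]

def run_game(seed):
    """Run one quiet game with a fixed seed; return halite's stdout summary."""
    cmd = [HALITE, "-q", "-s", str(seed), "-d", "30 30"] + BOTS
    return subprocess.run(cmd, capture_output=True, text=True).stdout

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:  # defaults to one worker per core
        results = list(pool.map(run_game, range(1000)))
```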
b) Rewriting the simulation engine
Inevitably, I ended up doing this.
We want a fast function F: (owners, strengths, productions, moves) -> (new_owners, new_strengths)
I first started with @nmalaguti's excellent process_frame function from his visualiser. I eventually rewrote the processor from scratch in MATLAB (hey, I like it). The underlying numerics use C, the process functions can all be recast as BLAS operations, and eventually I had something that could simulate approx 20,000 frames a second (that is, assuming you know the moves, about 100 games of 200 frames each per second). Bottom line: rewrite the process_frame function if you haven't already. You can then use it directly in your simulator and disintermediate halite.exe entirely. BTW I design my bots in MATLAB (disclaimer: I don't work for MathWorks, but I do like it; it has a pleasant IDE and makes parallelisation easy).
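For flavour, here is a heavily simplified Python/numpy sketch of such a function (my real version is MATLAB/BLAS). It handles movement with toroidal wraparound, production for STILL pieces, and the 255 strength cap on merging, but deliberately omits combat resolution and neutral-square strength, which a real process_frame must handle:

```python
# Simplified sketch of a vectorised frame processor. Handles movement,
# toroidal wraparound, production for STILL pieces and the 255 strength cap,
# but OMITS combat resolution and neutral squares entirely.
import numpy as np

STILL, NORTH, EAST, SOUTH, WEST = range(5)
SHIFTS = {STILL: (0, 0), NORTH: (-1, 0), EAST: (0, 1), SOUTH: (1, 0), WEST: (0, -1)}

def process_frame(owners, strengths, productions, moves):
    new_owners = np.zeros_like(owners)
    new_strengths = np.zeros_like(strengths, dtype=np.int64)
    # pieces that stay still gain their square's production
    gained = np.where(moves == STILL, strengths + productions, strengths)
    for move, (dy, dx) in SHIFTS.items():
        sel = (owners > 0) & (moves == move)
        # np.roll gives the toroidal wraparound of the halite map for free
        new_strengths += np.roll(np.where(sel, gained, 0), (dy, dx), axis=(0, 1))
        arrived = np.roll(np.where(sel, owners, 0), (dy, dx), axis=(0, 1))
        new_owners = np.where(arrived > 0, arrived, new_owners)
    return new_owners, np.minimum(new_strengths, 255)  # same-owner merge cap
```

Because everything here is whole-array arithmetic, the same structure maps directly onto BLAS-style operations, which is where the 20,000 frames/second comes from.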
c) Functional specification of bot behaviour
The remarkable overkillbot shows how complex behaviour can arise from simple blocks.
I chose the following representation:
Our emphasis on Area, Production, Damage and Strength should vary through time. Define 4 functions A(t), P(t), D(t) and S(t), where each function has a Rebonato form f(t) = (k1 + k2*t) * exp(-k3*t) + k4 (4 parameters). This function is great: it's very fast to calculate (certainly faster than complex behaviours like shortest path), is differentiable, and covers all sorts of real-life behaviours: constant, linear, rise-then-fall, saturating, etc.
So that's just 16 parameters in all. Amazingly parsimonious.
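For concreteness, here is the Rebonato form in a few lines of Python, with parameter settings that reproduce the shapes mentioned above:

```python
import numpy as np

def rebonato(t, k1, k2, k3, k4):
    """f(t) = (k1 + k2*t) * exp(-k3*t) + k4 -- 4 parameters per curve."""
    return (k1 + k2 * t) * np.exp(-k3 * t) + k4

t = np.arange(300)                                # turn number
constant   = rebonato(t, 0.0, 0.0, 0.00, 1.0)     # flat at k4
linear     = rebonato(t, 0.0, 0.5, 0.00, 1.0)     # k2*t + k4
rise_fall  = rebonato(t, 0.0, 1.0, 0.05, 0.0)     # rises, then decays back to k4
saturating = rebonato(t, -1.0, 0.0, 0.02, 1.0)    # 1 - exp(-0.02*t), rises to k4
```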
Now, at every turn, make your square do the move that maximizes A(t) + P(t) + D(t) + S(t), each term scoring that move's effect on area, production, damage and strength. Note we are not parametrising the behaviour directly (unlike, say, a CNN that converts game pixels into a move in R^5); we are parametrising the objective function.
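In sketch form, the per-turn rule looks something like the Python below. The feature functions are hypothetical stand-ins — what exactly "area gained" or "damage dealt" means for a given move is part of the fun I'm leaving to you:

```python
# Sketch of the per-turn argmax. The four feature functions are hypothetical
# stand-ins (the post deliberately doesn't spell them out); the point is the
# shape of the rule: four Rebonato-weighted terms, argmax over five moves.
import math

STILL, NORTH, EAST, SOUTH, WEST = range(5)
MOVES = (STILL, NORTH, EAST, SOUTH, WEST)

def rebonato(t, k1, k2, k3, k4):
    return (k1 + k2 * t) * math.exp(-k3 * t) + k4

def best_move(square, t, params, features):
    """params: (k1,k2,k3,k4) tuples keyed 'A','P','D','S' (16 numbers total).
    features: per-move scoring functions -- hypothetical, supply your own."""
    weights = {name: rebonato(t, *params[name]) for name in "APDS"}
    def objective(move):
        return sum(weights[name] * features[name](square, move) for name in "APDS")
    return max(MOVES, key=objective)
```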
Now simulate a few thousand games and see how you did. Tweak the parameters (this alters your functions A, P, D, S) and simulate again until you have evolved a good bot. If you have a few good bots, try averaging their parameters. You can grow bots for small arenas, bots for crowded arenas, etc. Essentially, policy gradient on the 16 params of the functions makes it cheap to grow a huge family of bots.
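I won't spoil my exact update rule, but a standard evolution-strategies-style estimator over the 16 parameters is one reasonable sketch of the outer loop. Here simulate_games is a stand-in for your fast rewritten engine and should return a score for a parameter vector, e.g. a win rate over many simulated games:

```python
# Evolution-strategies-flavoured outer loop over the 16 curve parameters.
# One standard choice, not necessarily my exact recipe; simulate_games is a
# stand-in that scores a parameter vector (e.g. win rate over many games).
import numpy as np

def evolve(simulate_games, n_iters=200, pop=32, sigma=0.1, lr=0.02):
    theta = 0.1 * np.random.randn(16)           # 4 curves x 4 params each
    for _ in range(n_iters):
        eps = np.random.randn(pop, 16)          # Gaussian perturbations
        scores = np.array([simulate_games(theta + sigma * e) for e in eps])
        scores = (scores - scores.mean()) / (scores.std() + 1e-8)
        # gradient estimate: fitness-weighted average of the perturbations
        theta += (lr / (pop * sigma)) * eps.T @ scores
    return theta
```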
Remarkably, this simple local behaviour alone is good enough to get to around rank 80 (as of 5 Jan!) reliably. My briefly-#49 bot has a few more checks and balances.
The first time I ran this (it took about 2 hours to fully simulate a few thousand games with 6 bots), I was absolutely astonished by the complex behaviours that emerged. But equally, the bot did some very silly things. Even looking at the history of arjunvis14 you will see some really cool pincer movements, situations where the bot encloses squares but doesn't eat them, to "keep them in reserve", and points where the bot takes cover behind outcrops to let others fight it out. You will also see some god-awful silliness (anthropomorphising here) where it noodles around aimlessly in very random and clearly suboptimal ways. Hence the rank moves around: it will occasionally beat 4 very "strong" players, but then lose to 2 "weak" ones.
There is a temptation to specify complex behaviours (e.g. sneak to the highest-production square at the start) and let the algo work out when to use them, but I find that if the behaviours are too complex there is a severe speed penalty. If you go from being able to run 5,000 games an hour to 500, that cuts the amount of parameter exploration you can do tenfold. There's lots of fun to be had here in seeing just how complex the behaviours can get. But basically, the juice in this method is in being able to simulate tons and tons of games. And on a personal level, I like the idea of low-level simple behaviours rather than very complex ones.
The quality and motivation of competitors is so high that I fully expect a rank 50 today to be a rank 200 in 4 weeks (cf. Brian's ML bot, which was top 10 at one point). I hope the above has been useful in that it details a very different approach. There is enough concrete info to get you started, but no spoilers to take the fun out of it. Thanks everyone, and good luck!