Accuracy can be a bit misleading and doesn't necessarily represent the bot's performance very well. I've had bots that trained to very high accuracy but never moved the first tile. Early-game accuracy matters much more than late-game accuracy, and the overall training accuracy depends on the details of how you sample the replays.
Is there any other evaluation function you've found useful besides simulating games?
I also thought about using a mixture-of-experts technique: first predict whether a move should be made at all, and if so, use a separate model to predict the actual move. Predicting whether a move should be made is significantly easier, and it would help remove the cases where my model stays still despite having very high strength and being far from an enemy.
I don't think that would be "cheating", and I'm sure there are systematic methods to develop separate models for prediction and then train a neural network to decide which network to use. But it can be simpler to just train one model where the target is the move and another where the target is move versus no-move.
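A minimal sketch of that two-stage setup, where a binary "move or stay" model gates a separate move-selection model. All names and the stub decision logic here are hypothetical, just to show the control flow; in practice each stub would be replaced by a trained network:

```python
def should_move(state, threshold=0.5):
    """Stage 1: predict the probability that any move should be made.
    Stub logic (hypothetical): stronger armies are more likely to move."""
    p_move = min(1.0, state["army_strength"] / 100.0)
    return p_move >= threshold

def pick_move(state):
    """Stage 2: only consulted when stage 1 decides a move should happen.
    Stub logic (hypothetical): step toward the nearest enemy tile."""
    dx, dy = state["enemy_dir"]
    return ("move", dx, dy)

def act(state):
    """Combine the two models at inference time."""
    if not should_move(state):
        return ("stay",)
    return pick_move(state)
```

For example, `act({"army_strength": 150, "enemy_dir": (1, 0)})` returns `("move", 1, 0)`, while a weak army (`army_strength: 10`) returns `("stay",)`. The point of the split is that the stay/move classifier can be trained and evaluated on its own, which directly targets the failure mode where a strong, exposed army never moves.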
For mimicking, you can split accuracy into STILL and MOVE accuracy, and also into early-game versus late-game accuracy. If you want a single number, use the mean cross-entropy loss on the first 50 or so turns.
For RL, I think the metrics to use will depend on the technique and your implementation.
Using multiple networks to split up the decision making is reasonable.