Autoplay for PONG

If you were to use a similar autoplay code for a CPU opponent in a game like PONG, how would YOU build in some error so the CPU player doesn’t win every time? Or gradually gets better?

I don’t know where you are in your programming journey, but for the ‘gradually gets better’ part, Pong is a prime example that I see covered in introductory AI / machine learning material (for Unity).

Eventually, the AI can learn to be as precise as the hand-written algorithm (i.e. it never loses) all by itself, thanks to the constant ‘reinforcement learning’ it undergoes.

I.e. the AI had a choice of A or B and thought A was the best choice - naively or not - and then, based on whether that turned out to be right, the choice of A is biased a little more or less next time. The mechanism behind this is called “backpropagation”: you feed a small, subtle adjustment back into the model’s weights, to either reinforce or discourage that choice in future under the given circumstances (inputs).
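To make that concrete, here’s a tiny Python sketch of the “bias the choice a little more or less next time” idea - it’s not real backpropagation through a network, just a pair of preference weights nudged by a reward signal, and all the names and numbers are made up for illustration:

```python
import random

# A tiny sketch (not real backpropagation) of "bias choice A a little more or
# less next time": two preference weights nudged by a reward signal.
# All names and numbers here are made up for illustration.
preferences = {"A": 1.0, "B": 1.0}
LEARNING_RATE = 0.1

def pick_action():
    """Choose A or B with probability proportional to its preference weight."""
    actions = list(preferences)
    weights = [preferences[a] for a in actions]
    return random.choices(actions, weights=weights)[0]

def reinforce(action, reward):
    """Nudge the chosen action's weight up (reward > 0) or down (reward < 0)."""
    preferences[action] = max(0.01, preferences[action] + LEARNING_RATE * reward)

# One imagined round: the AI picked an action and it turned out to be right.
choice = pick_action()
reinforce(choice, reward=+1.0)   # right answer: bias this choice up a little
# reinforce(choice, reward=-1.0) # wrong answer: bias it down instead
```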

As this is done over many iterations in simulation (sometimes hundreds of thousands or even millions of times over), it’s also possible to “save” the current state of training at various stages as you go; each snapshot of the weights/biases in the model represents a different level of ability of the computer at that point.

That would then let you assign a gradually less-clumsy computer opponent to different difficulty levels of gameplay. You could also switch which snapshot the computer uses over time, or in response to various factors (on the fly), to make it more dynamic.
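Roughly like this, as a Python sketch - the model and the training step here are hypothetical placeholders, the point is just the snapshot-per-difficulty idea:

```python
import copy
import random

class TinyModel:
    """Hypothetical stand-in for whatever model/weights your AI actually uses."""
    def __init__(self):
        self.weights = [random.random() for _ in range(4)]

def train_one_step(model):
    """Hypothetical training step: just nudges the weights a little (placeholder)."""
    model.weights = [w + random.uniform(-0.01, 0.01) for w in model.weights]

model = TinyModel()
snapshots = []

# Save a frozen copy of the weights every so often; each snapshot captures
# the AI's ability at that point in its training.
for step in range(1, 50_001):
    train_one_step(model)
    if step % 10_000 == 0:
        snapshots.append(copy.deepcopy(model.weights))

# Later, in the game: map snapshots to difficulty levels, or swap on the fly.
difficulty_levels = {
    "easy":   snapshots[0],   # barely trained, clumsy
    "medium": snapshots[2],
    "hard":   snapshots[-1],  # most trained
}
model.weights = difficulty_levels["medium"]
```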

E.g. if used for AI in an RTB/RTS game, introducing a clumsy or erratic mistake here or there via a temporarily lowered training snapshot could simulate the computer opponent being over-confident or desperate, and open up potential weaknesses for the player to detect and exploit.

Even if you don’t go full AI/ML in the search for answers to this, you could still hand-craft a similar model that applies weights or biases to the decisions the computer opponent has to make, giving it a chance to make the right or (intentionally) wrong decision.
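For example, a hand-rolled “weighted decision” could be as simple as this (Python, purely illustrative - the names and the skill value are whatever you tune them to):

```python
import random

def choose_paddle_move(correct_direction, skill=0.7):
    """Hand-crafted 'weighted decision': with probability `skill` the CPU picks
    the right direction, otherwise it deliberately goes the wrong way.
    correct_direction is +1 (up) or -1 (down); the names are illustrative."""
    if random.random() < skill:
        return correct_direction       # the right call
    return -correct_direction          # an intentional mistake

# Raising `skill` over the course of a match makes the opponent "improve".
move = choose_paddle_move(correct_direction=+1, skill=0.6)
```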

Either way, if you want it to act like an imperfect human you were playing against, you’d have to put more effort into smoothing those decisions out and simulating reaction/correction times. Otherwise a super-fast but imperfect computer Pong opponent could appear to constantly jitter between right and wrong decisions, instead of just being slow to react or misreading a bounce and going the wrong way.
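One cheap way to fake a reaction time (again just a sketch, and the numbers are guesses): have the CPU act on where the ball was a fraction of a second ago, rather than on this frame’s perfect information:

```python
from collections import deque

# Sketch of simulated reaction time: the CPU acts on where the ball *was*
# a fraction of a second ago, instead of flip-flopping every frame on the
# latest perfect information. The numbers here are guesses.
REACTION_FRAMES = 12                       # roughly 0.2 s at 60 fps
ball_history = deque(maxlen=REACTION_FRAMES)

def delayed_ball_y(current_ball_y):
    """Record the latest ball position and return the one the CPU 'sees'."""
    ball_history.append(current_ball_y)
    return ball_history[0]                 # oldest entry = delayed perception

# Each frame, steer the paddle toward delayed_ball_y(ball.y) rather than ball.y.
```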

The interesting thing with ML and reinforcement learning is just how far it can go - I follow an ML publication by a (famous) ML professor and entrepreneur whose latest article had a fascinating (to me) example where they trained a computer to play “hide and seek” with itself, and both the hunter and the prey repeatedly one-upped the other in the fight for survival - check it out:

https://info.deeplearning.ai/the-batch-global-surveillance-survey-ais-crisis-of-reproducibility-construction-drones-bots-cheat-at-hide-and-seek


It’s been a while since I saw this lecture/module so apologies if that general rambling above isn’t that helpful (although I hope it’s at least a bit interesting!).

In the end it boils down to some probabilities and randomization:

- Make the computer opponent predict where the ball is going to go, rather than just hovering underneath it all the time.
- Add some margin of error to that prediction.
- Then add another margin of reaction time to events such as ricochets off blocks/walls.
- Finally, bound the computer’s movement of the paddle to some amount that’s not unlimited (i.e. it can move at most N units per second) instead of just setting X directly to the same value as the ball.

Now the computer is going to have to work at it to stay in play, and depending on how you tune the factors above, it will be better or worse at doing so.
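Something along these lines, as a Python sketch - all the names and tuning numbers are placeholders, and you’d adapt it to however your game represents the ball and paddle:

```python
import random

COURT_HEIGHT     = 100.0   # assumed court size, 0..COURT_HEIGHT on the y axis
MAX_PADDLE_SPEED = 40.0    # max units per second the paddle may move
ERROR_STD        = 8.0     # how blurry the CPU's aim is (bigger = worse player)

def predict_intercept_y(ball_x, ball_y, vel_x, vel_y, paddle_x):
    """Project the ball to the paddle's x position, reflecting off top/bottom walls."""
    if vel_x == 0:
        return ball_y
    t = (paddle_x - ball_x) / vel_x
    y = ball_y + vel_y * t
    # Fold the projected position back into the court to account for wall bounces.
    period = 2 * COURT_HEIGHT
    y = y % period
    return period - y if y > COURT_HEIGHT else y

def cpu_target_y(ball_x, ball_y, vel_x, vel_y, paddle_x):
    """Where the CPU *thinks* the ball will arrive, plus some aiming error."""
    perfect = predict_intercept_y(ball_x, ball_y, vel_x, vel_y, paddle_x)
    return perfect + random.gauss(0.0, ERROR_STD)

def move_paddle(paddle_y, target_y, dt):
    """Move toward the target, but never faster than MAX_PADDLE_SPEED."""
    step = target_y - paddle_y
    max_step = MAX_PADDLE_SPEED * dt
    return paddle_y + max(-max_step, min(max_step, step))

# In the game loop (pseudocode):
#   on a ricochet: wait ~0.15 s (reaction time), then re-aim with cpu_target_y(...)
#   every frame:   paddle_y = move_paddle(paddle_y, target_y, dt)
```

Tuning ERROR_STD and MAX_PADDLE_SPEED (and the reaction delay) is then your difficulty dial.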


Yeah, thanks for this, good answer! I’m just getting started, really. I was thinking along the lines of adding some kind of random error which tightens up as the levels progress, or limiting the speed, or maybe even PID 🙂. Interesting though…
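(For reference, a bare-bones PID for the paddle could look something like this - just a sketch with arbitrary gains, where sloppier gains naturally give a less perfect opponent:)

```python
class PaddlePID:
    """Toy PID controller driving the paddle toward a target y (illustrative only).
    Detuned gains or a clamped output are another way to make the CPU overshoot
    or lag like a human would."""

    def __init__(self, kp=2.0, ki=0.0, kd=0.3):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target_y, paddle_y, dt):
        """Return a velocity command for this frame."""
        error = target_y - paddle_y
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Per frame: paddle_y += pid.update(target_y, paddle_y, dt) * dt (optionally clamped)
```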
