@Kazamir on Connect 4 AI · about 7 hours ago

47m 37s logged

Added easy, medium and hard modes. Each one uses UCB, just with a different runtime. Also updated the buttons, since their code was outdated.


@Kazamir on Connect 4 AI · about 14 hours ago

1h 55m 58s logged

UCB - Upper Confidence Bound

UCB is the algorithm I implemented to play Connect 4.

How it works

The philosophy of UCB is to run random simulations of the game and treat the move with the highest winrate across those simulations as the best move.

The problem UCB solves is this: given a fixed number of simulations (called rollouts), how many rollouts should each move get? You want to spend more rollouts on promising moves and fewer on unpromising ones, but you still need to try the unpromising moves, because they may actually be good and only have a low winrate due to random chance. This is decided by the UCB rule:

UCBᵢ = avg(Xᵢ) + √( 2 ln(T) / nᵢ )

This is the average winrate of move i plus an optimism score, where nᵢ is the number of rollouts move i has had so far and T is the total number of rollouts so far. The more times you play a move, the smaller its optimism score gets, because the chance that it is secretly a good move decreases. This ensures that you spend more time on good moves (high average) but also spend some time on worse-looking moves, which may turn out to be better once more rollouts are invested in them.
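To make the rule concrete, here is a minimal Python sketch of the score for a single move. The function name `ucb_score` and its parameters are made up for illustration and are not taken from the actual game code. Giving untried moves an infinite score is a common convention, assumed here, so that every move gets tried at least once.

```python
import math

def ucb_score(avg_winrate, n_i, total):
    """UCB score for one move.

    avg_winrate: average result of this move's rollouts so far (avg(X_i))
    n_i:         number of rollouts this move has had (n_i)
    total:       total rollouts across all moves so far (T)
    """
    if n_i == 0:
        # Assumption: an untried move always gets picked first.
        return math.inf
    # Average winrate plus the optimism score, which shrinks as n_i grows.
    return avg_winrate + math.sqrt(2 * math.log(total) / n_i)
```

With the same average winrate, a move that has had 10 rollouts scores higher than one that has had 50, so the less explored move gets the next rollout.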

Implementation

I implemented this in the Connect 4 game I made and set the budget to 1000 rollouts, so it runs 1000 random simulations to choose each move. It kept beating me, so I had to reduce it to 250 rollouts to have a chance of winning.
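A rough sketch of how a rollout budget like this could be spent is below. It is not the game's real code: `ucb_choose_move`, `moves`, `rollout` and `budget` are hypothetical names, and `rollout(move)` stands in for "play the move, finish the game randomly, return 1.0 for a win and 0.0 otherwise".

```python
import math

def ucb_choose_move(moves, rollout, budget=1000):
    """Spend `budget` rollouts across `moves` using the UCB rule.

    Returns the move with the highest average result and the number of
    rollouts each move received. Assumes `moves` is non-empty and
    `budget` is at least 1.
    """
    counts = {m: 0 for m in moves}   # n_i: rollouts given to each move
    wins = {m: 0.0 for m in moves}   # summed rollout results per move

    for t in range(1, budget + 1):
        untried = [m for m in moves if counts[m] == 0]
        if untried:
            # Try every move once before applying the formula.
            move = untried[0]
        else:
            # Pick the move with the highest average + optimism score.
            move = max(
                moves,
                key=lambda m: wins[m] / counts[m]
                + math.sqrt(2 * math.log(t) / counts[m]),
            )
        wins[move] += rollout(move)
        counts[move] += 1

    # The final choice uses the plain average, without the optimism score.
    tried = [m for m in moves if counts[m] > 0]
    best = max(tried, key=lambda m: wins[m] / counts[m])
    return best, counts
```

Calling it with `budget=250` instead of `budget=1000` gives the weaker setting described above: fewer rollouts means noisier winrate estimates, so the AI makes more mistakes.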