

An Introduction to Go, AlphaGo and Quantifying Go Gameplay



Introduction

Our increasingly data-driven world has sparked a surge of awareness, discussion, and innovation regarding the application of artificial intelligence to all aspects of our daily lives. Alphabet’s DeepMind is one of the leading groups in the field, and has focused much of its recent effort on the development of AlphaGo, a program designed to master the game of Go.

AlphaGo’s challenge of Go is in many ways reminiscent of IBM Deep Blue’s challenge of chess back in 1997 – it represents the newfound potential of artificial intelligence, and stands to drastically shape the way we interact with the world for years to come.

To date, the machine learning methods behind AlphaGo have already been applied in the sectors of energy efficiency and public healthcare. At Google, DeepMind’s program reduced the energy used to cool its data centers by 40%. Meanwhile, in public healthcare, DeepMind is collaborating with NHS groups to save lives and streamline systems by improving diagnosis and risk-assessment procedures (DeepMind).


Context

The Millennia-Old Game

Go, one of the oldest board games in the world, originated some 3,000-4,000 years ago in Ancient China. Shortly after its invention, it spread through politics and commerce to neighboring countries in the Far East, and has since become an integral aspect of tradition and culture in China, Korea, and Japan (“The Ancient Chinese Game of Go”). In recent decades, Go has also become increasingly popular in the West: in the US, the game is organized and promoted by the American Go Association (AGA); in Europe, the European Go Federation (EGF) facilitates collaboration between organizing bodies in over 30 member countries (“American Go Association” and “About the European Go Federation”).

Go is a sequential, two-player strategy game played with black and white stones on a 19×19 grid of lines. Each stone is played on an intersection and, unlike a chess piece, has no special attributes – in other words, a stone’s value is determined solely by its position on the board.

Whereas the winning condition of chess is to capture a specific piece, the aim of Go is to secure more “territory” than your opponent by the end of the game rather than to capture your opponent’s pieces. “Territory” is counted as the number of intersections under a player’s control. Because of the size of the board and the open-ended winning condition, Go allows for a plethora of playstyles, tactics, and outcomes – mathematically, the number of possible configurations of a Go board far exceeds the number of atoms in the observable universe!

The History of an Invincible AI

In March 2016, AlphaGo stunned the Go world by defeating 9-dan professional player Lee Sedol of South Korea 4:1 in a five-match series. At the time, Lee Sedol held 18 international Go titles, and was widely considered one of the most experienced and skillful Go players in the world.

In May 2017, AlphaGo challenged Ke Jie (China), then the top-ranked player in the world. AlphaGo won all games of the three-match series and shortly afterwards retired from competitive play as the new top-ranked player. And so, Go’s time as the last major board game dominated by human players came to a sudden end; prior to the official matches, many experts had expected human players to retain an edge over AI for several more years.

In addition to being the first Go program to beat a professionally-ranked player, AlphaGo revolutionized attitudes towards Go by playing inventive moves which often contradicted traditional Go theory but eventually led to favorable positions. According to experts, AlphaGo’s strategy “embodies a spirit of flexibility and open-mindedness”. 9-dan World Champion Zhou Ruiyang noted that AlphaGo taught the world of Go that “no move is impossible” (DeepMind).

AlphaGo’s success serves as a challenge and source of inspiration for us to examine our assumptions and perceptions of knowledge, to think and innovate beyond the scope of what works.

AlphaGo’s Algorithm

Due to the sheer number of possibilities in Go, traditional “brute-force” search trees which run through all legal variations are not only highly inefficient, but also infeasible to implement in a time-constrained setting. Additionally, the open-ended nature of the game makes it difficult for AlphaGo to rely on preset heuristics – “rules of thumb” or “mental shortcuts” which help the algorithm select higher-value moves. Schaeffer takes this a step further and claims that, unlike in chess, there are “no dominant heuristics” in Go (Byford). In hindsight, AlphaGo’s unorthodox gameplay seems to imply that even if dominant heuristics exist, human players have not been able to properly identify and understand them.

To avoid such issues, AlphaGo combines advanced tree searches and deep neural networks in its algorithm. Its Monte Carlo tree search system randomly samples possible moves and analyzes the most promising ones. After comparing the outcomes of selected moves, AlphaGo tweaks “weights” in its network which influence its sampling and move-analysis. By repeating this over the course of many self-played games and referring to records of many human-played games, AlphaGo is able to significantly improve its chances of sampling “good” moves (Byford, Hassabis, and “Monte Carlo Tree Search”).
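To make the sampling idea concrete, below is a deliberately simplified R sketch of “flat” Monte Carlo move evaluation on a toy take-away game (not Go): every candidate move is scored by the fraction of random playouts it goes on to win, and the best-scoring move is chosen. This is only an illustration of the sampling principle; AlphaGo’s actual Monte Carlo tree search additionally grows a search tree and is guided by its neural networks, and the toy game and function names here are my own.

  # Toy game used only for illustration: a pile of stones; players alternately
  # remove 1-3 stones, and whoever takes the last stone wins.
  legal_moves <- function(state) seq_len(min(3, state$pile))
  play_move   <- function(state, m) list(pile = state$pile - m, to_move = 3 - state$to_move)
  is_terminal <- function(state) state$pile == 0
  winner      <- function(state) 3 - state$to_move  # the player who just moved took the last stone

  # Play random moves until the game ends; report whether `player` won.
  random_playout <- function(state, player) {
    while (!is_terminal(state)) {
      state <- play_move(state, sample(legal_moves(state), 1))
    }
    winner(state) == player
  }

  # Score each legal move by the share of random playouts it wins, then pick
  # the most promising one.
  monte_carlo_move <- function(state, n_playouts = 500) {
    player <- state$to_move
    moves  <- legal_moves(state)
    scores <- sapply(moves, function(m) {
      mean(replicate(n_playouts, random_playout(play_move(state, m), player)))
    })
    list(move = moves[which.max(scores)], playout_win_rates = setNames(scores, moves))
  }

  set.seed(1)
  monte_carlo_move(list(pile = 10, to_move = 1))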

AlphaGo’s in-game decision-making can be summarized by a “policy” network and a “value” network. The “policy” network predicts the next move to play, reducing the range of initial moves which are evaluated; meanwhile, the “value” network predicts the winning probability of an initial move, so the sub-variations of each selected move do not need to be played out in full (Byford, Hassabis and DeepMind). The move chosen is the one with the highest winning probability. If AlphaGo’s highest winning probability falls below a preset threshold, it automatically resigns (DeepMind).
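The division of labor between the two networks and the resignation rule can be sketched schematically. In the R snippet below, policy_prior and value_estimate are hypothetical stand-ins for the outputs of the policy and value networks, and the candidate-selection and threshold logic is my own simplification rather than AlphaGo’s published procedure.

  # Schematic only: pick the highest-value move among the policy network's top
  # candidates, and resign if even that move's winning probability is too low.
  choose_move <- function(moves, policy_prior, value_estimate,
                          top_k = 5, resign_threshold = 0.1) {
    # "Policy" step: narrow the search to the moves the policy network favors.
    keep       <- order(policy_prior, decreasing = TRUE)[seq_len(min(top_k, length(moves)))]
    candidates <- moves[keep]
    # "Value" step: estimate each remaining candidate's winning probability.
    win_prob <- value_estimate(candidates)
    best     <- which.max(win_prob)
    if (win_prob[best] < resign_threshold) {
      return(list(move = NA, resign = TRUE))
    }
    list(move = candidates[best], resign = FALSE, win_prob = unname(win_prob[best]))
  }

  # Toy usage with made-up numbers for three candidate moves.
  choose_move(c("A", "B", "C"),
              policy_prior   = c(0.6, 0.3, 0.1),
              value_estimate = function(m) c(A = 0.55, B = 0.48, C = 0.30)[m])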


Overview of Study

Aim

This study approaches the in-game mechanics of Go from two specific angles: firstly, how can tempo and win-rates – key indicators of game mechanics – be measured and compared? How does this compare across Black, who has a first-mover advantage, and White, who has an initial territory advantage? Secondly, how do such predictions and trends, if any, compare across the three categories of AI vs. AI, AI vs. Player, and Player vs. Player?

Tempo will be quantified using a simple measure of distance. Given the intrinsic difficulty of quantifying Go gameplay, I will also discuss methodological limitations and further exploration.

Hypothesis
Win-Rates

Black’s first-mover advantage is compensated by “komi”, an initial territorial advantage of between 5 and 8 points given to White, depending on the counting method. Since this number is carefully monitored and adjusted by professionals, it is reasonable to assume that it accurately nullifies Black’s advantage. Thus, in a “fair” game, the expected win-rates of Black and White should each be 50%.

Gameplay

According to conventional Go theory, it is advantageous to play “lightly” and to have “initiative”. Playing “lightly” is often characterized by playing one move in a certain location on the board, and then playing the next move in an entirely different location. Having “initiative” means having control over the flow of the game; this is characterized by forcing one’s opponent to play reactively, which means responding close to the location of the previous move. We can group playing “lightly” and having “initiative” as important aspects of “tempo”.

Intuitively, if a player has good tempo, they seem to have more control over the outcome of the game. Therefore, tempo would be expected to correlate positively with win-rates, after adjusting for confounding factors like komi. By the nature of the game, first-mover Black is expected to have a higher tempo measure than White. Moreover, if we take tempo to be a measure of the relative responsiveness of the two players to each other’s moves, a relatively higher tempo measure in, say, Black, could correspond to a lower tempo measure in White. Lastly, differences in tempo could reflect playstyle differences between professional human players and artificial intelligence.


Methodology

Data Collection

To allow for a comparison between the three categories of AlphaGo vs. AlphaGo, AlphaGo vs. Professional, and Professional vs. Professional (henceforth AG vs. AG, AG vs. Pro, and Pro vs. Pro respectively), I collected four datasets from three independent sources. Google DeepMind released 50 self-play games (AG vs. AG) and 60 games played online by AlphaGo against professional players from around the world (AG vs. Pro). The remaining two datasets were scraped from Go game-record databases which are well-referenced amongst the online Go community: Go4Go’s top-listed games as of October 12th (50 games), and GoKifu’s top-listed professional-level games between September 30th and October 11th (100 games). These two datasets form the “Pro vs. Pro” category.

Data Handling

All data was processed and analyzed in R using the package “gogamer”. Graphs were created using the package “ggplot2”.


Data and Analysis

Win-Rates
                 Number of Wins        Win-Rates
                 Black     White       Black     White
AG vs. AG          12        38        24.0%     76.0%
AG vs. Pro         29        31        48.3%     51.7%
Pro vs. Pro        76        74        50.7%     49.3%

Interestingly, in the AG vs. AG category, White wins a large majority of games. On the other hand, in the AG vs. Pro and Pro vs. Pro categories, the win-rates of Black and White appear more balanced – closer to our expected “fair” win-rates of 50%. Hypothesis testing can be used to check whether these win-rate results are due to chance.

Firstly, for AG vs. AG, we take an expected value (EV) of 0.50 for a “fair” game, and a corresponding standard deviation (SD) of 0.50. Given a known population SD, it is possible to derive standard error (SE) and apply a two-tailed z-test to the data – in this case, we test the observed win-rate for White.

Consider the null hypothesis to be having a “fair” game set-up (i.e. the observed win-rate for White can be explained by chance):

H0: μ = 0.5

The alternative hypothesis claims that the game is “unfair”:

H1: μ ≠ 0.5

Find the average SE over 50 games, the z-test statistic, and then take the corresponding p-value:

SEavg = √(50 × 0.5 × 0.5) / 50 ≈ 0.070711

z = (0.76 − 0.50) / 0.070711 ≈ 3.67696

p ≈ 0.000236 < 0.01

To conclude, the null hypothesis can be rejected at the 1% level of significance. The observed imbalance is very unlikely to be due to chance; Go in an AG vs. AG situation is “unfair”.

A one-tailed z-test for H1: μ > 0.5 results in p ≈ 0.000118 – we can almost certainly say that for AG vs. AG, there is an advantage to White, the second-move player.
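The calculation above is straightforward to reproduce in base R. The helper function below is a minimal sketch of the test as described in the text, not part of the original analysis code:

  # z-test for an observed win-rate against a null proportion p0 over n games.
  z_test_winrate <- function(observed, n, p0 = 0.5,
                             alternative = c("two.sided", "greater")) {
    alternative <- match.arg(alternative)
    se <- sqrt(p0 * (1 - p0) / n)   # standard error of the win-rate over n games
    z  <- (observed - p0) / se
    p  <- if (alternative == "two.sided") 2 * pnorm(-abs(z)) else pnorm(z, lower.tail = FALSE)
    c(se = se, z = z, p = p)
  }

  # AG vs. AG: White won 38 of 50 games (76%).
  z_test_winrate(0.76, 50)                           # z ≈ 3.677, p ≈ 0.000236 (two-tailed)
  z_test_winrate(0.76, 50, alternative = "greater")  # p ≈ 0.000118 (one-tailed)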

Secondly, we consider the category AG vs. Pro, which seems to better fit our expectations; however, there is a vital confounding factor in the data. In the limited pool of 60 games released by DeepMind, AlphaGo recorded a 100% win-rate against online professionals. Therefore, win-loss records can reasonably be attributed to differences in player skill, rather than differences in playing as Black or White. AlphaGo’s almost perfectly-even split between playing as either color further supports this assumption.

Lastly, we conduct a two-tailed hypothesis test on the win-rate of White for Pro vs. Pro. Using the null hypothesis H0: μ = 0.5, the alternative hypothesis H1: μ ≠ 0.5, and an observed value of μ = 0.493, the z-test outputs a test statistic of z ≈ −0.171464, and p ≈ 0.863859 > 0.05. From these results, we cannot reject the null hypothesis; there is a good chance that Go is a “fair” game.
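Reusing the sketched helper from above with the rounded observed win-rate of 0.493 over 150 games reproduces these figures:

  z_test_winrate(0.493, 150)   # z ≈ -0.171, p ≈ 0.864 (two-tailed)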

Overall, two important ideas can be drawn from the results of hypothesis testing:

  1. In the current state of human vs. human professional-level play, the first-mover advantage afforded to Black is reasonably nullified by the size of komi (territorial advantage) given to White.
  2. From the AG vs. Pro results, we can assume that AlphaGo’s level of gameplay is superior to that of professional human players. And since AG vs. AG games are significantly biased toward White, the “fair” value of komi in a perfectly played game could reasonably be less than the currently-used values of 6.5 and 7.5. Admittedly, another explanation could be that AlphaGo’s playstyle works well against human players specifically, but this becomes less important if we plausibly assume that there is variation in playstyle amongst professional players.

Measuring Tempo by Euclidean Distance

As mentioned in our gameplay hypothesis, we focus on “tempo” as an indicator of playstyle. Tempo can be further divided into two categories: individual tempo and game tempo.

Individual tempo is defined as the “pace” at which a single player plays, in terms of the location of their move sequence on the board. It is measured by taking the Euclidean distance between two consecutive moves made by the same player. For example, if Black’s first move is at point (16, 16), followed by White’s first move, and then by Black’s second move at (4, 16), Black’s individual tempo between Black’s first and second moves would be 12 units.

Game tempo addresses the interaction between both players as they fight for “initiative”. It is defined as the responsiveness of a player’s move to the previous move made. Player 2’s responsiveness to Player 1 is measured by the average Euclidean distance over all consecutive move pairs of the form (Player 1’s move, Player 2’s reply).
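Both measures reduce to averaging Euclidean distances over particular move pairs. The R sketch below illustrates this, assuming each game is stored as a data frame with columns x, y, and color (“B” or “W”), one row per move in order of play; the data-frame layout, White’s illustrative coordinates, and the function names are my own assumptions rather than the gogamer package’s format.

  euclidean <- function(x1, y1, x2, y2) sqrt((x1 - x2)^2 + (y1 - y2)^2)

  # Individual tempo: average distance between consecutive moves of the same color.
  individual_tempo <- function(moves, color) {
    m <- moves[moves$color == color, ]
    if (nrow(m) < 2) return(NA_real_)
    mean(euclidean(m$x[-nrow(m)], m$y[-nrow(m)], m$x[-1], m$y[-1]))
  }

  # Game tempo: average distance from each move of `first` color to the reply
  # that immediately follows it (i.e. the second player's responsiveness).
  game_tempo <- function(moves, first = "B") {
    idx <- which(moves$color == first & seq_len(nrow(moves)) < nrow(moves))
    if (length(idx) == 0) return(NA_real_)
    mean(euclidean(moves$x[idx], moves$y[idx], moves$x[idx + 1], moves$y[idx + 1]))
  }

  # Example from the text: Black 1 at (16, 16), White 1 at a made-up (4, 4),
  # Black 2 at (4, 16).
  example <- data.frame(x = c(16, 4, 4), y = c(16, 4, 16), color = c("B", "W", "B"))
  individual_tempo(example, "B")   # 12: Black's first move to Black's second move
  game_tempo(example, first = "B") # White's responsiveness to the preceding Black move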


Individual Tempo
Matchup          Mean of Average Distance Between Move Pairs                        Win-Rate
                 Black to Black (BB)   White to White (WW)   Difference (BB − WW)    Black
AG vs. AG              6.65                  6.56                   0.09             24.0%
AG vs. Pro             5.54                  5.52                   0.02             48.3%
Pro vs. Pro            5.53                  5.44                   0.09             50.7%

From the results above, AG vs. AG games seem to have a higher tempo on average, compared to games in the AG vs. Pro and Pro vs. Pro categories. In terms of playstyle, this implies that when two AIs play each other, they tend to move more rapidly across various locations on the Go board. If we assume that AlphaGo has a superior level of gameplay to human players, this supports the conventional advice that playing “lightly” is indicative of good game sense and sound strategy. In turn, this could possibly improve one’s winning probability.

Notably, Black seems to play at a faster individual tempo than White across all three categories, but the difference is small to negligible relative to the standard deviations. Regardless, a possible reason for this is Black’s first-move advantage and inherent territorial disadvantage: to reduce his or her initial deficit, Black is pressured to move around the board faster and be the one to create opportunities.

Finally, individual tempo, taken either player-by-player or by comparison, appears to have no direct link to win-rates. Now, setting aside win-rates, we consider the correlation between Black and White tempo.

Plotting average Euclidean distance pairs of Black to Black against White to White for every game across the three categories yields a strong positive correlation, modelled by linear regression.

As shown above, running a linear regression model on each category separately also yields a strong positive correlation between individual tempos.
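The plot and per-category fits described above can be produced along the following lines. The snippet is a sketch only: the data frame per_game and its column names (bb, ww, category) are my own stand-ins for the per-game average distances, and synthetic values are generated here just so the code runs.

  library(ggplot2)

  # Synthetic stand-in data; in the study these values come from the game records.
  set.seed(42)
  per_game <- data.frame(
    bb       = runif(30, 4, 8),
    category = rep(c("AG vs. AG", "AG vs. Pro", "Pro vs. Pro"), each = 10)
  )
  per_game$ww <- per_game$bb + rnorm(30, sd = 0.5)

  # Overall fit of White-to-White average distance on Black-to-Black average distance.
  fit <- lm(ww ~ bb, data = per_game)
  summary(fit)$r.squared

  # Scatter plot with one fitted line per category.
  ggplot(per_game, aes(x = bb, y = ww, colour = category)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    labs(x = "Black-to-Black average distance", y = "White-to-White average distance")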

A possible explanation for this is that both players naturally play towards each other’s tempo. Individual tempo is measured as the Euclidean distance between moves of the same color, but it doesn’t account for the fact that the second of those moves is also greatly affected by the opponent’s move immediately before it. By the nature of Go, each player is forced to respond to the other’s moves to some extent; by playing in similar areas of the board at similar times, both players record similar values for individual tempo.

To clarify the important factor of responsiveness, we look into consecutive move pairs of different colors.

Game Tempo and Responsiveness
Matchup          Mean of Average Distance Between Move Pairs     Responsiveness Ratio   Win-Rate
                 Black to White (BW)   White to Black (WB)            BW / WB            Black
AG vs. AG              5.21                  4.99                       1.04             24.0%
AG vs. Pro             4.32                  4.44                       0.97             48.3%
Pro vs. Pro            4.37                  4.30                       1.02             50.7%

We find the average distance of the move pairs Black to White and White to Black for each game, and then take the mean of these averages across all games in each of the three categories. Intuitively, the average distance between a Black move and the White move that follows it measures White’s responsiveness to Black’s moves, and vice versa. The higher the average distance, the less responsive the second player is to the first player’s moves.

In addition, we calculate a responsiveness ratio by taking the average distance of BW (responsiveness of White to Black) over the average distance of WB (responsiveness of Black to White). At a neutral value of 1, each player is equally responsive to the other. Values > 1 signal White initiative, since White is less responsive to Black’s moves than Black is to White’s. Similarly, values < 1 indicate Black initiative.
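Continuing the earlier sketch, the ratio for a single game is simply the BW average distance divided by the WB average distance (using the illustrative game_tempo function and example data defined above):

  responsiveness_ratio <- function(moves) {
    game_tempo(moves, first = "B") / game_tempo(moves, first = "W")
  }
  responsiveness_ratio(example)   # > 1 suggests White is the less responsive side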

From the table above, both AI players in AG vs. AG games record a lower average responsiveness to each other than players in AG vs. Pro or Pro vs. Pro games. There appears to be no direct link between average responsiveness and the responsiveness ratio or win-rate across categories.

Graphing the average Euclidean distance of different-colored move pairs against each other by game results in a broad spread of points, irrespective of category. There is no apparent clustering or correlation, and running a linear regression as in the previous section yields a very low R² = 0.234. Further analysis will be needed to determine correlation details and underlying factors, confounding and otherwise.


Conclusion

Summary

In "Win-Rates" in the previous section, the data demonstrates that the current rules of Go are adequate for human vs. human games at the professional level, but results in an “unfair” situation for games in which AlphaGo plays itself. Furthermore, "Measuring Tempo by Euclidean Distance" in the previous section reveals a strong positive correlation between the individual tempo of both players within a game, using Euclidean distance between moves as a measure of tempo.

Overall, the implications are profound: by win-rate analysis, AlphaGo appears to be better at playing Go than professional-level players around the world, and, contrary to convention and intuition, has a higher winning probability as the second player. Taking this first conclusion as reasonable and combining it with the gameplay analysis of individual tempo and game tempo, we can conclude that humans appear to play a style which is too slow, restricted, and responsive compared to an optimal, or at least more effective, way of playing. On a more positive note, some general Go conventions appear to be supported by AlphaGo’s playstyle, e.g. the hypothesis that “light” playing is a good strategy. Lastly, comparisons of responsiveness between Black and White and across matchups show no clear trends, and will require more in-depth exploration.

Limitations

Author’s Note: to be updated. Please contact the author directly for more information (12/1/17).

Further Exploration

Author’s Note: to be updated. Please contact the author directly for more information (12/1/17).


Works Cited

“About the European Go Federation.” European Go Federation, European Go Federation (EGF), 2017, www.eurogofed.org/about/.

“American Go Association.” American Go Association, American Go Association (AGA), 2017, www.usgo.org/.

Byford, Sam. “Why Google's Go Win Is Such a Big Deal.” The Verge, Vox Media, 9 Mar. 2016, www.theverge.com/2016/3/9/11185030/google-deepmind-alphago-go-artificial-intelligence-impact.

Chan, Dawn. “The AI That Has Nothing to Learn From Humans.” The Atlantic, Atlantic Media Company, 20 Oct. 2017, www.theatlantic.com/technology/archive/2017/10/alphago-zero-the-ai-that-taught-itself-go/543450/.

DeepMind, DeepMind Technologies Limited, 2017, deepmind.com/.

Hassabis, Demis. “AlphaGo: Using Machine Learning to Master the Ancient Game of Go.” Google, Google, 27 Jan. 2016, blog.google/topics/machine-learning/alphago-machine-learning-game-go/.

Hollosi, Arno, and Morten Pahle. “Number of Possible Go Games.” Sensei's Library, Sensei's Library (SL), 24 Mar. 2016, senseis.xmp.net/?NumberOfPossibleGoGames.

“Monte Carlo Tree Search.” Wikipedia, Wikimedia Foundation, 19 Nov. 2017, en.wikipedia.org/wiki/Monte_Carlo_tree_search.

“The Ancient Chinese Game of Go.” China.org.cn, China Internet Information Center, 7 June 2005, www.china.org.cn/english/features/Archaeology/131298.htm.

Silver, David, et al. “Mastering the Game of Go with Deep Neural Networks and Tree Search.” Nature, 2016, doi:10.1038/nature16961.


Appendix

Data Tables
                       Number of Wins                                      Win-Rates
                       Black (C*)   Black (R**)   White (C)   White (R)    Black      White
AG vs. AG                   0           12            0           38       24.00%     76.00%
AG vs. Pro                  4           25            9           22       48.33%     51.67%
Pro vs. Pro (Go4Go)         6           17            8           19       46.00%     54.00%
Pro vs. Pro (GoKifu)        9           44           13           34       53.00%     47.00%
Total Pro vs. Pro          15           61           21           53       50.67%     49.33%

*C = Wins by counting (game complete)
**R = Wins by resignation (game incomplete)

Win-margins were considered, but the range of available data within the datasets was too limited to be useful. Moreover, win-margins are highly dependent on context. For example, AlphaGo always resigns once its winning probability falls below a preset threshold. Similarly, by the end-game, professional players are also certain of the game outcome and can choose to resign, but may play out the match for personal reasons, rendering win-margin more an arbitrary factor than an indicator of gameplay or player strength.

Author’s Note: individual tempo and game tempo data tables (includes Average Euclidean Distance and SD data) are to be updated. Please contact the author directly for more information (12/1/17).

R Code Summary

Author’s Note: to be updated. Please contact the author directly for more information (12/1/17).

Semester

Fall 2017

Researcher

Isaac Yiu