Board Games and AI “Why Alpha Go Could Beat Humans” Reading Notes

Introduction

AlphaGo, a computer Go program developed by Google DeepMind, became the first computer Go program to defeat a human professional Go player in an even match (no handicap) in October 2015. The victory of an artificial intelligence at Go, long considered the game in which computers would find it hardest to beat humans, shocked the world, and AlphaGo's appearance went beyond a simple win or loss in a single game to widely publicize the usefulness of artificial intelligence, triggering a global AI boom. In this article, we discuss the relationship between board games, including Go, and AI. We also include reading notes for the books "Why Alpha Go Beat Humans" and "The Strongest Go AI: Alpha Go Kaitai Shinsho: Its Mechanism from the Perspective of Deep Learning, Monte Carlo Trees, and Reinforcement Learning".

Board Games

Board games are a type of game played with a special game board and game pieces (pieces, cards, dice, etc.). They can be strategic or recreational activities in which multiple players or teams compete according to rules. The relationship between games and artificial intelligence (AI) is very deep, and games have long played an important role in the research, development, and application of AI. Board games are very diverse, built on different themes and rules, as shown below.

1. Strategy games:

Games in which strategic thinking is important, such as chess, Go, and Shogi. The objective is to defeat the opponent by skillfully placing the pieces.

2. Family games:

Games such as Settlers of Catan, Carcassonne, Ticket to Ride, etc., suitable for family and friends. The rules are relatively simple and combine elements of strategy and luck.

3. Worker placement games:

Games in which players place pieces called “workers,” such as Caverna and Agricola, to collect resources and achieve goals. Resource management and strategy are important.

4. Card games:

Games played with cards, such as playing cards, poker, bridge, and Hearthstone. Trick-taking games, card battle games, and other formats are examples.

5. Word games:

Games that involve creating or guessing words, such as Scrabble, Boggle, and Codenames. Vocabulary and word combinations are tested.

6. Abstract games:

Games without a theme, based on abstract rules, such as Othello, High Throw, and Abstract Tactics. Tactical thinking and pattern recognition are important.

7. Cooperative games:

Games in which players cooperate to achieve a common goal, such as Pandemic, Globe Trotters, and Gloomhaven. Cooperation, not competition, is required.

8. Horror games:

Games that incorporate elements of horror, such as Bettertown and Mansions of Madness. Players take on roles confronting a horror scenario.

9. Economic games:

Games that emphasize economic elements and resource management, such as Monopoly, Acquire, and Power Grid. Players increase their assets and defeat their competitors.

The “Board Game World” section offers a wide variety of board games, including these and many more.

Board Games and AI

The relationship between games and AI is complementary and has contributed to the development of both fields. Games have become a very important element in computer science and artificial intelligence research, and the evolution of AI has brought new possibilities to game technology.

Below we discuss some important points about the relationship between games and AI.

Applications of AI in games:

As described in "History of Digital Game AI (1) (Intelligent Human-Machine Interaction)" and other articles, AI is used to control the behavior of characters and enemy characters in games, making it possible for AI characters to behave more realistically as players and opponents.

AI Training and Reinforcement Learning:

Gaming environments are widely used as platforms for training and evaluating AI models. In particular, as described in "Theories and Algorithms of Various Reinforcement Learning Techniques and Their Python Implementations," reinforcement learning allows AI agents to learn how to maximize rewards in games, with a wide range of applications from self-driving car control to robotics.

Strategic Algorithms in Practice:

Various strategic algorithms, such as those described in "Overview of Game Theory and Examples of Integration and Implementation with AI Technology" and "Decision Theory and Mathematical Decision Techniques," are often evaluated using games. For example, AI can compete with very high-level players in strategic board games such as chess and Go, and there are examples such as Deep Blue (chess) and AlphaGo (Go) beating world-class top players.

Game Engines and AI Development:

Game engines, such as those described in "Fundamentals of Digital Game AI (Spatial Axis Awareness Technology)," are used in game development not only for graphics and physics simulation but also for AI implementation. The engines control the behavior and reactions of AI agents, enhancing the realism of the game.

Entertainment and Education:

In addition to enhancing the entertainment experience through games, AI is also being used in education, as discussed in "AI and Education." Game-based learning environments can help students and players develop problem-solving skills and strategic thinking.

AI Advances and Game Evolution:

Advances in AI are also impacting the evolution of games. More realistic characters, interactions, and storytelling are becoming possible, improving the game experience.

Serving as a test bed:

Games are also used as testbeds for AI algorithms and models, allowing AI researchers to test and improve their approaches to various problems. Board games, for example, offer many challenges for AI to address, such as strategic decision making, position evaluation, and pattern recognition.

Algorithms used in board game AI

The algorithms used in board game AI vary with the type and difficulty of the game, but several are common. These are mainly search algorithms, as described in "Overview of Search Algorithms and Various Algorithms and Implementations." The main board game AI algorithms are described below.

Minimax Search:

Minimax search is an algorithm that searches the game tree (a tree structure containing all possible game positions) to find the optimal move for a player. It assumes the opponent always chooses the move most advantageous to them and selects the move that is least disadvantageous against that play; combined with alpha-beta pruning, the search becomes more efficient (a code sketch follows the next section). See also "Online Stochastic Optimization, AdaGrad, and Minimax Optimization for Machine Learning" and "Decision Theory and Mathematical Decision Techniques" for more information.

Alpha-Beta Pruning:

Alpha-beta pruning is a technique that improves the efficiency of minimax search by eliminating unnecessary branches. Once the range of achievable values has been bounded, moves that provably cannot affect the result are not evaluated. For more information, see "Alpha Beta Pruning Overview and Algorithm and Implementations."
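
To make both techniques concrete, here is a minimal Python sketch of minimax search with alpha-beta pruning. The game-state interface (legal_moves, play, is_terminal, evaluate) is a hypothetical stand-in rather than any particular library, and play is assumed to return a new state without mutating the old one.

```python
# Minimal minimax with alpha-beta pruning, over a hypothetical game-state
# interface. Scores are always from the maximizing player's point of view.
def alphabeta(state, depth, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    # Stop at a leaf: the game ended or the depth budget ran out.
    if depth == 0 or state.is_terminal():
        return state.evaluate()

    if maximizing:
        best = float("-inf")
        for move in state.legal_moves():
            best = max(best, alphabeta(state.play(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:  # beta cutoff: the opponent will never allow this line
                break
        return best
    else:
        best = float("inf")
        for move in state.legal_moves():
            best = min(best, alphabeta(state.play(move), depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:  # alpha cutoff: a better option already exists elsewhere
                break
        return best
```

Calling this with the default alpha and beta bounds reproduces plain minimax; the cutoffs only skip branches that provably cannot change the result.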

Monte Carlo Tree Search (MCTS):

Monte Carlo Tree Search is an algorithm that searches for the best move probabilistically by running many playouts (playing random moves to the end of the game and evaluating the outcome). MCTS is also used in complex games such as Go and Shogi. For details, see "Applications of the Bandit Method (1) Monte Carlo Tree Search."
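
As a concrete illustration, below is a minimal UCT-style Monte Carlo tree search sketch. The game-state interface (legal_moves, play, is_terminal, winner, player_to_move) is a hypothetical assumption, and the convention for crediting playout wins varies between implementations.

```python
import math
import random

class Node:
    """One node of the search tree, holding win/visit statistics."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.untried = list(state.legal_moves())
        self.wins, self.visits = 0, 0

    def ucb1(self, c=1.4):
        # Balance exploitation (win rate) against exploration (rarely tried nodes).
        return self.wins / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def mcts(root_state, n_playouts=1000):
    root = Node(root_state)
    for _ in range(n_playouts):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.state.play(move), parent=node)
            node.children.append(child)
            node = child
        # 3. Playout: play random moves to the end of the game.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        # 4. Backpropagation: update statistics up to the root. A node is
        #    credited when the winner is the player who moved into it.
        while node is not None:
            node.visits += 1
            if node.parent and state.winner() == node.parent.state.player_to_move:
                node.wins += 1
            node = node.parent
    # The most-visited child of the root is the recommended move.
    return max(root.children, key=lambda n: n.visits)
```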

Reinforcement Learning:

Reinforcement learning is an approach in which the optimal strategy is learned from the experience of game play. AI using deep reinforcement learning can learn advanced strategies and beat professional players, as AlphaGo and AlphaZero demonstrated. For details, see "Overview of Reinforcement Learning Techniques and Various Implementations."
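
As a small, self-contained example of learning a strategy from play experience, here is a tabular Q-learning sketch. This is a generic reinforcement-learning illustration, not AlphaGo's actual training pipeline (which combines supervised learning, policy-gradient reinforcement learning, and deep networks); the env interface (reset, legal_actions, step) is an assumption.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Learn a value Q[(state, action)] for every encountered pair."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.legal_actions(state)
            # Epsilon-greedy: explore sometimes, otherwise take the best-known action.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # Move the estimate toward reward + discounted best successor value.
            best_next = max((Q[(next_state, a)] for a in env.legal_actions(next_state)),
                            default=0.0)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```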

Evaluation Function:

Board game AI uses an evaluation function to assess positions. The evaluation function judges whether the current position is favorable or unfavorable and is used within minimax search and MCTS; its quality has a significant impact on the strength of the AI.
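
To make the idea concrete, below is a toy Othello-style evaluation function: each square gets a weight (corners highest, since corner stones can never be flipped), and the weighted stones are summed from the AI's point of view. The weights and the board representation are illustrative assumptions; real engines use much richer features such as mobility and stable discs.

```python
# Illustrative square weights for an 8x8 Othello board.
CORNER, EDGE, INNER = 25, 5, 1

def square_weight(r, c, size=8):
    on_edge_r, on_edge_c = r in (0, size - 1), c in (0, size - 1)
    if on_edge_r and on_edge_c:
        return CORNER  # corner stones can never be flipped
    if on_edge_r or on_edge_c:
        return EDGE
    return INNER

def evaluate(board, me, opponent):
    """board: 8x8 grid of player ids or None. Positive = good for `me`."""
    score = 0
    for r in range(8):
        for c in range(8):
            if board[r][c] == me:
                score += square_weight(r, c)
            elif board[r][c] == opponent:
                score -= square_weight(r, c)
    return score
```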

Opening Book:

An opening book is a database containing the AI's prior knowledge of particular opening sequences and early-game positions. It leads the AI into favorable positions from the beginning of the game.
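
A minimal sketch of the idea: an opening book can be as simple as a lookup table from a position key to a prepared move, consulted before any search is run. The keys and moves below are hypothetical placeholders, not entries from a real book.

```python
# Hypothetical book: maps a position key (e.g. a move-history string) to a reply.
OPENING_BOOK = {
    "start": "e4",         # illustrative chess moves in algebraic notation
    "start:e4 e5": "Nf3",
}

def choose_move(position_key, search_fn):
    """Prefer the book move when the position is known; otherwise search."""
    if position_key in OPENING_BOOK:
        return OPENING_BOOK[position_key]
    return search_fn(position_key)
```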

Board game AI combines these algorithms, selecting the best approach according to the rules and tactics of the game. Different algorithms may be used depending on the complexity of the game and the size of the search space, and further training and refinement allow the AI to develop more advanced tactics and strategies.

Next, we discuss the differences in AI approaches among some of the oldest and most popular strategy games: Othello, Shogi, and Go.

Differences between Othello, Shogi, and Go from an AI perspective

There are several differences between Othello, Shogi, and Go from an AI perspective. The following describes the AI efforts and differences in each game.

1. Othello (Reversi):

Othello is a game with relatively simple rules, played on an 8×8 board, where the AI evaluates possible moves and learns strategies to optimize the placement of stones on the board.
Othello AI uses classical game tree search algorithms such as the minimax method and alpha-beta pruning. It also uses a heuristic evaluation function to evaluate the board.
Reinforcement learning has also been applied to Othello, and deep reinforcement learning models (e.g. AlphaZero) have played it successfully.

2. Shogi:

Shogi is a very complex game, with 40 pieces on the board and a wide variety of movement rules. This makes the development of a Shogi AI very difficult.
Shogi AI uses game tree search, alpha-beta pruning, heuristic evaluation functions, and so on. Position evaluation involves many factors (piece placement, offensive and defensive balance, king safety, etc.).
There are many open source projects for Shogi AI, and some AIs are strong enough to beat professional players.

3. Go:

Go is a much more complex game than Shogi, in which the objective is to surround territory by placing stones on the board. The Go board is large (19×19) and the variations of the game are enormous.
Deep learning has played a major role in the development of the Go AI; AlphaGo and AlphaZero represent the evolution of the Go AI.
Go AI uses convolutional neural networks (CNNs), reinforcement learning, and Monte Carlo tree search to evaluate positions and select the best moves. Go AI is strong enough to beat top human professionals.

As games of Othello, Shogi, or Go progress, the number of strategy combinations becomes vast. The basis of a strategy-game AI is a search algorithm: Othello AI uses classical game tree search such as the minimax method and alpha-beta pruning together with a heuristic evaluation function; Shogi AI likewise centers on game tree search and heuristic evaluation; and Go AI, because of its enormous number of variations, is predominantly a deep learning approach, illustrated below.
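
To make the deep-learning side concrete, here is a minimal sketch of a convolutional policy network that maps board feature planes to a probability distribution over the 19×19 points, in the spirit of AlphaGo's policy network. PyTorch, the three input planes, and the layer sizes are illustrative assumptions; AlphaGo's actual network used many more input planes and layers.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, in_planes=3, channels=64, board=19):
        super().__init__()
        self.board = board
        self.body = nn.Sequential(
            nn.Conv2d(in_planes, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            # 1x1 convolution collapses the feature maps to one plane of move logits.
            nn.Conv2d(channels, 1, kernel_size=1),
        )

    def forward(self, x):
        # x: (batch, in_planes, 19, 19) feature planes, e.g. own stones,
        # opponent stones, empty points. Output: probability per board point.
        logits = self.body(x).view(-1, self.board * self.board)
        return torch.softmax(logits, dim=1)

# Example: one empty-board position as a dummy input.
probs = PolicyNet()(torch.zeros(1, 3, 19, 19))
print(probs.shape)  # torch.Size([1, 361])
```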

Why did Alpha Go beat humans? Reading notes

This book is a commentary by Yasumi Saito, Professor of Information Industry Organization at Kyoto University. He specializes in artificial intelligence, cognitive science, the Internet, and security; co-translated "Metamagic Games" (D. R. Hofstadter, Hakuyosha, 2005) and "Literary Machines: Theory of Hypertext"; and is the author of "Ubiquitous Office Technology" (Telecommunications Association, 2005).

The reading notes are as follows.

Why Alpha Go Beat Humans Yasumi Saito
	Prologue: The Impact of Alpha Go
		Events in early March
		Reaction of the World
		Reaction of Professional Go Players
		Reaction of Go Programmers
		My stance (what I write in this book)
	Chapter 1 AI and Games
		1.1 What Has AI Aimed to Achieve?
			Overview
				The term AI was coined at the Dartmouth Conference of 1956.
				In connection with this conference there is a paper titled "Design for an Intelligence Amplifier" (in the Automata Studies collection)
				IA (Intelligence Amplifier): something that amplifies intelligence
				Computers of the past were also IA in the sense that they performed calculations on behalf of others.
				The Turing model of the principle of computers is also an abstraction of computation.
				AI is an attempt to make computers behave intelligently like humans
				If AI can be realized, it will become IA.
			Artificial intelligence and computers that help people
				Eliza
					Simulates psychiatrist's conversation with patient
				SHRDLU
					Questions and actions in the world of building blocks
				Dendral (DENDRAL) and Mycin (MYCIN)
					Early expert systems
				AM (Automated Mathematician)
					Simulates the process by which mathematical concepts are formed
				Cyc
					Turning knowledge into a database
			At the time, it was not known how humans, the target of imitation, actually performed intelligent behavior
		1.2 Strong and Weak AI
			Two meanings of "the same"
				They are different when broken down in detail, but the same at the level of function (or the level of connection of components to realize the function, or the level of how to proceed with processing (algorithm))
		1.3 History of AI
			From the optimistic first boom to the knowledge engineering second boom
			The third boom of deep learning
		1.4 Game Programming
			Chess pioneered game programming
			The all-encompassing "game tree"
			How to determine "good moves"?
			Why did AI research progress in Go and Shogi?
			Robotics and language understanding far beyond game programming
		1.5 Alpha Go is not AI?
			The AI Effect
	Chapter 2: Changes in Go Programming
		2.1 History of Computer Go Research
			The first program with a strength of 38 kyu
				Albert Zobrist's recognition of the board by physical analogy
					Program written in Algol
				Program by Reitman and Wilcox (1970s)
					Reasoning based on pattern recognition
					Data structure called WEB represents hierarchical patterns: strings and groups of stones of the same color
				Strategies are determined by the patterns.
			MFG, an example of a classic Go program
				David Fotland's MFG (Many Faces of Go) in the 1980s
					Combination of alpha-beta search, rule-based expert systems, and pattern matching
			Go program using neural nets
				1990s, Herbert Enderton's Golem
				Generates candidate moves using simple rules based on recognition of game situation
				Use neural nets to narrow down candidate moves
			How to train neural nets
				One input is given, the output is computed, and the error against the teacher signal is calculated.
				Then, treating the error as caused by the connection weights from the previous layer's activations, the weights are adjusted so that the error becomes smaller
				The error is propagated back to earlier layers one by one in the same way
			Another program using neural nets
				TD learning ("Temporal Difference Learning") by Nicol Schraudolph et al.
					Reinforcement Learning
			Go with physics?
				Bernd Brügmann's Gobble
					Primitive Monte Carlo Go
					Using simulated annealing with thousands of random games
		2.2 Tic-tac-toe (sanmoku-narabe) as an example
			Rules of tic-tac-toe
				Rules for tic-tac-toe moves (example)
		2.3 Rule-based Programs
			Intelligence can be achieved by manipulating symbols
				Simon and Newell's "Physical Symbol System Hypothesis"
		2.4 Make the system stronger by making it learn patterns
			Pattern Matching
			Pattern matching to identify candidate points for "katatsugi" (a solid connection)
		2.5 Program by tree search
			Size of the game tree
			Checkers has already been "solved"
			The position evaluation that is indispensable
			Minimax method to find the best move
			Alpha-beta branch pruning with increased efficiency
			Tree search is not well suited to Go
	Chapter 3: New Trends in Go Programming
		3.1 Monte Carlo Go without Go knowledge
			Playing Go with Optimization Techniques
				Brügmann's approach to Monte Carlo Go
				Applying physics methods from the perspective of "how would nature play Go?"
				Annealing, an optimization technique, is also used to find the shortest path in a graph.
				Random transitions from one state to another nearby state
				Initially, transitions toward worse values are also accepted with a certain probability (which decreases over time)
			Why Monte Carlo Go was groundbreaking
				Previous Go programs did not do tree search
					Rules determine next move
				Uses very little Go knowledge
				Play completely randomly to the end, and play the move with the highest average of many such plays
		3.2 Monte Carlo tree search with improved efficiency
			Position evaluation by averaging the results of random plays
				The approach of Bouzy, Helmstetter, and Cazenave
					How to reduce playout (number of random plays to the end)
					Trimming branches of playout
					Annealing is not always necessary
				Fusion of Monte Carlo Go with traditional tree search and knowledge base
			Monte Carlo tree search
				Rémi Coulom's 2006 paper
				Monte Carlo tree search is a clever combination of Monte Carlo and game tree search
		3.3 Emergence of Deep Learning (Deep Learning)
			How Neural Networks Work
			What is Deep Learning?
				Autoencoders
					Pretraining with autoencoders is now said to be seldom used in practice; autoencoders are used mainly for image denoising and visualization
					Deep learning algorithms have since been improved to the point where random initial values, without pretraining, give sufficient accuracy
					Autoencoder research has progressed, and autoencoders have begun to be used in generative models
					Unlike a plain autoencoder, which maps an input deterministically to an output, the variational autoencoder is probabilistic
					Hinton's Idea
			Improved learning efficiency
			Deep Convolutional Neural Networks
				Convolution
					An operation that outputs a numerical value that summarizes a partial region of the original data to the next layer while shifting the region little by little.
				Convolution makes recognition more robust to small shifts in the original image
	Chapter 4: AlphaGo Structure
		4.1 Structure of AlphaGo
			Policy network and value network
				Two networks (models) are learned in advance
				Policy network
					Where the next move is played most often
					Promising candidates
				Value network
					What is the evaluation value of the game
					Final win rate
				Rollout policy (network)
			Value network to improve accuracy
				Extract promising candidate moves in the policy network
				Select high value candidates and start playout
				The value network reduces the calculation
		4.2 Two Neural Networks and Their Learning
			How much has AlphaGo "learned"?
				160,000 games, yielding 29,400,000 positions, used as training data
				28 million training pairs prepared, each a position and its next move
				1 million positions held out for testing
				Used machine learning software called "DistBelief" for training
					Later became TensorFlow
				Policy network obtained by supervised learning
					SL (supervised learning) policy network
				Competes against previous generation policy networks
					RL (Reinforcement Learning) policy network
					1.28 million games per day on 50 GPUs
				A large number of positions are randomly extracted from game records generated by self-play of the policy networks
			Structure of each network
		4.3 Machine power is also important
			Learning speed is 500 times that of a CPU
		4.4 Characteristics of the Alpha Go program
			Alpha Go has no plans, strategies, or goals of its own
				It uses the UCB value to perform a slightly broader search known as "exploration"
				What is going on inside is just number crunching
			Alpha Go does not "think".
		4.5 Differences between Go players and the Alpha Go program
			AlphaGo "just figures out the next move" which is promising
			How Alpha Go thinks
			How Alpha Go learns
			Flexibility, versatility
				Need to relearn when board size changes
		4.6 Why did Alpha Go beat Lee Sedol 9-dan?
			Three Causes of Victory
				Combination of Deep Convolutional Neural Networks and Monte Carlo Tree Search
					Deep Convolutional Neural Networks have a weakness: they sometimes lose the global picture of the board
				Unparalleled Machine Power
					Computing power of 3,000 CPUs
				The effectiveness of the two-neural-net system
					Increasing the number of intermediate layers up to 12
			Monte Carlo tree search to supplement neural nets
			Computer power equivalent to 3000 CPUs
			Can play very strong Go just by predicting the next move
	Chapter 5 New AI Possibilities
		5.1 Limitations of Human Programming
			Information Processing in the Human Brain Still Unknown
				We do not know the details of how the human brain processes information.
			Machines can only do what they are told to do
				Difficulty of Programming
				To "program" literally means to write (gram) in advance (pro)
					In order to "program" in advance, it is necessary to predict what will happen in the future.
				Neural nets are a method of skipping a part of programming
		5.2 Constraints on Learning Programs
			Clarification of the Problem
				Requirements to which machine learning can be applied
					Input and output definitions must be clear and unambiguous
					The act of "understanding" is difficult to define
						The end product of the process of understanding (its output, or the state change in the brain) cannot be defined
			Difficulty in collecting training data
				Difficulty in collecting a large amount of training data
			Challenges in Language Processing with Neural Networks
				There are some methods at the word level, but the problem is how to code sentences and put them into the neural network.
		5.3 Hybrid AI of Classical Programs and Neural Networks
			New Possibilities
				AI started out with the idea of symbolism (Simon and Newell's physical symbolic system hypothesis)
					Trying to realize it by devising symbolic processing programs on computers
				Winograd created SHRDLU to perform actions in a world of building blocks, but found it limited
					Shifted to research on human-machine interaction
					Create a system where humans and machines can coexist well and be efficient as a whole
					IA (Intelligence Amplifier)
		5.4 Symbolizers that create symbols
			What is the "physicality" of AI?
				Symbolizer
				SGP (Symbol Grounding Problem)
					Harnad
					To answer the question, "How does information from the outside world entering the sensory organs (touch, smell, taste, hearing, sight) come to be grounded as usable symbols inside the agent?"
				Symbolizers are also related to "corporeality".
		5.5 Applications to Robotics and Language Understanding
			Neural nets (deep learning) are promising areas
				Robots
					An approach in which an autoencoder learns appropriate concepts and representations without supervision
				Automatic driving
				Natural language understanding
					University of Tokyo Robotics Project
					Pat Hayes' formulation of a mechanism for understanding general human physical laws
						Naive Physics Manifesto
					How to incorporate a symbolizer-like processing mechanism and how to link it to language understanding
		Factors in the Failure of the Fifth Generation Project
			Ending up with vague results without setting up an objectively evaluable evaluation axis
		5.6 Future Go Programs
			No matter how strong it is, Alpha Go is not a "strong AI"
				Elements lacking in Alpha Go
					Goal-oriented
					Strategy and tactics
					Awareness of overall game flow (time axis)
					Models of opponent players
					Awareness of the beginning, middle, and endgame
			Go programs to aim for in the future
				Programs that can explain their moves
				Programs that learn like a child
	Chapter 6: Impact of AI on Human Society
		6.1 AI in the spotlight
			The Third AI Boom
		6.2 Arguments that AI will surpass humans are unfounded
			Robots will destroy humans
		6.3 Areas where AI overcomes humans
			AI and robots are good at simple tasks
			Robots are not good at cooperative work with humans
		6.4 Areas where AI cannot overcome humans
			Creative work cannot be replaced by AI
			Machines are not good at interpersonal services
			Language understanding is the final bottleneck
			Beyond Pessimism
	Afterword
The Strongest Go AI: A New Book on Alpha Go: How it Works from the Perspective of Deep Learning, Monte Carlo Trees, and Reinforcement Learning

This book, by Tomofumi Ohtsuki, compiles the difficult mechanisms of Alpha Go and Alpha Go Zero presented in academic papers (Nature, Google's website, etc.) and explains the deep learning and reinforcement learning used in them in an easy-to-understand manner, while looking at actual Go screens. In particular, the dual network is a completely new deep learning method that has attracted the interest of engineers in Japan and abroad.
By reading this book, you can learn about the mechanisms of deep learning and reinforcement learning in the latest AI and use it as a reference for your own research and development. You can also experience actual Go AI based on DeltaGo developed by the author.

The contents are shown below.

Alpha Go: A New Look at Deep Learning, Monte Carlo Trees, and Reinforcement Learning
	Introduction
	Chapter 1 The Emergence of Alpha Go
		01 History and Evolution of Game AI
		02 The Emergence of the Genius Demis Hassabis
		03 The Success of Alpha Go
		04 The Basics of Go AI
		05 Summary
	Chapter 2 Deep Learning: The Go AI's Instant Intuition
		01 What is Deep Learning?
		02 Example of Handwritten Number Recognition
		03 Convolutional Neural Networks in Alpha Go
		04 Training a CNN with Chainer
		05 Conclusion
	Chapter 3 Reinforcement Learning: The Go AI Learns from Experience
		01 What is Reinforcement Learning?
		02 History of Reinforcement Learning
		03 The Multi-armed Bandit Problem
		04 Reinforcement Learning for Solving Mazes
		05 Reinforcement Learning for Acquiring Video Game Controls
		06 Reinforcement Learning in Alpha Go
		07 Summary and Issues
	Chapter 4 Search: How the Go AI Looks Ahead
		01 Two-Player Zero-Sum Finite Deterministic Perfect-Information Games
		02 Search in Games
		03 Traditional Game Tree Search (Minimax Tree Search)
		04 Monte Carlo Tree Search in Go
		05 Success Factors and Challenges of Monte Carlo Tree Search
		06 Conclusion
	Chapter 5 Completing Alpha Go
		01 The Alpha Go Blueprint
		02 Asynchronous Policy Value Update Monte Carlo Tree Search (APV-MCTS)
		03 Using Large Numbers of CPUs and GPUs
		04 Effects of APV-MCTS
		05 Weaknesses of AlphaGo
		06 The Future of AlphaGo
	Appendix 1 About the Formulas
		01 Derivation of Learning Rule for Convolutional Neural Networks
		02 Derivation of Learning Rule for Reinforcement Learning
	Appendix 2 How to use "GoGui" UI software for Go programs and "DeltaGo" program for GoGui
		01 What is DeltaGo?
		02 How to install GoGui and the "DeltaGo" program for GoGui
	End
