Introduction
AlphaGo, a computer Go program developed by Google DeepMind, became the first program to defeat a human professional Go player in an even (no-handicap) match, in October 2015. The victory of an artificial intelligence at Go, long considered the hardest game for computers to master against humans, shocked the world; beyond the outcome of individual games, AlphaGo's appearance widely publicized the usefulness of artificial intelligence and triggered a global AI boom. In this article, we discuss the relationship between board games, including Go, and AI. We also include reading notes for the books “Why AlphaGo Beat Humans” and “The Strongest Go AI: AlphaGo Kaitai Shinsho: Its Mechanism from the Perspective of Deep Learning, Monte Carlo Trees, and Reinforcement Learning”.
Board Games
Board games are games played with a dedicated game board and components (pieces, cards, dice, etc.). They can be strategic or recreational activities in which multiple players or teams compete according to rules. The relationship between games and artificial intelligence (AI) runs deep: games have long played an important role in AI research, development, and application. Board games are very diverse, spanning many themes and rule sets, as shown below.
1. Strategy games:
Games in which strategic thinking is important, such as chess, Go, and Shogi. The objective is to defeat the opponent by skillfully placing the pieces.
2. Family games:
Games such as Settlers of Catan, Carcassonne, Ticket to Ride, etc., suitable for family and friends. The rules are relatively simple and combine elements of strategy and luck.
3. Worker placement games:
Games in which players place pieces called “workers,” such as Caverna and Agricola, to collect resources and achieve goals. Resource management and strategy are important.
4. Card games:
Games played with cards, such as playing cards, poker, bridge, and Hearthstone. Examples include trick-taking games and card battle games.
5. Word games:
Games that involve creating or guessing words, such as Scrabble, Boggle, and Codenames. Vocabulary and word combinations are tested.
6. Abstract games:
Games without a theme, based on abstract rules, such as Othello, High Throw, and Abstract Tactics. Tactical thinking and pattern recognition are important.
7. Cooperative games:
Games in which players cooperate to achieve a common goal, such as Pandemic, Globe Trotters, and Gloomhaven. Cooperation, not competition, is required.
8. Horror games:
Games that incorporate elements of horror, such as Bettertown, Mansion of Madness, etc. Players take on the role of confronting a horror scenario.
9. Economic games:
Games that emphasize economic elements and resource management, such as Monopoly, Acquire, and Power Grid. Players increase their assets and defeat their competitors.
The “Board Game World” section offers a wide variety of board games, including these and many more.
Board Games and AI
The relationship between games and AI is complementary and has contributed to the development of both fields. Games have become a very important element in computer science and artificial intelligence research, and the evolution of AI has brought new possibilities to game technology.
Below we discuss some important points about the relationship between games and AI.
Applications of AI in games:
As described in “History of Digital Game AI (1) (Intelligent Human-Machine Interaction)” and other articles, AI is used to control the behavior of non-player and enemy characters in games, enabling AI characters to behave more realistically as players and opponents.
AI Training and Reinforcement Learning:
Game environments are widely used as platforms for training and evaluating AI models. In particular, as described in “Theories and Algorithms of Various Reinforcement Learning Techniques and Their Python Implementations”, reinforcement learning allows AI agents to learn how to maximize rewards in games, with applications ranging from self-driving-car control to robotics.
Strategic Algorithms in Practice:
Various strategic algorithms, such as those described in “Overview of Game Theory and Examples of Integration and Implementation with AI Technology” and “Decision Theory and Mathematical Decision Techniques,” are often evaluated using games. For example, AI has been able to compete with very high level players in strategic board games such as chess and Go, and there are examples of AI winning against world-class top players such as Deep Blue (chess) and AlphaGo (Go).
Game Engines and AI Development:
Game engines, such as those described in “Fundamentals of Digital Game AI (Spatial Axis Awareness Technology)” are used in game development not only for graphics and physics simulation, but also for AI implementation. The engines control the behavior and reactions of AI agents, enhancing the realism of the game.
Entertainment and Education:
In addition to enhancing the entertainment experience through games, AI is also being used in education, as discussed in “AI and Education”. Game-based learning environments can help students and players develop problem-solving skills and strategic thinking.
AI Advances and Game Evolution:
Advances in AI are also impacting the evolution of games. More realistic characters, interactions, and storytelling are becoming possible, improving the game experience.
Serving as a test bed:
Games are also used as testbeds for AI algorithms and models, allowing AI researchers to test and improve their approaches on various problems. Board games, for example, present many challenges for AI to address, such as strategic decision making, position evaluation, and pattern recognition.
Algorithms used in board game AI
The algorithms used in board game AI vary with the type and difficulty of the game, but several are common. These are mainly search algorithms, as described in “Overview of Search Algorithms and Various Algorithms and Implementations”. The main board game AI algorithms are described below.
Minimax Search:
Minimax search is an algorithm that explores the game tree (a tree structure containing the possible game positions) to find the optimal move for a player. Minimax assumes the opponent will choose the move most advantageous to them, and selects the move that is least disadvantageous against that reply; it is usually combined with alpha-beta pruning to make the search more efficient. See also “Online Stochastic Optimization, AdaGrad, and Minimax Optimization for Machine Learning” and “Decision Theory and Mathematical Decision Techniques” for more information.
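To make the idea concrete, here is a minimal, hypothetical minimax sketch in Python. The tree and its terminal scores are made-up illustrations, not taken from any particular game:

```python
# Minimal minimax sketch over an abstract game tree (illustrative only).
# A "state" is either a number (terminal score from the maximizing
# player's point of view) or a list of child states.

def minimax(state, maximizing):
    # Terminal node: return its score directly.
    if isinstance(state, (int, float)):
        return state
    # Recurse: the maximizer picks the largest child value,
    # the minimizer (the opponent) picks the smallest.
    values = [minimax(child, not maximizing) for child in state]
    return max(values) if maximizing else min(values)

# A depth-2 tree: the maximizer chooses a branch, then the opponent
# chooses the outcome worst for us within that branch.
tree = [[3, 12], [2, 4], [14, 1]]
best = minimax(tree, maximizing=True)  # max(min(3,12), min(2,4), min(14,1)) = 3
```

In a real engine the children would be generated from the game rules and the recursion cut off at a depth limit, with an evaluation function scoring the leaves.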
Alpha-beta pruning:
Alpha-beta pruning is a technique that improves the efficiency of minimax search by eliminating unnecessary work: it bounds the range of values the best move can take and skips evaluating moves that cannot affect the result. For more information, see “Alpha-Beta Pruning Overview and Algorithm and Implementations”.
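A hypothetical sketch of the pruning idea (numbers are terminal scores, lists are internal nodes; the tree is a made-up example):

```python
# Alpha-beta pruning sketch: minimax with a window [alpha, beta] of
# values that can still matter; branches outside it are cut off early.

def alphabeta(state, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(state, (int, float)):
        return state           # terminal score
    if maximizing:
        value = float("-inf")
        for child in state:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:  # opponent already has a better option: prune
                break
        return value
    else:
        value = float("inf")
        for child in state:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:  # we already have a better option: prune
                break
        return value

tree = [[3, 12], [2, 4], [14, 1]]
value = alphabeta(tree, maximizing=True)  # same result as plain minimax: 3
```

The answer is identical to plain minimax; only the number of visited nodes shrinks (here, the second child of the `[2, 4]` branch is never evaluated).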
Monte Carlo Tree Search (MCTS):
Monte Carlo tree search is an algorithm that finds the best move through probabilistic search, performing many playouts (playing random moves to the end and evaluating the outcome of the game). MCTS is used in complex games such as Go and Shogi. For details, see “Applications of the Bandit Method (1) Monte Carlo Tree Search”.
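A hedged, flat sketch of the idea: each legal move is treated as a bandit arm selected by the UCB1 rule and evaluated by random playouts. A full MCTS additionally grows a search tree below each move; this sketch keeps only the root level, and the toy "game" is invented for illustration:

```python
import math
import random

# Flat, one-level Monte Carlo search sketch (illustrative only).

def mcts_choose(moves, playout, n_iters=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    wins = {m: 0.0 for m in moves}
    visits = {m: 0 for m in moves}
    for t in range(1, n_iters + 1):
        # Selection: UCB1 balances exploitation (win rate) and exploration.
        def ucb(m):
            if visits[m] == 0:
                return float("inf")  # try every move at least once
            return wins[m] / visits[m] + c * math.sqrt(math.log(t) / visits[m])
        move = max(moves, key=ucb)
        # Simulation + backpropagation: one random playout, record the result.
        wins[move] += playout(move, rng)
        visits[move] += 1
    return max(moves, key=lambda m: visits[m])  # most-visited move wins

# Toy "game": move "a" wins 70% of playouts, move "b" only 30%.
playout = lambda m, rng: 1 if rng.random() < (0.7 if m == "a" else 0.3) else 0
best = mcts_choose(["a", "b"], playout)  # expected to settle on "a"
```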
Reinforcement Learning:
Reinforcement learning is an approach that learns an optimal strategy from game-play experience. AI using deep reinforcement learning can learn advanced strategies and beat professional players, as AlphaGo and AlphaZero have shown. For details, see “Overview of Reinforcement Learning Techniques and Various Implementations”.
Evaluation function:
Board game AI uses an evaluation function to assess positions: it judges whether the current position is favorable or unfavorable and is used within minimax search and MCTS. The quality of the evaluation function has a significant impact on the strength of the AI.
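As a concrete illustration, here is a hypothetical evaluation function for an Othello-like game, scoring a board from Black's point of view. The features (disc difference plus a corner bonus, since corners cannot be flipped) are standard ideas, but the weights are made-up, untuned values:

```python
# Toy evaluation function sketch for an 8x8 Othello-like board.
# board: 8x8 list of lists with "B" (Black), "W" (White), or "." entries.

CORNERS = [(0, 0), (0, 7), (7, 0), (7, 7)]

def evaluate(board):
    # Material: simple disc-count difference.
    disc_diff = sum(row.count("B") - row.count("W") for row in board)
    # Corners are stable, so weight them heavily.
    corner_diff = sum(
        (board[r][c] == "B") - (board[r][c] == "W") for r, c in CORNERS
    )
    return disc_diff + 10 * corner_diff  # illustrative weights

board = [["."] * 8 for _ in range(8)]
board[0][0] = "B"  # Black holds one corner
score = evaluate(board)  # 1 disc + 10 corner bonus = 11
```

In minimax or MCTS, such a function would score the leaf positions reached by the search.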
Opening book:
An opening book is a database of prior knowledge about particular opening sequences and early positions. It guides the AI into favorable positions from the beginning of the game.
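A minimal sketch of the mechanism: a lookup table keyed by the move sequence so far, consulted before falling back to search. The entries below are made-up examples, not real opening theory:

```python
# Toy opening book: move-history tuple -> recommended reply.
# Entries are invented for illustration.

OPENING_BOOK = {
    (): "Q16",              # first move: a corner star point
    ("Q16",): "D4",         # reply in the opposite corner
    ("Q16", "D4"): "Q3",
}

def choose_move(history, fallback_search):
    # Use the book while the position is known; otherwise run a search.
    move = OPENING_BOOK.get(tuple(history))
    return move if move is not None else fallback_search(history)

move = choose_move(["Q16"], fallback_search=lambda h: "search result")
# Still inside the book, so the stored reply "D4" is returned.
```

Real books key on the position (often via a hash) rather than the raw move sequence, so that transpositions hit the same entry.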
Board game AI combines these algorithms, selecting the best approach according to the rules and tactics of the game. Depending on the complexity of the game and the size of the search space, different algorithms may be used, and further enhancement and learning allow the development of more advanced tactics and strategies.
Next, we discuss the differences in AI in the oldest and most popular games, Othello, Shogi, Go, and other strategy-based games.
Differences between Othello, Shogi, and Go from an AI perspective
There are several differences between Othello, Shogi, and Go from an AI perspective. The following describes the AI efforts in, and differences between, each game.
1. Othello (Reversi):
Othello is a game with relatively simple rules, played on an 8×8 board, where the AI evaluates possible moves and learns strategies to optimize the placement of stones on the board.
Othello AI uses classical game tree search algorithms such as the minimax method and alpha-beta pruning. It also uses a heuristic evaluation function to evaluate the board.
Reinforcement learning has also been applied to Othello, and deep reinforcement learning models (e.g., AlphaZero) have played it successfully.
2. Shogi:
Shogi is a very complex game, with more than 40 pieces on the board and a wide variety of movement rules. This makes the development of a Shogi AI very difficult.
Shogi AI uses game tree search, alpha-beta pruning, heuristic evaluation functions, etc. The evaluation of a game phase involves many factors (piece placement, offensive and defensive factors, king safety, etc.).
There are many open source projects for Shogi AI, and some AIs are strong enough to beat professional players.
3. Go:
Go is a much more complex game than Shogi, in which players place stones on the board to surround territory. The Go board is large (19×19) and the variations of the game are enormous.
Deep learning has played a major role in the development of the Go AI; AlphaGo and AlphaZero represent the evolution of the Go AI.
Go AI uses convolutional neural networks (CNNs) and reinforcement learning, combined with Monte Carlo tree search, to evaluate positions and select the best moves. Go AI is now strong enough to beat top human professionals.
As games of Othello, Shogi, and Go progress, the number of strategy combinations becomes vast. The basis of a strategy-game AI is a search algorithm, but the emphasis differs: Othello AI uses classical game tree search (the minimax method with alpha-beta pruning) together with a heuristic evaluation function; Shogi AI likewise focuses on game tree search and heuristic evaluation; and Go AI, because of its enormous number of variations, is predominantly based on deep learning.
“Why AlphaGo Beat Humans”: Reading Notes
This book is a commentary by Yasumi Saito, Professor of Information Industry Organization, Kyoto University. He specializes in artificial intelligence, cognitive science, the Internet, and security, has co-translated “Metamagical Themas” (D.R. Hofstadter, Hakuyosha, 2005) and “Literary Machines: The Theory of Hypertext”, and is the author of “Ubiquitous Office Technology” (Denki Tsushin Kyokai [Telecommunications Association], 2005).
His reading notes are as follows.
Why AlphaGo Beat Humans (Yasumi Saito)
Prologue: The Impact of Alpha Go
Events in early March
Reaction of the World
Reaction of Professional Go Players
Reaction of Go Programmers
My stance (what I write in this book)
Chapter 1 AI and Games
1.1 What Has AI Aimed to Achieve?
Overview
The term AI was coined at the Dartmouth Conference of 1956.
At this conference, there was a paper titled "Synthesis of Automata Design for an Intelligence Amplifier"
IA (Intelligence Amplifier): something that augments intelligence
Computers of the past were also IA in the sense that they performed calculations on behalf of others.
The Turing model of the principle of computers is also an abstraction of computation.
AI is an attempt to make computers behave intelligently like humans
If AI can be realized, it will become IA.
Artificial intelligence and computers that help people
Eliza
Simulates psychiatrist's conversation with patient
SHRDLU (SHRDLU)
Questions and actions in the world of building blocks
Dendral (DENDRAL) and Mycin (MYCIN)
Early expert systems
AM(Automated Mathematician)
Simulates the process by which mathematical concepts are formed
Cyc(Cyc)
Knowledge-based databasing
At the time, it was not known how humans, the target of imitation, actually perform intelligent behavior
1.2 Strong and Weak AI
Two meanings of "the same"
They differ when broken down in detail, but are the same at the level of function (or at the level of how components are connected to realize the function, or at the level of how processing proceeds, i.e., the algorithm)
1.3 History of AI
From the optimistic first boom to the knowledge engineering second boom
The third boom of deep learning
1.4 Game Programming
Chess pioneered game programming
The all-encompassing "game tree"
How to determine "good moves"?
Why did AI research progress in Go and Shogi?
Robotics and language understanding far beyond game programming
1.5 Alpha Go is not AI?
The AI Effect
Chapter 2: Changes in Go Programming
2.1 History of Computer Go Research
The first program with a strength of 38 kyu
Recognition of the game by physical analogy by Albert Zobrist
Program written in Algol
Program by Reitman and Wilcox (1970s)
Reasoning based on pattern recognition
Data structure called WEB represents hierarchical patterns: strings and groups of stones of the same color
Strategies are determined by the patterns.
MFG, an example of a classic Go program
David Fotland's MFG (Many Faces of Go) in the 1980s
Combination of alpha-beta search, rule-based expert systems, and pattern matching
Go program using neural nets
1990s, Herbert Enderton's Golem
Generates candidate moves using simple rules based on recognition of game situation
Use neural nets to narrow down candidate moves
How to train neural nets
One input is given, the output is computed, and the error against the teacher signal is calculated.
Then, treating the error as caused by the weights of the connections with the previous layer, the weights are adjusted so that the error becomes smaller.
In the same way, the error is propagated back through the preceding layers one by one.
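The weight-update idea in the notes above can be sketched numerically. This is a hypothetical single-neuron example (a real network repeats the same step layer by layer, which is backpropagation); the inputs, target, and learning rate are made up:

```python
# One gradient-descent step for a single linear neuron with squared error.

def train_step(w, x, target, lr=0.1):
    y = sum(wi * xi for wi, xi in zip(w, x))  # forward pass: compute output
    error = y - target                         # error against the teacher signal
    # Each weight's share of the error is proportional to its input, so
    # move each weight against the gradient of the squared error.
    return [wi - lr * error * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(100):
    w = train_step(w, x=[1.0, 2.0], target=1.0)
# After repeated updates the output approaches the teacher signal:
# sum(wi * xi) ≈ 1.0
```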
Another program using neural nets
TD learning ("Temporal Difference Learning") by Nicol Schraudolph et al.
Reinforcement Learning
Go with physics?
Bernd Brügmann's Gobble
Primitive Monte Carlo Go
Uses the "annealing method" with thousands of random games
2.2 Tic-tac-toe (sanmoku-narabe) as an example
Rules of tic-tac-toe
Rules for deciding tic-tac-toe moves (example)
2.3 Rule-based Programs
Intelligence can be achieved by manipulating symbols
Simon and Newell's "Physical Symbol System Hypothesis"
2.4 Make the system stronger by making it learn patterns
Pattern Matching
Pattern matching to identify candidate points for "katatsugi" (solid connection)
2.5 Program by tree search
Size of the game tree
The game of checkers has already been "solved"
Position evaluation, which is essential
Minimax method to find the best move
Alpha-beta branch pruning with increased efficiency
Tree search is not well suited to Go
Chapter 3: New Trends in Go Programming
3.1 Monte Carlo Go without Go knowledge
Playing Go with Optimization Techniques
Brügmann's Approach to Monte Carlo Go
Applying physics methods from the perspective of "how would nature play Go?"
Annealing, an optimization technique, is also used to find the shortest path in a graph.
Random transitions from one state to another nearby state
Initially, the state transitions toward a larger value with a certain probability (with time, the probability becomes smaller)
Monte Carlo Go's epochal nature
Earlier Go programs did not use game tree search
Rules determine next move
Uses very little Go knowledge
Plays completely randomly to the end, and chooses the move with the highest average result over many such plays
3.2 Monte Carlo tree search with improved efficiency
Position evaluation by averaging the results of random plays
The approach of Bouzy, Helmstetter, and Cazenave
How to reduce playouts (the number of random plays to the end)
Trimming branches of playout
Annealing is not always necessary
Fusion of Monte Carlo Go with traditional tree search and knowledge base
Monte Carlo tree search
Paper by Coulom (Remi Coulom) in 2006
Monte Carlo tree search is a clever combination of Monte Carlo and game tree search
3.3 Emergence of Deep Learning (Deep Learning)
How Neural Networks Work
What is Deep Learning?
Autoencoder
[Note] Autoencoders are now said to be seldom used for this purpose in practice; their main practical uses are image denoising and visualization.
In fact, deep learning algorithms have since improved to the point where training can start from random initial values, without layer-wise pretraining, and still reach sufficient accuracy.
Autoencoder research has progressed and is now used in generative models.
Unlike a plain autoencoder, which maps an input to an output deterministically, the variational autoencoder (VAE) is probabilistic.
Hinton's Idea
Improved learning efficiency
Deep Convolutional Neural Networks
Convolution:
An operation that outputs to the next layer a numerical value summarizing a partial region of the original data, shifting the region little by little.
Convolution makes recognition more robust to small shifts (misalignment) in the original image.
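The sliding-window operation described in the notes above can be sketched directly. This is a minimal 2D convolution (stride 1, no padding); the image and kernel values are made-up illustrations:

```python
# Minimal 2D convolution sketch: slide a small kernel over the input and
# write one weighted-sum summary number per window position.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window starting at (i, j).
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]  # adds each cell to its lower-right neighbour
result = conv2d(image, kernel)  # → [[6, 8], [12, 14]]
```

In a CNN, many such kernels are learned, and the shared weights are what give the translation robustness mentioned above.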
Chapter 4: AlphaGo's Structure
4.1 Structure of AlphaGo
Policy network and value network
Two networks (models) are learned in advance
Policy network
Where the next move is played most often
Promising candidates
Value network
The evaluation value of the position
Final win rate
rollout policy (network)
Value network to improve accuracy
Extract promising candidate moves in the policy network
Select high value candidates and start playout
The value network reduces the calculation
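The division of labor sketched in the notes above (policy network proposes candidates, value network scores them) can be illustrated with a hedged, PUCT-like selection rule. The real AlphaGo formula and constants are given in the original paper; the "networks" here are stand-in dictionaries and the numbers are invented:

```python
import math

# Toy sketch of policy/value-guided move selection (PUCT-like rule).
# prior[m]: policy-network probability that m is the next move
# value[m]: value-network win-rate estimate after playing m
# visits[m]: how often m has been explored so far in the search

def select_move(moves, prior, value, visits, total_visits, c_puct=1.0):
    def score(m):
        exploit = value[m]  # value network: how good the position looks
        # Policy prior steers exploration toward natural-looking moves;
        # the bonus decays as a move accumulates visits.
        explore = c_puct * prior[m] * math.sqrt(total_visits) / (1 + visits[m])
        return exploit + explore
    return max(moves, key=score)

moves = ["a", "b", "c"]
prior = {"a": 0.6, "b": 0.3, "c": 0.1}   # policy net: "a" looks most natural
value = {"a": 0.5, "b": 0.55, "c": 0.2}  # value net: "b" slightly better
visits = {"a": 10, "b": 2, "c": 0}
choice = select_move(moves, prior, value, visits, total_visits=12)
```

With these numbers the under-explored, higher-valued "b" is selected, showing how the value network pulls the search away from the policy prior and reduces wasted playouts.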
4.2 Two Neural Networks and Their Learning
How much has AlphaGo "learned"?
160,000 games, or 29.4 million positions, used as training data
28 million position/next-move pairs prepared as training data
1 million positions held out for testing
Machine learning software called "DistBelief" used for training
It later became TensorFlow
Policy network obtained by supervised learning
SL (supervised learning) policy network
Competes against previous generation policy networks
RL (Reinforcement Learning) policy network
1.28 million games per day on 50 GPUs
Opponents are sampled at random from previous versions of the policy network, generating a large number of game records
Structure of each network
4.3 Machine power is also important
Learning speed is 500 times faster than CPU
4.4 Characteristics of the Alpha Go program
AlphaGo has no overall plan, strategy, or goals
It uses the UCB value to perform a somewhat broader search called "exploration"
What is going on inside is just number crunching
Alpha Go does not "think".
4.5 Differences between Go players and the Alpha Go program
AlphaGo "just figures out the promising next move"
How Alpha Go thinks
How Alpha Go learns
Flexibility, versatility
Need to relearn when board size changes
4.6 Why did Alpha Go beat Lee Sedol 9-dan?
Three Causes of Victory
Combination of Deep Convolutional Neural Networks and Monte Carlo Tree Search
Deep Convolutional Neural Networks have a weakness: they sometimes lose the global picture of the board
Unparalleled Machine Power
Computing power of 3,000 CPUs
Goodness of the system of two neural nets
Increasing the number of intermediate layers up to 12
Monte Carlo tree search to supplement neural nets
Computer power equivalent to 3000 CPUs
Can play very strong Go just by predicting the next move
Chapter 5 New AI Possibilities
5.1 Limitations of Human Programming
Information Processing in the Human Brain Still Unknown
We do not know the details of how the human brain processes information.
Machines can only do what they are told to do
Difficulty of Programming
Programming means "setting up the processing in advance"
In order to "program" in advance, it is necessary to predict what will happen in the future.
Neural nets are a method of skipping a part of programming
5.2 Constraints on Learning Programs
Clarification of the Problem
Requirements to which machine learning can be applied
Input and output definitions must be clear and unambiguous
The act of "understanding" is difficult to define
The end product of the process of understanding (the output, or the state change in the brain) cannot be defined
Difficulty in collecting training data
Difficulty in collecting a large amount of training data
Challenges in Language Processing with Neural Networks
There are some methods at the word level, but the problem is how to code sentences and put them into the neural network.
5.3 Hybrid AI of Classical Programs and Neural Networks
New Possibilities
AI started out with the idea of symbolism (Simon and Newell's physical symbolic system hypothesis)
Trying to realize it by devising symbolic processing programs on computers
Winograd created SHRDLU to perform actions in a world of building blocks, but found it limited
Shifted to research on human-machine interaction
Create a system where humans and machines can coexist well and be efficient as a whole
IA (Intelligence Amplifier)
5.4 Symbolizers that create symbols
What is the "physicality" of AI?
Symbolizer
SGP(Symbol Grounding Problem)
Harnad
To answer the question, "How does information from the outside world entering the sensory organs (touch, smell, taste, hearing, sight) become integrated into usable symbols inside the agent?"
Symbolizers are also related to "corporeality".
5.5 Applications to Robotics and Language Understanding
Neural nets (deep learning) are promising areas
Robots
Approaches that use an autoencoder to learn appropriate concepts and representations without supervision
Automatic driving
Natural language understanding
University of Tokyo Robotics Project
Pat Hayes' formulation of a mechanism for understanding general human physical laws
Naive Physics Manifesto
How to incorporate a symbolizer-like processing mechanism and how to link it to language understanding
Factors in the Failure of the Fifth Generation Project
Ending up with vague results without setting up an objectively evaluable evaluation axis
5.6 Future Go Programs
No matter how strong it is, AlphaGo is not a "strong AI"
Elements lacking in Alpha Go
Goal-oriented
Strategy and tactics
Awareness of overall game flow (time axis)
Models of opponent players
Awareness of the beginning, middle, and endgame
Go programs to aim for in the future
Programs that can explain their moves
Programs that learn like a child
Chapter 6: Impact of AI on Human Society
6.1 AI in the spotlight
The Third AI Boom
6.2 Arguments that AI will surpass humans are unfounded
Robots will destroy humans
6.3 Areas where AI overcomes humans
AI and robots are good at simple tasks
Robots are not good at cooperative work with humans
6.4 Areas where AI cannot overcome humans
Creative work cannot be replaced by AI
Machines are not good at interpersonal services
Language understanding is the final bottleneck
Beyond Pessimism
Afterword
The Strongest Go AI: AlphaGo Kaitai Shinsho: Its Mechanism from the Perspective of Deep Learning, Monte Carlo Trees, and Reinforcement Learning
This book, by Tomofumi Ohtsuki, compiles the intricate mechanisms of AlphaGo and AlphaGo Zero as presented in academic papers (Nature, Google’s website, etc.) and explains the deep learning and reinforcement learning techniques they use in an easy-to-understand way, while looking at actual Go screens. In particular, the dual network is a completely new deep learning method that has attracted the interest of engineers in Japan and abroad.
By reading this book, you can learn the mechanisms of deep learning and reinforcement learning used in the latest AI and use them as a reference for your own research and development. You can also try an actual Go AI based on DeltaGo, developed by the author.
The contents are shown below.
The Strongest Go AI: AlphaGo Kaitai Shinsho: Its Mechanism from the Perspective of Deep Learning, Monte Carlo Trees, and Reinforcement Learning
Introduction
Chapter 1 The Emergence of Alpha Go
01 History and Evolution of Game AI
02 The Emergence of the Genius Demis Hassabis
03 The Success of Alpha Go
04 The Basics of Go AI
05 Summary
Chapter 2 Deep Learning: Go is Instantly Inspired
01 What is Deep Learning?
02 Example of Handwritten Number Recognition
03 Convolutional Neural Networks in Alpha Go
04 Training a CNN with Chainer
05 Conclusion
Chapter 3 Reinforcement Learning Go Learns from Experience
01 What is Reinforcement Learning?
02 History of Reinforcement Learning
03 The Multi-armed Bandit Problem
04 Reinforcement Learning for Solving Mazes
05 Reinforcement Learning for Acquiring Video Game Controls
06 Reinforcement Learning in Alpha Go
07 Summary and Issues
Chapter 4 Exploration How the Go AI Can Look Ahead
01 Two-Player Zero-Sum Finite Deterministic Perfect-Information Games
02 Search in Games
03 Traditional Game Tree Search (Minimax Tree Search)
04 Monte Carlo Tree Search in Go
05 Success Factors and Challenges of Monte Carlo Tree Search
06 Conclusion
Chapter 5 Completing Alpha Go
01 The Alpha Go Blueprint
02 Asynchronous Policy Value Update Monte Carlo Tree Search (APV-MCTS)
03 Using Large Numbers of CPUs and GPUs
04 Effects of APV-MCTS
05 Weaknesses of AlphaGo
06 The Future of AlphaGo
Appendix 1 About the Formulas
01 Derivation of Learning Rule for Convolutional Neural Networks
02 Derivation of Learning Rule for Reinforcement Learning
Appendix 2 How to use "GoGui" UI software for Go programs and "DeltaGo" program for GoGui
01 What is DeltaGo?
02 How to install GoGui and use the "DeltaGo" program for GoGui
End