Introduction
In October 2015, AlphaGo, a computer Go program developed by Google DeepMind, became the first program to defeat a human professional Go player in an even (no-handicap) game. The victory of an artificial intelligence at Go, long considered the game in which computers would have the hardest time beating humans, shocked the world, and AlphaGo's appearance went beyond the outcome of a single match: it widely publicized the usefulness of artificial intelligence and triggered a global AI boom. In this issue, we discuss the relationship between board games, including Go, and AI. We also include reading notes for the books “Why AlphaGo Beat Humans” and “The Strongest Go AI: AlphaGo Kaitai Shinsho: Its Mechanism from the Perspective of Deep Learning, Monte Carlo Trees, and Reinforcement Learning”.
Board Games
Board games are games played with a dedicated board and game pieces (pieces, cards, dice, etc.). They can be strategic or recreational activities in which multiple players or teams compete according to rules. The relationship between games and artificial intelligence (AI) runs very deep, and games have long played an important role in the research, development, and application of AI. Board games are highly diverse, built on different themes and rules, as shown below.
1. Strategy games:
Games in which strategic thinking is central, such as chess, Go, and Shogi. The objective is to defeat the opponent by skillfully placing the pieces.
2. Family games:
Games such as Settlers of Catan, Carcassonne, Ticket to Ride, etc., suitable for family and friends. The rules are relatively simple and combine elements of strategy and luck.
3. Worker placement games:
Games in which players place pieces called “workers,” such as Caverna and Agricola, to collect resources and achieve goals. Resource management and strategy are important.
4. Card games:
Games played with cards, such as playing cards, poker, bridge, and Hearthstone. Trick-taking games and card battle games are typical examples.
5. Word games:
Games that involve creating or guessing words, such as Scrabble, Boggle, and Codenames. Vocabulary and word combinations are tested.
6. Abstract games:
Games without a theme, based on abstract rules, such as Othello, High Throw, and Abstract Tactics. Tactical thinking and pattern recognition are important.
7. Cooperative games:
Games in which players cooperate to achieve a common goal, such as Pandemic, Globe Trotters, and Gloomhaven. Cooperation, not competition, is required.
8. Horror games:
Games that incorporate elements of horror, such as Bettertown, Mansion of Madness, etc. Players take on the role of confronting a horror scenario.
9. Economic games:
Games that emphasize economic elements and resource management, such as Monopoly, Acquire, and Power Grid. Players increase their assets and defeat their competitors.
The “Board Game World” section offers a wide variety of board games, including these and many more.
Board Games and AI
The relationship between games and AI is complementary and has contributed to the development of both fields. Games have become a very important element in computer science and artificial intelligence research, and the evolution of AI has brought new possibilities to game technology.
Below we discuss some important points about the relationship between games and AI.
Applications of AI in games:
As described in “History of Digital Game AI (1) (Intelligent Human-Machine Interaction)” and other articles, AI is used to control the behavior of characters and enemy characters in games, making it possible for AI characters to behave more realistically as players and opponents.
AI Training and Reinforcement Learning:
Gaming environments are widely used as platforms for training and evaluating AI models. In particular, as described in “Theories and Algorithms of Various Reinforcement Learning Techniques and Their Python Implementations,” reinforcement learning allows AI agents to learn how to maximize rewards in games, with applications ranging from self-driving car control to robotics.
Strategic Algorithms in Practice:
Various strategic algorithms, such as those described in “Overview of Game Theory and Examples of Integration and Implementation with AI Technology” and “Decision Theory and Mathematical Decision Techniques,” are often evaluated using games. For example, AI can now compete at a very high level in strategic board games such as chess and Go, and systems such as Deep Blue (chess) and AlphaGo (Go) have defeated world-class top players.
Game Engines and AI Development:
Game engines, such as those described in “Fundamentals of Digital Game AI (Spatial Axis Awareness Technology),” are used in game development not only for graphics and physics simulation but also for AI implementation. The engines control the behavior and reactions of AI agents, enhancing the realism of the game.
Entertainment and Education:
In addition to enhancing the entertainment experience through games, AI is also being used in education, as discussed in “AI and Education.” Game-based learning environments can help students and players develop problem-solving skills and strategic thinking.
AI Advances and Game Evolution:
Advances in AI are also impacting the evolution of games. More realistic characters, interactions, and storytelling are becoming possible, improving the game experience.
Serving as a test bed:
Games are also used as testbeds for AI algorithms and models, allowing AI researchers to test and improve their approaches on a variety of problems. Board games, for example, pose many challenges for AI, such as strategic decision making, position evaluation, and pattern recognition.
Algorithms used in board game AI
Algorithms used in board game AI vary with the type and difficulty of the game, but several algorithms are common. These are mainly search algorithms, as described in “Overview of Search Algorithms and Various Algorithms and Implementations.” The main board game AI algorithms are described below.
Minimax Search:
Minimax search is an algorithm that searches the game tree (a tree structure containing all possible game positions) to find the optimal move for a player. It assumes the opponent will choose the move most advantageous to them on their turn and selects the move least disadvantageous against it, and it is usually combined with alpha-beta pruning to make the search more efficient (a code sketch combining the two appears after the next entry). See also “Online Stochastic Optimization, AdaGrad, and Minimax Optimization for Machine Learning” and “Decision Theory and Mathematical Decision Techniques” for more information.
Alpha-beta pruning:
Alpha-beta pruning is a technique that improves the efficiency of minimax search by eliminating unnecessary search: it keeps bounds on the best values found so far and skips moves that can no longer affect the result. For more information, see “Alpha-Beta Pruning Overview and Algorithm and Implementations.”
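As a concrete illustration, here is a minimal sketch of minimax search with alpha-beta pruning, using tic-tac-toe as a stand-in for a board game; the board representation and win/loss scoring are illustrative choices, not taken from any particular engine.

```python
# Minimal minimax search with alpha-beta pruning on tic-tac-toe.
# The board is a flat list of 9 cells containing 'X', 'O', or ' '.
import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def alphabeta(board, player, alpha, beta):
    """Value of `board` from X's point of view, with `player` to move."""
    w = winner(board)
    if w:
        return 1 if w == 'X' else -1
    moves = [i for i, cell in enumerate(board) if cell == ' ']
    if not moves:
        return 0  # draw
    if player == 'X':  # maximizing side
        value = -math.inf
        for m in moves:
            board[m] = 'X'
            value = max(value, alphabeta(board, 'O', alpha, beta))
            board[m] = ' '
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: O will never allow this line
    else:              # minimizing side
        value = math.inf
        for m in moves:
            board[m] = 'O'
            value = min(value, alphabeta(board, 'X', alpha, beta))
            board[m] = ' '
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cutoff
    return value

def best_move(board, player):
    """Pick the move with the best minimax value for `player`."""
    sign = 1 if player == 'X' else -1
    nxt = 'O' if player == 'X' else 'X'
    def score(m):
        board[m] = player
        v = alphabeta(board, nxt, -math.inf, math.inf)
        board[m] = ' '
        return sign * v
    return max((i for i, c in enumerate(board) if c == ' '), key=score)

if __name__ == '__main__':
    print(best_move([' '] * 9, 'X'))  # with perfect play every opening draws
```

Real engines cut the search off at a fixed depth and call an evaluation function (described below) instead of searching to the end of the game.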
Monte Carlo Tree Search (MCTS):
Monte Carlo tree search is an algorithm that searches probabilistically for the best move by running many playouts (playing random moves to the end of the game and evaluating the outcome). MCTS is also used in complex games such as Go and Shogi. For details, see “Overview of Monte Carlo Tree Search and Examples of Algorithms and Implementations.”
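The cycle of selection, expansion, simulation (playout), and backpropagation can be shown in a compact UCT-style sketch. To keep it self-contained it plays Nim (take 1 to 3 stones; whoever takes the last stone wins) instead of Go; the exploration constant and iteration count are illustrative assumptions.

```python
# A UCT-style Monte Carlo tree search on Nim (take 1-3 stones; whoever
# takes the last stone wins), kept tiny so the four MCTS steps stand out.
import math
import random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones                     # the game state
        self.parent, self.move = parent, move
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits, self.wins = 0, 0.0

    def ucb_child(self, c=1.4):                  # c: exploration constant
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def random_playout(stones):
    """Play random moves to the end; True if the player to move wins."""
    if stones == 0:
        return False                             # nothing to take: already lost
    mover = 0
    while True:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return mover == 0                    # last stone taken by this mover
        mover ^= 1

def mcts(root_stones, iterations=2000):
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB
        while not node.untried and node.children:
            node = node.ucb_child()
        # 2. Expansion: add one untried move as a new leaf
        if node.untried:
            m = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.stones - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the leaf state
        leaf_mover_wins = random_playout(node.stones)
        # 4. Backpropagation: each node scores wins for the player
        #    who made the move leading into it, flipping per level
        win = not leaf_mover_wins
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if win else 0.0
            win = not win
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move  # most visited

if __name__ == '__main__':
    random.seed(0)
    print(mcts(10))  # optimal play takes 2, leaving a multiple of 4
```

The same skeleton scales to Go by replacing the Nim state with a Go board and, as in AlphaGo, by biasing selection and replacing the purely random playouts with neural network outputs.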
Reinforcement Learning:
Reinforcement learning is an approach that learns the optimal strategy from the experience of game play. AI using deep reinforcement learning, such as AlphaGo and AlphaZero, can learn advanced strategies and beat professional players. For details, see “Overview of Reinforcement Learning Techniques and Various Implementations.”
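As a minimal illustration of learning from play (much simpler than the deep reinforcement learning in AlphaGo or AlphaZero), the sketch below applies tabular Q-learning to the small Nim game from the previous example, training against a random opponent; the learning rate, exploration rate, and episode count are illustrative assumptions.

```python
# Tabular Q-learning for Nim (take 1-3 stones, last stone wins), trained
# against a random opponent. AlphaGo-style deep reinforcement learning
# replaces the table below with neural networks, but the reward-driven
# update rule is the same basic idea.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.1   # illustrative hyperparameters
Q = defaultdict(float)                   # Q[(stones, move)] -> value estimate

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def choose(stones):
    """Epsilon-greedy move selection."""
    if random.random() < EPSILON:
        return random.choice(legal_moves(stones))
    return max(legal_moves(stones), key=lambda m: Q[(stones, m)])

def train(episodes=20000, start=10):
    for _ in range(episodes):
        stones = start
        while stones > 0:
            state, move = stones, choose(stones)
            stones -= move
            if stones == 0:
                reward, future = 1.0, 0.0        # we took the last stone: win
            else:
                stones -= random.choice(legal_moves(stones))  # opponent moves
                if stones == 0:
                    reward, future = -1.0, 0.0   # opponent won: loss
                else:
                    reward = 0.0
                    future = max(Q[(stones, m)] for m in legal_moves(stones))
            # Q-learning update: nudge the estimate toward reward + gamma*max
            Q[(state, move)] += ALPHA * (reward + GAMMA * future - Q[(state, move)])

if __name__ == '__main__':
    random.seed(0)
    train()
    for s in (5, 6, 7, 9, 10):
        print(s, '->', max(legal_moves(s), key=lambda m: Q[(s, m)]))
    # optimal play leaves a multiple of 4: e.g. from 10 stones, take 2
```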
Evaluation function:
Board game AI uses an evaluation function to assess positions. The evaluation function judges whether the current position is favorable or unfavorable and is used within minimax search and MCTS. The quality of the evaluation function has a significant impact on the strength of the AI.
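The sketch below shows what a simple hand-crafted evaluation function can look like, in the style commonly used for Othello: corners are valuable, squares adjacent to corners are risky, and having more legal moves (mobility) is rewarded. The weight table and mobility factor are illustrative assumptions, not tuned values from any published engine.

```python
# A hand-crafted Othello-style evaluation function: positional weights
# (corners good, corner-adjacent squares risky) plus a mobility term.
# The numbers are illustrative, not tuned values from a real engine.

WEIGHTS = [
    [100, -20, 10,  5,  5, 10, -20, 100],
    [-20, -50, -2, -2, -2, -2, -50, -20],
    [ 10,  -2,  1,  1,  1,  1,  -2,  10],
    [  5,  -2,  1,  0,  0,  1,  -2,   5],
    [  5,  -2,  1,  0,  0,  1,  -2,   5],
    [ 10,  -2,  1,  1,  1,  1,  -2,  10],
    [-20, -50, -2, -2, -2, -2, -50, -20],
    [100, -20, 10,  5,  5, 10, -20, 100],
]

def evaluate(board, me, opponent, my_moves, opp_moves):
    """Score an 8x8 board (cells: me, opponent, or None) from `me`'s view.

    my_moves / opp_moves are the counts of legal moves for each side.
    """
    positional = 0
    for r in range(8):
        for c in range(8):
            if board[r][c] == me:
                positional += WEIGHTS[r][c]
            elif board[r][c] == opponent:
                positional -= WEIGHTS[r][c]
    mobility = 5 * (my_moves - opp_moves)   # having more options is good
    return positional + mobility

if __name__ == '__main__':
    board = [[None] * 8 for _ in range(8)]
    board[0][0] = 'B'                        # Black holds a corner
    print(evaluate(board, 'B', 'W', my_moves=4, opp_moves=3))  # 105
```

A search algorithm such as minimax calls a function like this at its depth limit in place of playing the game out to the end.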
Opening book:
An opening book is a database containing the AI's prior knowledge of opening sequences and early-game positions. It guides the AI toward favorable positions from the beginning of the game.
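A minimal sketch of an opening-book lookup follows: a dictionary keyed by the sequence of moves played so far, consulted before falling back to regular search. The entries are placeholders, not actual opening theory.

```python
# A toy opening book: a dictionary mapping the move sequence played so
# far to a stored reply. The entries are placeholders, not real theory.

OPENING_BOOK = {
    (): "c4",                # first move of the game
    ("c4",): "e6",           # stored reply to c4
    ("c4", "e6"): "f4",
}

def book_move(history):
    """Return a book reply for the game so far, or None to fall back to search."""
    return OPENING_BOOK.get(tuple(history))

if __name__ == '__main__':
    print(book_move([]))             # "c4"
    print(book_move(["c4"]))         # "e6"
    print(book_move(["a1", "b2"]))   # None -> run the regular search instead
```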
The board game AI combines these algorithms and selects the best approach according to the rules and tactics of the game. Depending on the complexity of the game and the size of the search space, different algorithms may be used, and AI enhancements and learning allow for the development of more advanced tactics and strategies.
Next, we discuss the differences in AI among some of the oldest and most popular strategy games: Othello, Shogi, and Go.
Differences between Othello, Shogi, and Go from an AI perspective
There are several differences between Othello, Shogi, and Go from an AI perspective. The following describes the AI efforts in each game and how they differ.
1. Othello (Reversi):
Othello is a game with relatively simple rules, played on an 8×8 board, where the AI evaluates possible moves and learns strategies to optimize the placement of stones on the board.
Othello AI uses classical game tree search algorithms such as the minimax method described in “Overview of the minimax method and examples of algorithms and implementations” and alpha-beta pruning. It also uses a heuristic evaluation function to evaluate the board.
Reinforcement learning has also been applied to Othello, and deep reinforcement learning models (e.g., AlphaZero) have played it successfully.
2. Shogi:
Shogi is a very complex game, with 40 pieces on the board, a wide variety of movement rules, and the ability to drop captured pieces back into play. This makes the development of a Shogi AI very difficult.
Shogi AI uses game tree search, alpha-beta pruning, heuristic evaluation functions, and so on. Evaluating a position involves many factors (piece placement, offensive and defensive considerations, king safety, etc.).
There are many open source projects for Shogi AI, and some AIs are strong enough to beat professional players.
3. Go:
Go is a much more complex game than Shogi, in which players place stones on the board to surround territory. The Go board is large (19×19) and the number of possible variations is enormous.
Deep learning has played a major role in the development of the Go AI; AlphaGo and AlphaZero represent the evolution of the Go AI.
Go AI uses convolutional neural networks (CNNs) and reinforcement learning, combined with Monte Carlo tree search, to evaluate positions and select the best moves. Go AI is now strong enough to beat top human professionals.
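As a rough illustration of the policy-network idea (a board position goes in, a probability for each candidate move comes out), here is a toy sketch in PyTorch. The actual AlphaGo policy network was a much deeper CNN over 48 hand-crafted feature planes, so the three input planes and small layer sizes below are illustrative assumptions.

```python
# A toy Go policy network in PyTorch: position in, move probabilities out.
# AlphaGo's real policy network was a 13-layer CNN over 48 feature planes;
# the 3 input planes and small layer sizes here are illustrative only.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, planes=3, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(planes, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=1),  # one logit per intersection
        )

    def forward(self, x):
        # x: (batch, planes, 19, 19) -> (batch, 361) move probabilities
        logits = self.body(x).flatten(1)
        return torch.softmax(logits, dim=1)

if __name__ == '__main__':
    net = PolicyNet()
    position = torch.zeros(1, 3, 19, 19)      # e.g. own/opponent/empty planes
    probs = net(position)
    print(probs.shape, float(probs.sum()))    # torch.Size([1, 361]) 1.0
```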
As games of Othello, Shogi, and Go progress, the number of strategy combinations becomes vast. The basis of a strategy game AI is a search algorithm, but while Othello AI relies on classical game tree search such as the minimax method with alpha-beta pruning together with a heuristic evaluation function, Shogi AI focuses on deeper game tree search and more elaborate heuristic evaluation, and Go AI, because of its enormous number of variations, is predominantly based on deep learning approaches.
Why did Alpha Go beat humans? Reading notes
This book is a commentary by Yasumi Saito, Professor of Information Industry Organization at Kyoto University. He specializes in artificial intelligence, cognitive science, the Internet, and security; co-translated D.R. Hofstadter's “Metamagical Themas” (Hakuyosha, 2005) and “Literary Machines: Theory of Hypertext”; and is the author of “Ubiquitous Office Technology” (Telecommunications Association, 2005).
His reading notes are as follows.
Why AlphaGo Beat Humans (Yasumi Saito)

Prologue: The Impact of AlphaGo
- Events in early March
- Reaction of the world
- Reaction of professional Go players
- Reaction of Go programmers
- My stance (what I write in this book)

Chapter 1: AI and Games
1.1 What Has AI Aimed to Achieve?
- The term AI was coined at the Dartmouth Conference of 1956, where a paper titled "Synthesis of Automata Design for an Intelligence Amplifier" was presented.
- IA (Intelligence Amplifier): something that augments intelligence. Computers have always been IA in the sense that they perform calculations on people's behalf; the Turing model underlying computers is likewise an abstraction of computation.
- AI is an attempt to make computers behave intelligently like humans; if AI can be realized, it becomes IA: artificial intelligence as computers that help people.
- ELIZA: simulates a psychiatrist's conversation with a patient.
- SHRDLU: answers questions and performs actions in a world of building blocks.
- DENDRAL and MYCIN: early expert systems.
- AM (Automated Mathematician): simulates the process of forming mathematical concepts.
- Cyc: building a knowledge base as a database.
- At the time, it was not known how humans, the target of imitation, actually perform intelligent behavior.

1.2 Strong and Weak AI
- Two meanings of "same": different when broken down in detail, but the same at the level of function (or at the level of how components are connected to realize the function, or of how the processing proceeds, i.e., the algorithm).

1.3 History of AI
- From the optimistic first boom to the knowledge-engineering second boom, and on to the third boom of deep learning.

1.4 Game Programming
- Chess pioneered game programming.
- The all-encompassing "game tree"; how to determine "good moves"?
- Why did AI research progress in Go and Shogi?
- Robotics and language understanding go far beyond game programming.

1.5 Is AlphaGo Not AI? The "AI effect"

Chapter 2: Changes in Go Programming
2.1 History of Computer Go Research
- The first program had a strength of about 38 kyu.
- Albert Zobrist: recognition of the position by physical analogy; a program written in Algol.
- Reitman and Wilcox's program (1970s): reasoning based on pattern recognition; a data structure called WEB represents hierarchical patterns (strings and groups of stones of the same color), and strategies are determined by the patterns.
- MFG, an example of a classic Go program: David Fotland's MFG (Many Faces of Go) in the 1980s, a combination of alpha-beta search, rule-based expert systems, and pattern matching.
- Go programs using neural nets: Herbert Enderton's Golem (1990s) generates candidate moves using simple rules based on recognition of the position, then uses neural nets to narrow down the candidates.
- How to train neural nets: given an input, compute the output and the error against the teacher signal; then, attributing the error to the weights of the connections with the previous layer, adjust the weights so the error becomes smaller, and propagate this correction back one layer at a time.
- Another program using neural nets: TD learning, described in "Overview of TD learning and examples of algorithms and implementations" ("Temporal Difference Learning" by Nicol Schraudolph et al.); reinforcement learning.
- Go with physics? Bernd Bruegmann's Gobble: primitive Monte Carlo Go using the annealing method with thousands of random games.

2.2 Tic-Tac-Toe as an Example
- Rules of tic-tac-toe; logical moves (example).

2.3 Rule-based Programs
- Intelligence can be achieved by manipulating symbols: Simon and Newell's "physical symbol system hypothesis".

2.4 Making Programs Stronger by Having Them Learn Patterns
- Pattern matching to identify candidate points, e.g., for "katatsugi" (solid connections).

2.5 Programs Based on Tree Search
- The size of the game tree; checkers has already been "solved".
- Position evaluation is desperately needed; the minimax method finds the best move, and alpha-beta pruning increases efficiency.
- Tree search alone is not well suited to Go.

Chapter 3: New Trends in Go Programming
3.1 Monte Carlo Go Without Go Knowledge
- Playing Go with optimization techniques: Bruegmann's approach to Monte Carlo Go applies methods from physics, asking "how would nature play Go?"
- Annealing, an optimization technique also used to find the shortest path in a graph: random transitions from one state to a nearby state; initially the state moves toward larger values with a certain probability, which decreases over time.
- Monte Carlo Go was epoch-making: earlier Go programs did not search trees but determined the next move by rules; Monte Carlo Go uses very little Go knowledge, plays completely randomly to the end, and chooses the move with the highest average result over many such plays.

3.2 Monte Carlo Tree Search with Improved Efficiency
- Position evaluation by averaging the results of random plays: the approach of Bouzy, Helmstetter, and Cazenave.
- How to reduce playouts (the number of random plays to the end); pruning playout branches; annealing turns out not to be necessary.
- Fusion of Monte Carlo Go with traditional tree search and knowledge bases: Monte Carlo tree search, introduced in Remi Coulom's 2006 paper, is a clever combination of Monte Carlo methods and game tree search.

3.3 The Emergence of Deep Learning
- How neural networks work; what is deep learning?
- Autoencoders: now seldom used for pre-training in practice, with their main uses in image denoising and visualization; deep learning algorithms have since improved to the point where random initial values give sufficient accuracy without pre-training; autoencoder research has progressed into generative models, where the variational autoencoder is probabilistic, unlike the plain autoencoder, which maps an input to an output deterministically.
- Hinton's idea: improved learning efficiency.
- Deep convolutional neural networks: convolution outputs a value summarizing a partial region of the original data to the next layer while shifting the region little by little, making the result robust to misalignment in the original image.

Chapter 4: The Structure of AlphaGo
4.1 Structure of AlphaGo
- Policy network and value network: two networks (models) trained in advance.
- Policy network: where the next move is most often played; promising candidates.
- Value network: the evaluation value of the position; the final win rate.
- Rollout policy (network).
- The value network improves accuracy: promising candidate moves are extracted by the policy network, high-value candidates are selected, and playouts are started; the value network reduces the computation required.

4.2 Two Neural Networks and Their Learning
- How much has AlphaGo "learned"? 160,000 games, yielding roughly 29,400,000 positions used as training data: each position is paired with the next move, with about 28 million position-move pairs for training and 1 million held out for testing.
- Training used machine learning software called DistBelief, the predecessor of TensorFlow.
- The policy network obtained by supervised learning: the SL (supervised learning) policy network.
- The RL (reinforcement learning) policy network is trained by playing against previous generations of the policy network: 1.28 million games per day on 50 GPUs.
- A large number of positions are randomly extracted from game records generated by self-play of the policy networks.
- Structure of each network.

4.3 Machine Power Is Also Important
- Learning speed about 500 times faster than on a CPU.

4.4 Characteristics of the AlphaGo Program
- AlphaGo has no overall plan or goals.
- It uses the UCB value to perform a slightly broadened search called "exploration".
- What goes on inside is just number crunching: AlphaGo does not "think".

4.5 Differences Between Go Players and the AlphaGo Program
- AlphaGo "just figures out which next move is promising".
- How AlphaGo thinks; how AlphaGo learns.
- Flexibility and versatility: it must relearn when the board size changes.

4.6 Why Did AlphaGo Beat Lee Sedol 9-dan?
- Three causes of victory:
- The combination of deep convolutional neural networks and Monte Carlo tree search: deep convolutional neural networks have a weakness in that they sometimes lose the global picture of the board, and Monte Carlo tree search supplements them.
- Unparalleled machine power: computing power equivalent to 3,000 CPUs.
- The strength of the two-neural-net system: increasing the number of intermediate layers up to 12; very strong Go can be played just by predicting the next move.

Chapter 5: New AI Possibilities
5.1 Limitations of Human Programming
- Information processing in the human brain is still unknown; we do not know the details of how the brain processes information.
- Machines can only do what they are told to do.
- The difficulty of programming: to program is to specify behavior in advance, which requires predicting what will happen in the future.
- Neural nets are a method of skipping part of the programming.

5.2 Constraints on Learning Programs
- The problem must be clarified: for machine learning to be applicable, the definitions of input and output must be clear and unambiguous.
- The act of "understanding" is difficult to define: the end product of the process of understanding (the output, or the state change in the brain) cannot be defined.
- The difficulty of collecting a large amount of training data.
- Challenges in language processing with neural networks: there are methods at the word level, but the problem is how to encode sentences and feed them into the network.

5.3 Hybrid AI of Classical Programs and Neural Networks
- New possibilities: AI started from symbolism (Simon and Newell's physical symbol system hypothesis), trying to realize intelligence by devising symbol-processing programs on computers.
- Winograd created SHRDLU to perform actions in a world of building blocks but found it limited, and shifted to research on human-machine interaction: creating systems in which humans and machines coexist well and are efficient as a whole, i.e., IA (Intelligence Amplifier).

5.4 Symbolizers That Create Symbols
- What is the "physicality" of AI? The symbolizer; Harnad's SGP (symbol grounding problem) asks how information from the outside world entering the sensory organs (touch, smell, taste, hearing, sight) gets integrated into a usable symbol. Symbolizers are also related to embodiment.

5.5 Applications to Robotics and Language Understanding
- Promising areas for neural nets (deep learning): robots (unsupervised learning of appropriate concepts and representations by an autoencoder), automatic driving, and natural language understanding.
- The University of Tokyo robotics project.
- Pat Hayes' "Naive Physics Manifesto": a formulation of a mechanism for understanding everyday physical laws.
- How to incorporate a symbolizer-like processing mechanism and how to link it to language understanding.
- A factor in the failure of the Fifth Generation Project: it ended with vague results because no objectively evaluable evaluation axis was set.

5.6 Future Go Programs
- No matter how strong it is, AlphaGo is not a "strong AI".
- Elements lacking in AlphaGo: goal orientation; strategy and tactics; awareness of the overall flow of the game (the time axis); models of opponent players; awareness of the opening, middle game, and endgame.
- Go programs to aim for in the future: programs that can explain their moves; programs that learn like a child.

Chapter 6: The Impact of AI on Human Society
6.1 AI in the Spotlight
- The third AI boom.
6.2 Arguments That AI Will Surpass Humans Are Unfounded
- The claim that "robots will destroy humanity".
6.3 Areas Where AI Surpasses Humans
- AI and robots are good at simple tasks; robots are not good at cooperative work with humans.
6.4 Areas Where AI Cannot Surpass Humans
- Creative work cannot be replaced; machines are not good at interpersonal services; language understanding is the final bottleneck; beyond pessimism.

Afterword
The Strongest Go AI: AlphaGo Kaitai Shinsho: Its Mechanism from the Perspective of Deep Learning, Monte Carlo Trees, and Reinforcement Learning
This book, by Tomofumi Ohtsuki, compiles the difficult mechanisms of AlphaGo and AlphaGo Zero as presented in academic papers (Nature, Google's website, etc.) and explains the deep learning and reinforcement learning techniques they use in an easy-to-understand manner, while looking at actual Go screens. In particular, the dual network is a completely new deep learning method that has attracted the interest of engineers in Japan and abroad.
By reading this book, you can learn about the mechanisms of deep learning and reinforcement learning in the latest AI and use them as a reference for your own research and development. You can also try out an actual Go AI based on DeltaGo, developed by the author.
The contents are shown below.
The Strongest Go AI: AlphaGo Kaitai Shinsho: Its Mechanism from the Perspective of Deep Learning, Monte Carlo Trees, and Reinforcement Learning
Introduction
Chapter 1 The Emergence of Alpha Go
01 History and Evolution of Game AI
02 The Emergence of the Genius Demis Hassabis
03 The Success of Alpha Go
04 The Basics of Go AI
05 Summary
Chapter 2 Deep Learning: The Go AI Has Instant Intuition
01 What is Deep Learning?
02 Example of Handwritten Number Recognition
03 Convolutional Neural Networks in Alpha Go
04 Training a CNN with Chainer
05 Conclusion
Chapter 3 Reinforcement Learning: The Go AI Learns from Experience
01 What is Reinforcement Learning?
02 History of Reinforcement Learning
03 The Multi-armed Bandit Problem
04 Reinforcement Learning for Solving Mazes
05 Reinforcement Learning for Acquiring Video Game Controls
06 Reinforcement Learning in Alpha Go
07 Summary and Issues
Chapter 4 Search: How the Go AI Looks Ahead
01 Two-Player Zero-Sum Finite Deterministic Perfect-Information Games
02 Search in Games
03 Traditional Game Tree Search (Minimax Tree Search)
04 Monte Carlo Tree Search in Go
05 Success Factors and Challenges of Monte Carlo Tree Search
06 Conclusion
Chapter 5 Completing Alpha Go
01 The Alpha Go Blueprint
02 Asynchronous Policy Value Update Monte Carlo Tree Search (APV-MCTS)
03 Using Large Numbers of CPUs and GPUs
04 Effects of APV-MCTS
05 Weaknesses of AlphaGo
06 The Future of AlphaGo
Appendix 1 About the Formulas
01 Derivation of Learning Rule for Convolutional Neural Networks
02 Derivation of Learning Rule for Reinforcement Learning
Appendix 2 How to use "GoGui" UI software for Go programs and "DeltaGo" program for GoGui
01 What is DeltaGo?
02 How to install GoGui and use the DeltaGo program for GoGui
End