Overview of Information Set Monte Carlo Tree Search (ISMCTS), its algorithm and example implementations

Overview of Information Set Monte Carlo Tree Search (ISMCTS)

Information Set Monte Carlo Tree Search (ISMCTS) is a variant of Monte Carlo Tree Search (MCTS) designed for games of imperfect information, such as poker and many card games, in which players cannot observe parts of the game state (for example, opponents' hands). Whereas standard MCTS is typically applied to perfect information games such as Go and Shogi, ISMCTS extends the approach to handle hidden information.

The following is an overview of ISMCTS.

1. Information sets: Game states that a player cannot distinguish from one another belong to the same information set. For example, in poker, two states that differ only in an opponent's hidden hand look identical to the observing player, so they belong to the same information set for that player.
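As a concrete illustration, this grouping can be implemented by deriving a key from only the parts of the state a player can observe. The state representation and field names below are hypothetical, purely for illustration:

```python
# Sketch: two poker-like states that differ only in the opponent's hidden
# hand map to the same information-set key for the observing player.

def information_set_key(own_hand, public_cards):
    # Only observable components enter the key; hidden cards are excluded
    return (tuple(sorted(own_hand)), tuple(sorted(public_cards)))

# Two full states: same observable parts, different hidden opponent hands
state_a = {"own": ["Ah", "Kd"], "public": ["2c", "7s", "9h"], "opponent": ["Qs", "Qc"]}
state_b = {"own": ["Ah", "Kd"], "public": ["2c", "7s", "9h"], "opponent": ["3d", "8h"]}

key_a = information_set_key(state_a["own"], state_a["public"])
key_b = information_set_key(state_b["own"], state_b["public"])
print(key_a == key_b)  # True: both states belong to the same information set
```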

2. Application to imperfect information games: In imperfect information games, players do not have complete knowledge of other players' hands or private information. ISMCTS searches over information sets rather than fully observed states, so it can be applied directly to such games.

3. Search and policies: ISMCTS performs its search using a policy defined per information set. The policy represents a probability distribution over the actions available in each information set, which allows the search to proceed while accounting for each player's incomplete information.

4. Collecting and using statistics: ISMCTS collects statistics (e.g., visit counts and accumulated rewards) for each information set and uses them during the search. These statistics are used to update the policies and to select actions.
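A minimal sketch of such per-information-set bookkeeping might look as follows (the key string and structure are illustrative assumptions, not from the original article):

```python
from collections import defaultdict

# Per-information-set statistics: visit counts and accumulated reward,
# as used to update policies and select actions.
stats = defaultdict(lambda: {"visits": 0, "total_reward": 0.0})

def update_stats(info_set_key, reward):
    stats[info_set_key]["visits"] += 1
    stats[info_set_key]["total_reward"] += reward

def average_reward(info_set_key):
    s = stats[info_set_key]
    return s["total_reward"] / s["visits"] if s["visits"] else 0.0

update_stats("raise-after-check", 1.0)
update_stats("raise-after-check", 0.0)
print(stats["raise-after-check"]["visits"])  # 2
print(average_reward("raise-after-check"))   # 0.5
```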

5. Expansion and back-propagation: Like regular MCTS, ISMCTS expands nodes during the search and runs simulations to obtain rewards. The rewards obtained are back-propagated from the leaf where the simulation started up toward the root, updating the statistics along the path.

ISMCTS is widely used as an effective search method for imperfect information games; by introducing the concept of information sets, the approach makes it possible to account for what each player cannot observe.

Algorithms related to Information Set Monte Carlo Tree Search (ISMCTS)

The following is an overview of the ISMCTS algorithm.

1. Define information sets: Group game states and define each group as an information set. An information set contains situations that a player cannot distinguish from one another. For example, in poker, states that differ only in the other players' hidden hands belong to the same information set.

2. Initial policy: A policy is computed for each information set. The policy gives the probability of choosing each action; a uniform random or heuristic policy is typically used as the starting point.
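A uniform random starting policy, as mentioned above, can be sketched in a few lines (the move names are illustrative):

```python
import random

# Sketch: a uniform random policy over the legal moves of an information
# set, a common starting point before statistics refine the policy.
def uniform_policy(legal_moves):
    p = 1.0 / len(legal_moves)
    return {move: p for move in legal_moves}

def sample_action(policy):
    moves, probs = zip(*policy.items())
    return random.choices(moves, weights=probs, k=1)[0]

policy = uniform_policy(["hit", "stand"])
print(policy)  # {'hit': 0.5, 'stand': 0.5}
```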

3. Tree search: As in MCTS, ISMCTS selects actions by searching a tree. Each node is associated with an information set, and actions are selected based on the node's policy and statistics.

4. Simulation and reward calculation: Based on the selected actions, a simulation is run until a terminal state is reached. Rewards are then computed from the game outcome, using a reward function appropriate for each player.

5. Back-propagation: The simulation result is propagated back to the parent nodes, updating the tree statistics. The statistics are updated per information set for each player, which helps improve the policies.

6. Iterative search: The search is repeated for a fixed number of iterations or within a time limit, and the tree is updated at each iteration so that better actions are selected.
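A core ISMCTS technique implied by the steps above is determinization: at each iteration, a concrete assignment of the hidden information is sampled, the iteration runs as if in a perfect information game, and the result is accumulated in per-information-set statistics. The toy game and probabilities below are illustrative assumptions, not from the original article:

```python
import random

# Toy hidden-card game: a card is hidden (red with probability 0.7); the
# player guesses its colour and scores 1 for a correct guess. Each
# iteration samples a determinization (a concrete hidden card) and
# updates per-action statistics, mirroring the iterative search loop.
random.seed(0)
actions = ["guess_red", "guess_black"]
stats = {a: {"visits": 0, "reward": 0.0} for a in actions}

def sample_determinization():
    return "red" if random.random() < 0.7 else "black"

for i in range(2000):
    hidden = sample_determinization()   # fix the hidden information
    action = actions[i % 2]             # simple alternating try-out policy
    reward = 1.0 if action == "guess_" + hidden else 0.0
    stats[action]["visits"] += 1        # back-propagate into the statistics
    stats[action]["reward"] += reward

best = max(actions, key=lambda a: stats[a]["reward"] / stats[a]["visits"])
print(best)  # guess_red: averaging over determinizations reveals the better action
```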

Application of Information Set Monte Carlo Tree Search (ISMCTS)

Information Set Monte Carlo Tree Search (ISMCTS) has been widely applied to imperfect information games. The following are examples of applications.

1. Poker: Poker is a classic imperfect information game in which players must choose actions without fully knowing the other players' hands. ISMCTS is used to learn effective action strategies in poker and to build strong poker agents.

2. Go: Go is in fact a perfect information game, in which both players can see the entire board. The underlying MCTS framework on which ISMCTS builds was adopted as an efficient search method in Go and is used in AlphaGo and its successor systems.

3. Shogi: Like Go, Shogi is a perfect information game; MCTS-style search has nevertheless been used as a search method in Shogi and has aided the development of Shogi software.

4. Strategy games: Strategy games and real-time strategy (RTS) games involve multiple interacting players and hidden information; ISMCTS has been used as an effective search technique in these games as well.

5. Card games: Card games involve imperfect information because other players' cards are not visible; ISMCTS can be a useful search technique in card games as well.

Example implementation of Information Set Monte Carlo Tree Search (ISMCTS)

An example implementation of ISMCTS is shown below. The following Python code is a simple ISMCTS-style implementation for the card game blackjack. In this implementation, the player plays blackjack against the dealer.

import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state  # Game State
        self.parent = parent  # parent node
        self.children = {}  # Child Node (Action: Node)
        self.visits = 0  # Number of visits to node
        self.score = 0  # Total node score

    def is_fully_expanded(self):
        return len(self.children) == len(self.state.get_legal_moves())

    def select_child(self, exploration_factor=0.7):
        return max(self.children.values(), key=lambda child: child.get_score(exploration_factor))

    def get_score(self, exploration_factor):
        # Unvisited nodes are explored first; otherwise balance the average
        # reward (exploitation) against an exploration bonus that shrinks
        # as the node accumulates visits
        if self.visits == 0:
            return float('inf')
        exploitation = self.score / self.visits
        exploration = exploration_factor * (self.parent.visits ** 0.5) / (1 + self.visits)
        return exploitation + exploration

    def add_child(self, action, child_state):
        child = Node(child_state, self)
        self.children[action] = child
        return child

class BlackjackState:
    def __init__(self, player_hand, dealer_hand, usable_ace):
        self.player_hand = player_hand
        self.dealer_hand = dealer_hand
        self.usable_ace = usable_ace

    def get_legal_moves(self):
        return ['hit', 'stand']

    def clone(self):
        return BlackjackState(self.player_hand[:], self.dealer_hand[:], self.usable_ace)

    def is_terminal(self):
        # The hand ends when either side busts or is dealt a natural 21
        player_score = self.get_score(self.player_hand)
        dealer_score = self.get_score(self.dealer_hand)
        return (player_score > 21
                or (len(self.player_hand) == 2 and player_score == 21)
                or dealer_score > 21
                or (len(self.dealer_hand) == 2 and dealer_score == 21))

    def get_score(self, hand):
        score = sum(hand)
        # Count one ace as 11 instead of 1 when that does not bust the hand
        if score <= 11 and 1 in hand and self.usable_ace:
            score += 10
        return score

    def move(self, action):
        new_state = self.clone()
        if action == 'hit':
            # Draw a card; face cards count as 10
            new_state.player_hand.append(min(random.randint(1, 13), 10))
            if new_state.get_score(new_state.player_hand) > 21:
                return new_state, -1  # player busts
            return new_state, 0
        elif action == 'stand':
            # Dealer draws until reaching a total of at least 17
            while new_state.get_score(new_state.dealer_hand) < 17:
                new_state.dealer_hand.append(min(random.randint(1, 13), 10))
            dealer_score = new_state.get_score(new_state.dealer_hand)
            player_score = new_state.get_score(new_state.player_hand)
            if dealer_score > 21 or player_score > dealer_score:
                return new_state, 1
            elif player_score == dealer_score:
                return new_state, 0
            else:
                return new_state, -1

def ismcts_search(root_state, num_iterations):
    root_node = Node(root_state)

    for _ in range(num_iterations):
        node = root_node
        state = root_state.clone()
        reward = 0
        done = state.is_terminal()

        # Selection: descend the tree while every action has been tried
        while not done and node.is_fully_expanded():
            node = node.select_child()
            action = next(a for a, c in node.parent.children.items() if c is node)
            state, reward = state.move(action)
            done = state.is_terminal() or action == 'stand'

        # Expansion: add one untried action as a new child
        if not done:
            untried = [a for a in state.get_legal_moves() if a not in node.children]
            action = random.choice(untried)
            state, reward = state.move(action)
            node = node.add_child(action, state)
            done = state.is_terminal() or action == 'stand'

        # Simulation: play out randomly until the hand ends
        while not done:
            action = random.choice(state.get_legal_moves())
            state, reward = state.move(action)
            done = state.is_terminal() or action == 'stand'

        # Back-propagation: update statistics from the leaf up to the root
        while node is not None:
            node.visits += 1
            node.score += reward
            node = node.parent

    # Return the action whose child was visited most often
    return max(root_node.children.items(), key=lambda item: item[1].visits)[0]

def main():
    # Deal two cards each; face cards count as 10
    initial_player_hand = [min(random.randint(1, 13), 10) for _ in range(2)]
    initial_dealer_hand = [min(random.randint(1, 13), 10) for _ in range(2)]
    root_state = BlackjackState(initial_player_hand, initial_dealer_hand, True)

    print("Player's initial hand:", root_state.player_hand)
    print("Dealer's initial hand:", root_state.dealer_hand)

    while not root_state.is_terminal():
        action = ismcts_search(root_state, 10000)
        root_state, reward = root_state.move(action)
        print("Player's hand after action:", root_state.player_hand)
        if action == 'stand':
            break

    player_score = root_state.get_score(root_state.player_hand)
    dealer_score = root_state.get_score(root_state.dealer_hand)
    if player_score > 21:
        print("Player busts! Dealer wins.")
    elif dealer_score > 21:
        print("Dealer busts! Player wins.")
    elif player_score > dealer_score:
        print("Player wins with a score of", player_score, "vs Dealer's score of", dealer_score)
    elif player_score < dealer_score:
        print("Dealer wins with a score of", dealer_score, "vs Player's score of", player_score)
    else:
        print("It's a tie!")

if __name__ == "__main__":
    main()

This code is a simple ISMCTS-style implementation for a blackjack game, in which the tree search is used to select the player's action (hit or stand).

Challenges of Information Set Monte Carlo Tree Search (ISMCTS) and how to address them

Information Set Monte Carlo Tree Search (ISMCTS) faces several challenges, each with corresponding countermeasures. These are described below.

1. Complexity of information management:

Challenge: ISMCTS must maintain statistics for each information set, which becomes complicated when the game's information is complex and the number of information sets is large.

Solution: Use efficient data structures and algorithms to optimize the management of this information. Effective clustering of similar information sets can also simplify the management of the statistics.
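One simple form of such clustering is state abstraction: a coarser key merges many information sets into one statistics entry. The bucket boundaries below are illustrative assumptions in a blackjack-like setting:

```python
# Sketch: cluster information sets with a coarser key so that similar
# hand totals share one statistics entry, shrinking the table size.

def abstract_key(hand_total):
    if hand_total <= 11:
        return "low"     # can always hit safely
    elif hand_total <= 16:
        return "medium"  # risky region
    else:
        return "high"    # usually stand

# Hands with totals 12 and 16 share one cluster and hence one entry
print(abstract_key(12) == abstract_key(16))  # True
print(abstract_key(5), abstract_key(19))     # low high
```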

2. Inefficiency of search:

Challenge: ISMCTS must perform search for each information set, and as the number of information sets increases, search efficiency decreases.

Solution: To improve search efficiency, the search algorithm should be optimized, for example by improving the policies or introducing heuristic methods. Restricting the search range and tuning the parameters that control search depth are also effective.

3. Convergence to a locally optimal solution:

Challenge: ISMCTS may converge to a locally optimal solution. In particular, if the search within each information set is biased, the overall search result may get stuck in a local optimum.

Solution: To prevent convergence to a local optimum, adopt search strategies that maintain diversity and methods that encourage escape from local optima. Introducing randomness into action selection also increases the diversity of the search.
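One standard way to inject such randomness is epsilon-greedy action selection, sketched below (the action names and reward values are illustrative):

```python
import random

# Sketch: epsilon-greedy selection explores a random action with small
# probability epsilon, otherwise exploits the best action seen so far,
# helping the search avoid locking onto a local optimum.
def epsilon_greedy(avg_rewards, epsilon=0.1, rng=random):
    # avg_rewards: {action: average reward observed so far}
    if rng.random() < epsilon:
        return rng.choice(list(avg_rewards))      # explore: random action
    return max(avg_rewards, key=avg_rewards.get)  # exploit: best action

rng = random.Random(42)
avg = {"hit": 0.2, "stand": 0.4}
choices = [epsilon_greedy(avg, epsilon=0.1, rng=rng) for _ in range(1000)]
print(choices.count("stand") > choices.count("hit"))  # True: mostly exploits
```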

4. Delay in information propagation:

Challenge: There can be a delay before the statistics of each information set are reflected in the search, and this delay becomes more pronounced as the number of information sets increases.

Solution: Efficient data structures and algorithms are important for reducing this delay. Anticipating the propagation delay and applying appropriate search strategies is also effective.

Reference Information and Reference Books

For general machine learning algorithms, including search algorithms, see “Algorithms and Data Structures” or “General Machine Learning and Data Analysis”, as well as the reference books below.

Basic documents:
1. Browne, C. B., et al. (2012). “A Survey of Monte Carlo Tree Search Methods”.
– Description: A comprehensive review of MCTS; provides a foundation for understanding the fundamentals of ISMCTS.

2. Cowling, P. I., Powley, E. J., & Whitehouse, D. (2012). “Information Set Monte Carlo Tree Search”.
– Description: The original ISMCTS paper. It details the ISMCTS approach in games that are not perfect information games.

3. “Split Moves for Monte-Carlo Tree Search”

Applications and related books:
4. Silver, D., Huang, A., et al. (2016). “Mastering the game of Go with deep neural networks and tree search”.
– Description: A key paper on combining tree search and deep learning; in particular, the AlphaGo methodology is described.

5. Russell, S., & Norvig, P. (2020). “Artificial Intelligence: A Modern Approach” (4th Edition).
– Contents: Basic theory of MCTS and ISMCTS, and their relationship to reinforcement learning.

6. “Introducing Monte Carlo Methods with R”

7. “AI for Games, Third Edition”
