Mathematical decision making techniques used in reinforcement learning, online prediction, automated stock trading, etc.


Theory of Decision Making

Decision theory is an academic discipline that studies the process by which people gather information, evaluate alternatives, and make optimal choices. Decisions are made in a variety of domains, including everyday life, business, and politics, and decision theory provides a scientific analysis of this process and a theoretical framework.

One of the central concepts of decision theory is utility. Utility quantifies a person’s preference for, or satisfaction with, an option, and it serves as the criterion by which people compare alternatives and make the optimal choice. Decision theory studies how to model people’s utility functions and derive optimal choices from them.

Decision theory also treats risk and uncertainty as important factors. Risk and uncertainty are present whenever people make decisions, and future outcomes are often difficult to predict. Decision theory studies how to take such risk and uncertainty into account when making optimal choices. For example, expected utility theory is one way to quantify risk and make decisions.
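As a minimal sketch of how expected utility theory folds risk into a choice, the following compares a sure payment with a gamble that has the same expected amount. The square-root utility function and the amounts are illustrative assumptions, not figures from this article.

```python
import math


def expected_utility(lottery, utility):
    """Return the expected utility of a list of (probability, outcome) pairs."""
    return sum(p * utility(x) for p, x in lottery)


def sqrt_utility(x):
    """Hypothetical concave (risk-averse) utility; chosen only for illustration."""
    return math.sqrt(x)


sure_thing = [(1.0, 50.0)]            # 50 for certain
gamble = [(0.5, 0.0), (0.5, 100.0)]   # 0 or 100 with equal chance, same mean

eu_sure = expected_utility(sure_thing, sqrt_utility)
eu_gamble = expected_utility(gamble, sqrt_utility)
preferred = "sure_thing" if eu_sure > eu_gamble else "gamble"
print(eu_sure, eu_gamble, preferred)
```

With a concave utility the sure payment scores higher even though both options have the same expected amount, which is one way to express risk aversion numerically.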

In addition, decision theory may take into account people’s cognitive processes and psychological tendencies. Because people bring biases and constraints to their information processing, decisions are not always made rationally. Confirmation bias, for example, is the tendency to reinforce existing beliefs and ignore new information that contradicts them. Decision theory takes such psychological factors into account when analyzing actual decision making.

Decision-making algorithms

Building on decision theory, there are various “decision algorithms,” which are procedures for making rational choices in specific decision problems. They include the following:

Optimization algorithms: Optimization algorithms, such as linear programming and integer programming, find the optimal solution under given objectives and constraints. They can be used to make the best choice among multiple alternatives.
Decision tree algorithms: Decision tree algorithms use a tree structure to choose among multiple alternatives. Examples include decision trees, random forests, and gradient boosting. A decision tree can break a complex decision into a hierarchy of simpler ones and present the result in an easily interpretable form.
Bayesian decision theory: Bayesian decision theory is a framework that uses probability theory to make optimal choices while accounting for uncertainty. Bayes’ theorem is applied to derive the optimal decision from both prior information and observed data.
Algorithms for constraint satisfaction problems: These algorithms find a solution that satisfies a given set of constraints. Methods widely used in the field of artificial intelligence include backtracking and constraint programming, which search for a combination of values that meets all the constraints.
Heuristic algorithms: Heuristic algorithms solve a problem using empirical rules or rules of thumb rather than an exhaustive procedure.
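To make the Bayesian decision theory item above more concrete, here is a minimal sketch: a prior is updated with Bayes’ theorem after an observation, and the action with the smallest expected loss is chosen. The umbrella scenario, the probabilities, and the loss values are all hypothetical and chosen only for illustration.

```python
# Hypothetical prior belief about tomorrow's weather.
prior = {"rain": 0.3, "dry": 0.7}

# Hypothetical forecast reliability: P(forecast says "rain" | true state).
likelihood = {"rain": 0.8, "dry": 0.2}

# Bayes' theorem: posterior is proportional to prior times likelihood.
evidence = sum(prior[s] * likelihood[s] for s in prior)
posterior = {s: prior[s] * likelihood[s] / evidence for s in prior}

# Hypothetical losses for each (action, state) pair.
loss = {
    "umbrella": {"rain": 0, "dry": 1},
    "no_umbrella": {"rain": 5, "dry": 0},
}

# Choose the action that minimizes the posterior expected loss.
expected_loss = {
    action: sum(posterior[s] * loss[action][s] for s in posterior)
    for action in loss
}
best_action = min(expected_loss, key=expected_loss.get)
print(posterior, expected_loss, best_action)
```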

The Art of Mathematical Decision-Making: How to derive the “only one” correct answer with easy probability.

In this article, I would like to discuss algorithms for solving these problems, based on the book “Mathematical Decision-Making Techniques: How to Derive the ‘Only One’ Correct Answer with Easy Probability”.

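First of all, let’s assume that you are going to sell something at a street stall in a park one day next week, and that there are four types of businesses to choose from (A, B, C, and D). The following table shows the profit of each business, in units of 10,000 yen, under each kind of weather (sunny, cloudy, rainy, or snowy).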

(Table excerpted from “Mathematical Decision-Making Techniques”)

Here, a strategy concerned with “what is the least profit I could end up with” selects business A, the only one with no zero-profit outcome. This is technically called the “Max-Min Criterion” (the minimum profit (min) is made as large as possible (max)).
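The max-min choice can be sketched in a few lines. The `profits` table below is an assumption: it is reconstructed from the figures quoted in this article, because the original table is not reproduced here. In particular, which of rainy and snowy gives B its profit of 1 is a guess that does not change the result.

```python
# Profit in units of 10,000 yen; columns are (sunny, cloudy, rainy, snowy).
# Assumed reconstruction from the numbers quoted in the article.
profits = {
    "A": [2, 2, 1, 1],
    "B": [3, 3, 1, 0],
    "C": [2, 4, 0, 0],
    "D": [1, 5, 0, 0],
}

# Worst-case profit of each business, then the business whose worst case is best.
worst_case = {name: min(row) for name, row in profits.items()}
maxmin_choice = max(worst_case, key=worst_case.get)
print(worst_case, maxmin_choice)
```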

Next, assuming each of the four kinds of weather is equally likely (probability 1/4), the expected value of each business is calculated as follows (terms with zero profit are omitted): A: 2×1/4+2×1/4+1×1/4+1×1/4=6/4=1.5, B: 3×1/4+3×1/4+1×1/4=7/4=1.75, C: 2×1/4+4×1/4=6/4=1.5, D: 1×1/4+5×1/4=6/4=1.5. The expected value of B is the highest, and choosing on this basis is called the “expected value criterion”.
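The same arithmetic can be checked with exact fractions. This is a sketch under two assumptions: the `profits` table is reconstructed from the figures quoted in this article (the placement of B’s 1 and 0 between rainy and snowy is a guess that does not affect the expected value), and each kind of weather has probability 1/4.

```python
from fractions import Fraction

# Profit in units of 10,000 yen; columns are (sunny, cloudy, rainy, snowy).
# Assumed reconstruction from the numbers quoted in the article.
profits = {
    "A": [2, 2, 1, 1],
    "B": [3, 3, 1, 0],
    "C": [2, 4, 0, 0],
    "D": [1, 5, 0, 0],
}

p = Fraction(1, 4)  # each kind of weather is assumed equally likely

# Expected profit is the probability-weighted sum over the four states.
expected = {name: sum(p * x for x in row) for name, row in profits.items()}
expected_value_choice = max(expected, key=expected.get)
print(expected, expected_value_choice)
```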

If we look instead at the largest possible profit, it is 50,000 yen, earned by D on a cloudy day. Choosing on this basis is called the “max-max criterion” (maximizing (max) the maximum value).
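For comparison, the max-max choice only swaps the worst case for the best case. As before, the `profits` table is an assumed reconstruction from the figures quoted in this article, not a copy of the original table.

```python
# Profit in units of 10,000 yen; columns are (sunny, cloudy, rainy, snowy).
# Assumed reconstruction from the numbers quoted in the article.
profits = {
    "A": [2, 2, 1, 1],
    "B": [3, 3, 1, 0],
    "C": [2, 4, 0, 0],
    "D": [1, 5, 0, 0],
}

# Best-case profit of each business, then the business with the largest one.
best_case = {name: max(row) for name, row in profits.items()}
maxmax_choice = max(best_case, key=best_case.get)
print(best_case, maxmax_choice)
```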

The last method is the one that causes the least regret. Suppose you choose A. If the day turns out sunny, B would have earned 10,000 yen more (3-2=1, an opportunity loss of 10,000 yen). If it is cloudy, D would have earned 30,000 yen more (5-2=3, an opportunity loss of 30,000 yen). If it is rainy or snowy, no other business earns more than A’s 10,000 yen, so the opportunity loss is 0. The maximum opportunity loss for A is therefore 30,000 yen. Similarly, the maximum opportunity loss is 20,000 yen for B, 10,000 yen for C, and 20,000 yen for D, so choosing C causes the least regret. This is called the “minimax opportunity loss criterion” (Savage criterion).
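The opportunity-loss calculation can be sketched as follows: for each kind of weather, take the best profit any business achieves, subtract each business’s own profit to get its regret, and pick the business whose largest regret is smallest. The `profits` table is an assumed reconstruction from the figures quoted in this article. The placement of B’s 1 and 0 between rainy and snowy is a guess, and B’s maximum regret is the same either way.

```python
# Profit in units of 10,000 yen; columns are (sunny, cloudy, rainy, snowy).
# Assumed reconstruction from the numbers quoted in the article.
profits = {
    "A": [2, 2, 1, 1],
    "B": [3, 3, 1, 0],
    "C": [2, 4, 0, 0],
    "D": [1, 5, 0, 0],
}

# Best achievable profit in each state (column-wise maximum).
best_per_state = [max(column) for column in zip(*profits.values())]

# Regret (opportunity loss) = best profit in that state minus own profit.
regret = {
    name: [best - x for best, x in zip(best_per_state, row)]
    for name, row in profits.items()
}
max_regret = {name: max(r) for name, r in regret.items()}
minimax_regret_choice = min(max_regret, key=max_regret.get)
print(regret, max_regret, minimax_regret_choice)
```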

In the general population, 70% of people choose the “Max-Min Criterion” among these four, 30% choose the “Expected Value Criterion,” and almost no one chooses the other two. This tells us that most people are cautious when making decisions. Those who choose the “Max-Min” criterion are concerned about the worst case, while those who choose the “Expected Value” criterion are concerned with the average rather than the best case, in other words, with the overall picture.

The result also depends on the group making the decision. In a group of journalists, for example, more people choose the “max-max” criterion. Journalists are, in a sense, “optimistic” and have a strong gambling mindset, so it can be said that they focus on the “good news” rather than the “worst-case scenario” that most people worry about.

The “expected value criterion” can be roughly described as “a value determined by enumerating the possibilities, considering their probabilities, and calculating the average value using those probabilities.” Several kinds of probability can be used here: “mathematical probability,” where the possibilities can be strictly defined; “statistical probability,” which assigns probabilities based on what has happened in the past; and “subjective probability,” which is not supported by objective data or experiments and depends on people’s assumptions. These become less rigorous in that order, and subjective probability, the last of them, is merely an “image of possibility.”

Subjective probability has a long history, and it was justified in the mid-20th century by a statistician named Savage, who proved that “if people’s behavioral choices satisfy a certain set of rules, then their behavior is consistent with behavior determined by a mathematical probability.”

For example, suppose there are two lotteries, A and B. Both pay the same prize when the state is “rainy” or “snowy.” When the state is “sunny,” lottery A pays 100,000 yen and lottery B pays 50,000 yen, and when the state is “cloudy,” the prizes are reversed. Let’s say that you prefer lottery A.

Now keep the prizes for “sunny” and “cloudy” as they are, change the prizes for “rainy” and “snowy” to other amounts that are still the same for both lotteries, and call the new lotteries A’ and B’. The original preference must not reverse: you should prefer A’ over B’. This is the “sure-thing principle” given by Savage.
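A small sketch shows why a decision maker who ranks lotteries by expected value under a subjective probability automatically satisfies the sure-thing principle: the prize shared by both lotteries cancels out of the comparison. The subjective probabilities below are hypothetical, prizes are in units of 10,000 yen, and a single common prize is used for both “rainy” and “snowy” for simplicity.

```python
# Hypothetical subjective probabilities for the four states.
probs = {"sunny": 0.4, "cloudy": 0.2, "rainy": 0.3, "snowy": 0.1}


def make_lotteries(common_prize):
    """Build lotteries A and B that differ only on sunny and cloudy days."""
    a = {"sunny": 10, "cloudy": 5, "rainy": common_prize, "snowy": common_prize}
    b = {"sunny": 5, "cloudy": 10, "rainy": common_prize, "snowy": common_prize}
    return a, b


def expected_value(lottery):
    """Probability-weighted average prize of a lottery."""
    return sum(probs[state] * lottery[state] for state in probs)


def prefers_a(common_prize):
    """True if A has the higher expected value for the given common prize."""
    a, b = make_lotteries(common_prize)
    return expected_value(a) > expected_value(b)


# Changing the common prize never flips the preference between A and B.
results = [prefers_a(c) for c in (0, 3, 20)]
print(results)
```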

The probability theory that Savage established in this way is called “Bayesian theory,” and it has breathed new life into probability theory and statistics. There are two ways to use subjective probability. One is to assign a subjective probability to an actual event and act on it. The other is to work out what subjective probability has been assigned by a person whose interests are bound up with yours, and use that to decide your own actions.

Since these subjective probabilities are “subjective” in the first place, detailed numbers are not very meaningful. What matters is the ordering, that is, “more likely to happen than a certain event,” and equality, that is, “as likely to happen as a certain event.” For example, in horse racing, if you believe that “the chance of horse A finishing in the top three is greater than the chance of horse A not finishing in the top three,” you would assign a number greater than 0.5 to “the chance of horse A finishing in the top three.”

When using subjective probability in this way to predict an event for which no data is available, it is sufficient to first compare how likely the various states are, based on your own experience and logic, and then assign numbers that satisfy the resulting greater-than, less-than, and equality relationships.
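One way to picture this is a small checker that accepts any numeric assignment as long as it respects the stated comparisons. The function name `is_consistent` and the event labels are hypothetical, and the only requirements checked are the ones described above plus the usual rules that probabilities are non-negative and sum to 1.

```python
def is_consistent(probs, more_likely, equally_likely, tol=1e-9):
    """Check that probs respects the given ordinal judgments.

    more_likely: list of (a, b) pairs meaning "a is more likely than b".
    equally_likely: list of (a, b) pairs meaning "a is as likely as b".
    """
    if any(p < 0 for p in probs.values()):
        return False
    if abs(sum(probs.values()) - 1.0) > tol:
        return False
    if any(not probs[a] > probs[b] for a, b in more_likely):
        return False
    if any(abs(probs[a] - probs[b]) > tol for a, b in equally_likely):
        return False
    return True


# Horse racing example: "top three" is judged more likely than "not top three".
judgments = [("top3", "not_top3")]
ok = is_consistent({"top3": 0.6, "not_top3": 0.4}, judgments, [])
bad = is_consistent({"top3": 0.45, "not_top3": 0.55}, judgments, [])
print(ok, bad)
```

Any value above 0.5 for “top3” passes the check, which reflects the point that the exact number matters less than the ordering.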

There are also methods for making further inferences when these ordering and equality relationships are themselves uncertain. One is the “multiple prior” approach proposed by David Schmeidler and Itzhak Gilboa in the 1980s, a probability theory that assumes several possible probability assignments rather than a single definite ordering or equality relation. The other is a method based on “surprises” (unexpected events), proposed by Frank Knight in the 1920s to explain how unexpected events can move people.
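A rough sketch of the multiple prior idea, applied to the street stall example: instead of one probability assignment, several candidates are kept, each business is scored by its worst expected profit across them, and the business with the best worst-case score is chosen. The three candidate priors are hypothetical, the `profits` table is an assumed reconstruction from the figures quoted in this article, and profit itself is used as the utility.

```python
# Profit in units of 10,000 yen; columns are (sunny, cloudy, rainy, snowy).
# Assumed reconstruction from the numbers quoted in the article.
profits = {
    "A": [2, 2, 1, 1],
    "B": [3, 3, 1, 0],
    "C": [2, 4, 0, 0],
    "D": [1, 5, 0, 0],
}

# Hypothetical set of candidate probability assignments over the four states.
priors = [
    [0.25, 0.25, 0.25, 0.25],  # all weather equally likely
    [0.10, 0.10, 0.40, 0.40],  # pessimistic: bad weather likely
    [0.40, 0.40, 0.10, 0.10],  # optimistic: good weather likely
]


def expected_profit(row, prior):
    """Expected profit of one business under one prior."""
    return sum(p * x for p, x in zip(prior, row))


# Score each business by its lowest expected profit over all priors.
worst_expected = {
    name: min(expected_profit(row, prior) for prior in priors)
    for name, row in profits.items()
}
multiple_prior_choice = max(worst_expected, key=worst_expected.get)
print(worst_expected, multiple_prior_choice)
```

With these particular priors the cautious business A comes out ahead, while a single uniform prior would favor B. That difference is the kind of effect the approach is meant to capture.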

These methods have been incorporated into modern high-speed stock trading algorithms and are being put to practical use.

Viewed from the standpoint of machine learning, these strategies fit within the frameworks of reinforcement learning and online prediction. As I mentioned before, reinforcement learning builds on decision theory: it calculates value by combining the concept of “reward” with the “expected value criterion” (a value determined by enumerating the possibilities, considering their probabilities, and calculating the average value using those probabilities). When calculating these rewards, strategies based on game theory are also used (in a game between two players we cannot know for certain what the other player will do, so instead of trying to determine that, we ask how much profit a given action is guaranteed to yield, its guaranteed level), as is the “Savage criterion.”
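To illustrate how the expected value criterion appears in reinforcement learning, here is a minimal epsilon-greedy bandit sketch that learns the average reward of each business from repeated trials instead of computing it from a known table. Everything here is an assumption made for illustration: the `profits` table is reconstructed from the figures quoted in this article, the weather is drawn uniformly at random, and `run_bandit`, `epsilon`, and `steps` are hypothetical names and settings.

```python
import random

# Profit in units of 10,000 yen; columns are (sunny, cloudy, rainy, snowy).
# Assumed reconstruction from the numbers quoted in the article.
profits = {
    "A": [2, 2, 1, 1],
    "B": [3, 3, 1, 0],
    "C": [2, 4, 0, 0],
    "D": [1, 5, 0, 0],
}


def run_bandit(steps=5000, epsilon=0.1, seed=0):
    """Estimate each business's mean reward with epsilon-greedy exploration."""
    rng = random.Random(seed)
    estimates = {name: 0.0 for name in profits}
    counts = {name: 0 for name in profits}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(profits))            # explore
        else:
            action = max(estimates, key=estimates.get)    # exploit
        # Picking a uniform entry models equally likely weather.
        reward = rng.choice(profits[action])
        counts[action] += 1
        # Incremental sample-average update of the estimated expected reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit()
print(estimates, counts)
```

With enough trials the estimates tend toward the expected values computed earlier (1.5 for A, C, and D, and 1.75 for B), although any single run is subject to sampling noise.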
