Summary
Reinforcement learning is a field of machine learning in which an agent, the subject of learning, interacts with its environment and selects the best action from among those available to it, even when the environment is unknown or poses complex problems. The agent learns the optimal action based on the state it observes and the reward it receives. Typical methods include Q-learning, SARSA, DDPG, DQN, and PPO. These methods evaluate the value of the agent's actions and policies, and provide algorithms for selecting the optimal action.
Reinforcement learning has been applied in various fields such as robot control, game playing, natural language processing, and financial trading, and can realize advanced decision-making that exceeds human judgment. Here, we discuss reinforcement learning based on the Machine Learning Professional Series book “Reinforcement Learning with Python”.
This article presents my reading notes.
You can implement reinforcement learning! Aimed at engineers, this book provides detailed explanations from scratch with Python sample code. It introduces the weaknesses of reinforcement learning that become bottlenecks in practical applications, how to overcome them, and the areas where it is applied. Sample code is also available!
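As a concrete illustration of the value-based methods named above, here is a minimal tabular Q-learning sketch. The 5-state corridor task and all parameter values are my own illustrative assumptions, not the book's code:

```python
import random

# A minimal tabular Q-learning sketch on a hypothetical 5-state corridor:
# states 0..4, actions 0 (left) and 1 (right); reaching state 4 gives reward 1.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # step size, discount, exploration rate

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability EPSILON, else exploit.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[s][i])
            s2, r, done = step(s, a)
            # Q-learning update: move q[s][a] toward r + gamma * max_a' q[s'][a'].
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning()
# The learned greedy action in every non-goal state should be 1 ("right").
print([max(range(2), key=lambda i: q[s][i]) for s in range(GOAL)])
```

The update rule moves the estimated action value toward the observed reward plus the discounted value of the best next action; the epsilon-greedy choice balances exploring untried actions against exploiting the current estimates.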
Day1 Understanding the Position of Reinforcement Learning
Relationship between reinforcement learning and various keywords
Advantages and disadvantages of reinforcement learning
Problem setting in reinforcement learning: Markov Decision Process
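A Markov Decision Process can be written down concretely as states, actions, transition probabilities with rewards, and a discount factor. A minimal sketch, using a hypothetical two-state MDP of my own invention expressed as plain Python dicts:

```python
# A minimal Markov Decision Process as plain Python data: states, actions,
# a transition model P, and a discount factor gamma. The MDP is hypothetical.
GAMMA = 0.9
STATES = ["work", "rest"]
ACTIONS = ["push", "wait"]
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    "work": {
        "push": [(0.8, "work", 1.0), (0.2, "rest", 0.0)],
        "wait": [(1.0, "rest", 0.0)],
    },
    "rest": {
        "push": [(0.6, "work", 0.5), (0.4, "rest", 0.0)],
        "wait": [(1.0, "rest", 0.1)],
    },
}

def expected_reward(s, a):
    """One-step expected reward under the transition model."""
    return sum(p * r for p, _, r in P[s][a])

print(expected_reward("work", "push"))  # 0.8 * 1.0 + 0.2 * 0.0
```

The Markov property is encoded in the structure itself: the distribution over the next state and reward depends only on the current state and the chosen action.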
Day2 Reinforcement Learning Solution (1): Planning from Environment
Definition and Calculation of Value: Bellman Equation
Learning State Evaluation by Dynamic Programming: Value Iteration
Learning Strategies with Dynamic Programming: Policy Iteration
Difference between Model-Based and Model-Free
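The model-based Day 2 material, applying the Bellman optimality equation via dynamic programming, can be sketched as value iteration. The 5-state corridor MDP and parameters below are my own illustrative assumptions:

```python
# A minimal value-iteration sketch (model-based dynamic programming) on a
# hypothetical 5-state corridor: states 0..4, actions 0 (left) and 1 (right),
# reward 1 on reaching state 4. Bellman backup: V(s) <- max_a [r + gamma*V(s')].
N_STATES, GOAL, GAMMA = 5, 4, 0.9

def step_model(state, action):
    """Known transition model: returns (next_state, reward)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def value_iteration(theta=1e-8):
    v = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state keeps value 0
            # Bellman optimality backup over both actions.
            new_v = max(r + GAMMA * v[s2]
                        for s2, r in (step_model(s, a) for a in (0, 1)))
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < theta:
            return v

v = value_iteration()
print([round(x, 3) for x in v])  # values grow toward the goal state
```

Because the transition model is fully known, no interaction with the environment is needed; this is exactly the "planning from environment" setting, in contrast to the model-free methods of Day 3.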
Day3 Reinforcement Learning Solution (2): Planning from Experience
Balancing the Accumulation and Use of Experience: Epsilon-Greedy Method
Revising plans from experience or prediction: Monte Carlo vs Temporal Difference
How to use experience to update state evaluation or strategy
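The Day 3 ingredients, epsilon-greedy exploration and temporal-difference updates from experience, combine naturally in on-policy SARSA. A minimal sketch on a hypothetical 5-state corridor of my own invention (reaching state 4 yields reward 1):

```python
import random

# A minimal SARSA sketch: on-policy temporal-difference learning with
# epsilon-greedy exploration on a hypothetical 5-state corridor
# (actions 0 = left, 1 = right; reaching state 4 yields reward 1).
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def choose(q, s, rng):
    """Epsilon-greedy: explore with probability EPSILON, else exploit."""
    if rng.random() < EPSILON:
        return rng.randrange(2)
    return max(range(2), key=lambda a: q[s][a])

def sarsa(episodes=1000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        a = choose(q, s, rng)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = choose(q, s2, rng)
            # SARSA bootstraps on the action actually taken next (on-policy),
            # unlike Q-learning, which bootstraps on the greedy action.
            target = r if done else r + GAMMA * q[s2][a2]
            q[s][a] += ALPHA * (target - q[s][a])
            s, a = s2, a2
    return q

q = sarsa()
print([max(range(2), key=lambda a: q[s][a]) for s in range(GOAL)])
```

The TD update revises the estimate after every single step rather than waiting for the episode to end, which is the practical difference from Monte Carlo updates noted above.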
Day4 Applying Neural Networks to Reinforcement Learning
Applying Neural Networks to Reinforcement Learning
Implementing state evaluation with a parameterized function: Value Function Approximation
Applying deep learning to state evaluation: Deep Q-Network
Implementing a strategy with a parameterized function: Policy Gradient
Applying deep learning to strategy: Advantage Actor Critic (A2C)
State evaluation or strategy?
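The step from tables to parameterized functions in Day 4 can be sketched with semi-gradient TD(0) and a linear model, the simplest form of value function approximation. The corridor task, the features, and all parameters below are my own illustrative assumptions, not the book's code:

```python
# A minimal value-function-approximation sketch: semi-gradient TD(0) with a
# linear model v_hat(s) = w . phi(s), evaluated under a fixed policy on a
# hypothetical 5-state corridor (reward 1 on reaching state 4).
GOAL, GAMMA, ALPHA = 4, 0.9, 0.05

def phi(s):
    """Feature vector for state s: a bias term plus normalized position."""
    return (1.0, s / GOAL)

def v_hat(w, s):
    """Linear approximate value: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, phi(s)))

def td0_linear(episodes=2000):
    w = [0.0, 0.0]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            s2 = s + 1                        # fixed policy: always move right
            r = 1.0 if s2 == GOAL else 0.0
            target = r if s2 == GOAL else r + GAMMA * v_hat(w, s2)
            delta = target - v_hat(w, s)      # TD error
            # Semi-gradient update: gradient of v_hat w.r.t. w is phi(s).
            w = [wi + ALPHA * delta * xi for wi, xi in zip(w, phi(s))]
            s = s2
    return w

w = td0_linear()
print([round(v_hat(w, s), 3) for s in range(GOAL)])  # values rise toward goal
```

Replacing the hand-crafted linear features with a neural network, and the dot product with a forward pass, is conceptually the step from this sketch to a Deep Q-Network.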
Day5 Weaknesses of Reinforcement Learning
Sample inefficiency
Falling into locally optimal behavior, and frequent overfitting
Low reproducibility
Countermeasures based on weaknesses
Day6 Methods for overcoming weaknesses of reinforcement learning
Addressing sample inefficiency: use with model-based / representation learning
Dealing with low reproducibility: evolutionary strategies
Coping with locally optimal behavior and overfitting: imitation learning and inverse reinforcement learning, described in “Overview of Inverse Reinforcement Learning and Examples of Algorithms and Implementations”
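The evolutionary-strategies idea mentioned above, optimizing parameters by Gaussian perturbation instead of backpropagated gradients, can be sketched as follows. The objective function and all hyperparameters are hypothetical stand-ins; in reinforcement learning, the fitness would be an episode return:

```python
import random

# A minimal evolutionary-strategies sketch: perturb a parameter with Gaussian
# noise, evaluate each perturbation, and move the parameter in the direction
# of the reward-weighted noise. No gradient of the fitness itself is needed.
def fitness(theta):
    # Hypothetical objective standing in for an RL episode return;
    # it is maximized at theta = 3.0.
    return -(theta - 3.0) ** 2

def evolve(iterations=200, pop=50, sigma=0.1, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iterations):
        noises = [rng.gauss(0.0, 1.0) for _ in range(pop)]
        rewards = [fitness(theta + sigma * n) for n in noises]
        mean_r = sum(rewards) / pop
        # Gradient estimate: average of noise weighted by centered reward.
        grad = sum((r - mean_r) * n for r, n in zip(rewards, noises)) / (pop * sigma)
        theta += lr * grad
    return theta

theta = evolve()
print(round(theta, 2))  # should approach the optimum near 3.0
```

Because the update depends only on sampled fitness values, the method is simple to parallelize and less sensitive to the gradient-noise issues that hurt reproducibility in backpropagation-based reinforcement learning.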
Day7 Areas of application of reinforcement learning
Optimization of behavior
Learning optimization