Reinforcement Learning | Page 7 | Deus Ex Machina

Protected: Implementation of two approaches to improve environment awareness, a weak point of deep reinforcement learning

Implementation of two approaches to improve environment awareness, a weakness of deep reinforcement learning used in digital transformation, artificial intelligence, and machine learning tasks (inverse predictive, constrained, representation learning, imitation learning, reconstruction, predictive, World Models, transition function, reward function, weaknesses of representation learning, VAE, Vision Model, RNN, Memory RNN, Monte Carlo methods, TD Search, Monte Carlo Tree Search, model-based learning, Dyna, deep reinforcement learning)

2023.04.27

Algorithms, Graph Theory, Sparse Modeling, Multi-Agent Systems, Geometry, Reinforcement Learning, Calculus, Mathematical Logic, Optimization, Machine Learning, Deep Learning, Probability and Statistics

Protected: Overview of Weaknesses and Countermeasures in Deep Reinforcement Learning and Two Approaches to Improve Environment Recognition

An overview of the weaknesses and countermeasures of deep reinforcement learning utilized in digital transformation, artificial intelligence, and machine learning tasks, and two approaches to improving environment awareness (Mixture Density Network, RNN, Variational Autoencoder, World Models, representation learning, policy network compression, model-free learning, sample-based planning model, Dyna, simulation-based, sample-based, Gaussian process, neural network, transition function, reward function, simulator, learning capability, transition capability)

2023.04.13

Algorithms, Graph Theory, Sparse Modeling, Geometry, Reinforcement Learning, Calculus, Optimization, Machine Learning, Deep Learning, Probability and Statistics, Linear Algebra

Protected: Value Assessment and Policy and Weaknesses in Deep Reinforcement Learning

Value assessment and policy, and their weaknesses, in deep reinforcement learning used for digital transformation, artificial intelligence, and machine learning tasks (poor sample efficiency, difficulty in validating methods, impact of implementation practices on performance, library initial values, poor reproducibility, overfitting, local optima, dexterity, TRPO, PPO, continuous value control, image control, policy-based, value-based)

2023.03.30

Algorithms, Graph Theory, Sparse Modeling, Geometry, Reinforcement Learning, Calculus, Optimization, Machine Learning, Deep Learning, Probability and Statistics, Linear Algebra

Protected: Linear Bandit, Contextual Bandit, Linear Bandit Problem with LinUCB Policies

Linear bandit, contextual bandit, and the LinUCB policy for linear bandit problems utilized in digital transformation, artificial intelligence, and machine learning tasks (regret, algorithm, least squares estimator, LinUCB score, reward expectation, point estimate, knowledge, exploitation-oriented policies, exploration-oriented policies, Woodbury's formula, LinUCB policy, contextual bandit, website optimization, maximum sales expectation, bandit optimal budget allocation)

2023.03.24

Algorithms, Graph Theory, Sparse Modeling, Bandit Problems, Geometry, Reinforcement Learning, Calculus, Optimization, Machine Learning, Probability and Statistics, Linear Algebra

Protected: TRPO/PPO and DPG/DDPG, improvements of the Policy Gradient method of reinforcement learning

TRPO/PPO and DPG/DDPG, which are improvements of the Policy Gradient methods of reinforcement learning used for digital transformation, artificial intelligence, and machine learning tasks (Pendulum, Actor Critic, SequentialMemory, Adam, keras-rl, TD error, Deep Deterministic Policy Gradient, Deterministic Policy Gradient, Advantage Actor Critic, A2C, A3C, Proximal Policy Optimization, Trust Region Policy Optimization, Python)

2023.03.16

Algorithms, Graph Theory, Sparse Modeling, Geometry, Reinforcement Learning, Calculus, Optimization, Machine Learning, Deep Learning, Probability and Statistics, Linear Algebra

Protected: Applying Neural Networks to Reinforcement Learning: Applying Deep Learning to Policies with Advantage Actor Critic (A2C)

Application of neural networks to reinforcement learning for digital transformation, artificial intelligence, and machine learning tasks: implementation of Advantage Actor Critic (A2C), which applies deep learning to policies (Policy Gradient method, Q-learning, Gumbel-Max Trick, A3C (Asynchronous Advantage Actor Critic))

2023.03.02

Algorithms, Graph Theory, Sparse Modeling, Geometry, Reinforcement Learning, Calculus, Optimization, Machine Learning, Deep Learning, Probability and Statistics, Linear Algebra

Protected: Application of Neural Networks to Reinforcement Learning: Policy Gradient, which implements a policy as a parameterized function

Application of neural networks to reinforcement learning for digital transformation, artificial intelligence, and machine learning tasks: Policy Gradient, which implements policies as parameterized functions (discounted present value, policy update, TensorFlow, Keras, CartPole, ACER, Actor Critic with Experience Replay, Off-Policy Actor Critic, behavior policy, Deterministic Policy Gradient, DPG, DDPG, Experience Replay, Bellman Equation, policy gradient method, action history)

2023.02.16

Algorithms, Graph Theory, Sparse Modeling, Multi-Agent Systems, Geometry, Reinforcement Learning, Calculus, Optimization, Machine Learning, Probability and Statistics, Linear Algebra

Protected: Theoretical overview of the Exp3.P policy and lower bounds for the adversarial multi-armed bandit problem

Theoretical overview of the Exp3.P policy and lower bounds for adversarial multi-armed bandit problems utilized in digital transformation, artificial intelligence, and machine learning tasks (cumulative reward, Poly INF policy, algorithms, Abel-Ruffini theorem, pseudo-regret upper bounds for the Poly INF policy, closed-form expressions, continuously differentiable functions, Audibert, Bubeck, INF policy, pseudo-regret upper bounds for the INF policy, pseudo-regret lower bounds, randomized algorithms, policies of optimal order, high-probability regret upper bounds)

2023.02.10

Algorithms, Online Learning, Sparse Modeling, Bandit Problems, Geometry, Reinforcement Learning, Calculus, Optimization, Machine Learning, Probability and Statistics, Linear Algebra

Theory and algorithms of various reinforcement learning techniques and their implementation in Python

Theory and algorithms of various reinforcement learning techniques used for digital transformation, artificial intelligence, and machine learning tasks, and their implementation in Python (reinforcement learning, online learning, online prediction, deep learning, Python, algorithm, theory, implementation)

2023.02.05

Algorithms, Online Learning, Graph Theory, Sparse Modeling, Geometry, Reinforcement Learning, Calculus, Optimization, Machine Learning, Deep Learning, Probability and Statistics, Linear Algebra

Protected: Applying Neural Networks to Reinforcement Learning: Deep Q-Network, Applying Deep Learning to Value Assessment

Application of neural networks to reinforcement learning for digital transformation, artificial intelligence, and machine learning tasks: Deep Q-Network, which applies deep learning to value assessment (Prioritized Replay, Multi-step Learning, Distributional RL, Noisy Nets, Double DQN, Dueling Network, Rainbow, GPU, Epsilon-Greedy method, Optimizer, Reward Clipping, Fixed Target Q-Network, Experience Replay, Average Experience Replay, Mean Squared Error, TD Error, PyGame Learning Environment, PLE, OpenAI Gym, CNN)

2023.02.02

Python, Algorithms, Graph Theory, Sparse Modeling, Geometry, Reinforcement Learning, Calculus, Optimization, Machine Learning, Deep Learning, Probability and Statistics, Linear Algebra, Set Theory