Online Learning
Protected: Trade-off between exploration and exploitation: regret, stochastically optimal policies, and heuristics
Reinforcement learning with regret, stochastically optimal policies, and heuristics
2022.01.19
Online Learning, Reinforcement Learning, Calculus, Optimization, Machine Learning, Probability and Statistics