Reinforcement Learning
1 Reinforcement Learning
This section lists key reinforcement learning papers, from theoretical foundations to practical applications in control and decision-making.
- Reinforcement learning: A survey, Kaelbling L. et al. (1996).
1.1 Theory
1.1.1 Generative Model
QVI
On the Sample Complexity of Reinforcement Learning with a Generative Model, Azar M., Munos R., Kappen B. (2012).
- Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, Agarwal A. et al. (2019).
1.1.2 Policy Gradient
- Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton R. et al. (2000).
- Approximately Optimal Approximate Reinforcement Learning, Kakade S., Langford J. (2002).
- On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, Agarwal A. et al. (2019).
- PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning, Agarwal A. et al. (2020).
- Is the Policy Gradient a Gradient?, Nota C., Thomas P. S. (2020).
1.1.3 Linear Systems
- PAC Adaptive Control of Linear Systems, Fiechter C.-N. (1997).
OFU-LQ
Regret Bounds for the Adaptive Control of Linear Quadratic Systems, Abbasi-Yadkori Y., Szepesvari C. (2011).
TS-LQ
Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems, Abeille M., Lazaric A. (2018).
1.2 Value-based Methods
DQN
Playing Atari with Deep Reinforcement Learning, Mnih V. et al. (2013).
DDQN
Deep Reinforcement Learning with Double Q-learning, van Hasselt H., Silver D. et al. (2015).
Rainbow
Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel M. et al. (2017).
1.3 Policy-based Methods
1.3.1 Policy Gradient
REINFORCE
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams R. (1992).
TRPO
Trust Region Policy Optimization, Schulman J. et al. (2015).
PPO
Proximal Policy Optimization Algorithms, Schulman J. et al. (2017).
1.3.2 Actor-Critic
DDPG
Continuous Control With Deep Reinforcement Learning, Lillicrap T. et al. (2015).
A3C
Asynchronous Methods for Deep Reinforcement Learning, Mnih V. et al. (2016).
SAC
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja T. et al. (2018).
1.4 Model-based Methods
PILCO
PILCO: A Model-Based and Data-Efficient Approach to Policy Search, Deisenroth M., Rasmussen C. (2011).
MPPI
Information Theoretic MPC for Model-Based Reinforcement Learning, Williams G. et al. (2017).
MuZero
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser J. et al. (2019).
1.5 Exploration
HER
Hindsight Experience Replay, Andrychowicz M. et al. (2017).
RND
Exploration by Random Network Distillation, Burda Y. et al. (OpenAI) (2018).
Go-Explore
Go-Explore: a New Approach for Hard-Exploration Problems, Ecoffet A. et al. (Uber) (2018).
1.6 Multi-agent RL
MADDPG
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe R. et al. (2017).
FTW
Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, Jaderberg M. et al. (2018).
MAPPO
The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games, Yu C. et al. (2021).
1.7 Safe Reinforcement Learning
- A Comprehensive Survey on Safe Reinforcement Learning, García J., Fernández F. (2015).
CPO
Constrained Policy Optimization, Achiam J., Held D., Tamar A., Abbeel P. (2017).
- Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning, Brunke L. et al. (2021).
1.8 Transfer Learning and Meta-Learning
MAML
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn C., Abbeel P., Levine S. (2017).
- Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, Tan J. et al. (2018).
- Learning Dexterous In-Hand Manipulation, OpenAI (2018).
1.9 Hierarchical RL
OC
The Option-Critic Architecture, Bacon P-L., Harb J., Precup D. (2016).
FuNs
FeUdal Networks for Hierarchical Reinforcement Learning, Vezhnevets A. et al. (2017).
DeepLoco
DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning, Peng X. et al. (2017).
1.10 Offline RL
CQL
Conservative Q-Learning for Offline Reinforcement Learning, Kumar A. et al. (2020).
- Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen L., Lu K. et al. (2021).
Note
This section gives an overview of reinforcement learning approaches relevant to safe and optimal control. For the complete list of papers and more detailed subsections, refer to the original survey document.