Reinforcement Learning
1 Reinforcement Learning
This section lists key references in reinforcement learning, from theoretical foundations to practical applications in control and decision-making.
- Reinforcement learning: A survey, Kaelbling L. et al. (1996).
1.1 Theory
1.1.1 Generative Model
- [QVI] On the Sample Complexity of Reinforcement Learning with a Generative Model, Azar M., Munos R., Kappen B. (2012).
- Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, Agarwal A. et al. (2019).
1.1.2 Policy Gradient
- Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton R. et al. (2000).
- Approximately Optimal Approximate Reinforcement Learning, Kakade S., Langford J. (2002).
- On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, Agarwal A. et al. (2019).
- PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning, Agarwal A. et al. (2020).
- Is the Policy Gradient a Gradient?, Nota C., Thomas P. S. (2020).
1.1.3 Linear Systems
- PAC Adaptive Control of Linear Systems, Fiechter C.-N. (1997).
- [OFU-LQ] Regret Bounds for the Adaptive Control of Linear Quadratic Systems, Abbasi-Yadkori Y., Szepesvari C. (2011).
- [TS-LQ] Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems, Abeille M., Lazaric A. (2018).
1.2 Value-based Methods
- [DQN] Playing Atari with Deep Reinforcement Learning, Mnih V. et al. (2013).
- [DDQN] Deep Reinforcement Learning with Double Q-learning, van Hasselt H., Silver D. et al. (2015).
- [Rainbow] Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel M. et al. (2017).
1.3 Policy-based Methods
1.3.1 Policy Gradient
- [REINFORCE] Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams R. (1992).
- [TRPO] Trust Region Policy Optimization, Schulman J. et al. (2015).
- [PPO] Proximal Policy Optimization Algorithms, Schulman J. et al. (2017).
1.3.2 Actor-Critic
- [DDPG] Continuous Control With Deep Reinforcement Learning, Lillicrap T. et al. (2015).
- [A3C] Asynchronous Methods for Deep Reinforcement Learning, Mnih V. et al. (2016).
- [SAC] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja T. et al. (2018).
1.4 Model-based Methods
- [PILCO] PILCO: A Model-Based and Data-Efficient Approach to Policy Search, Deisenroth M., Rasmussen C. (2011).
- [MPPI] Information Theoretic MPC for Model-Based Reinforcement Learning, Williams G. et al. (2017). :octocat:
- [MuZero] Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser J. et al. (2019). :octocat:
1.5 Exploration
- [HER] Hindsight Experience Replay, Andrychowicz M. et al. (2017).
- [RND] Exploration by Random Network Distillation, Burda Y. et al. (OpenAI) (2018).
- [Go-Explore] Go-Explore: a New Approach for Hard-Exploration Problems, Ecoffet A. et al. (Uber) (2018).
1.6 Multi-agent RL
- [MADDPG] Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe R. et al. (2017). :octocat:
- [FTW] Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, Jaderberg M. et al. (2018).
- [MAPPO] The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games, Yu C. et al. (2021). :octocat:
1.7 Safe Reinforcement Learning
- A Comprehensive Survey on Safe Reinforcement Learning, García J., Fernández F. (2015).
- [CPO] Constrained Policy Optimization, Achiam J., Held D., Tamar A., Abbeel P. (2017). :octocat:
- Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning, Brunke L. et al. (2021). :octocat:
1.8 Transfer Learning and Meta-Learning
- [MAML] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn C., Abbeel P., Levine S. (2017).
- Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, Tan J. et al. (2018).
- Learning Dexterous In-Hand Manipulation, OpenAI (2018).
1.9 Hierarchical RL
- [OC] The Option-Critic Architecture, Bacon P-L., Harb J., Precup D. (2016).
- [FuNs] FeUdal Networks for Hierarchical Reinforcement Learning, Vezhnevets A. et al. (2017).
- [DeepLoco] DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning, Peng X. et al. (2017).
1.10 Offline RL
- [CQL] Conservative Q-Learning for Offline Reinforcement Learning, Kumar A. et al. (2020).
- Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen L., Lu K. et al. (2021). :octocat:
Note
This section gives an overview of reinforcement learning approaches relevant to safe and optimal control. For the complete list of papers and more detailed subsections, please refer to the original survey document.