Reinforcement Learning

1 Reinforcement Learning

This section lists key papers in reinforcement learning, from theoretical foundations to practical applications in control and decision-making.

Reinforcement learning: A survey, Kaelbling L. et al. (1996).

1.1 Theory

1.2 Value-based Methods

DQN Playing Atari with Deep Reinforcement Learning, Mnih V. et al. (2013). 🎞️
DDQN Deep Reinforcement Learning with Double Q-learning, van Hasselt H., Silver D. et al. (2015).
Rainbow Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel M. et al. (2017).

1.3 Policy-based Methods

1.3.1 Policy Gradient

REINFORCE Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams R. (1992).
TRPO Trust Region Policy Optimization, Schulman J. et al. (2015). 🎞️
PPO Proximal Policy Optimization Algorithms, Schulman J. et al. (2017). 🎞️

1.3.2 Actor-Critic

DDPG Continuous Control With Deep Reinforcement Learning, Lillicrap T. et al. (2015).
A3C Asynchronous Methods for Deep Reinforcement Learning, Mnih V. et al. (2016).
SAC Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja T. et al. (2018). 🎞️

1.4 Model-based Methods

PILCO PILCO: A Model-Based and Data-Efficient Approach to Policy Search, Deisenroth M., Rasmussen C. (2011).
MPPI Information Theoretic MPC for Model-Based Reinforcement Learning, Williams G. et al. (2017). :octocat: 🎞️
MuZero Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser J. et al. (2019). :octocat:

1.5 Exploration

HER Hindsight Experience Replay, Andrychowicz M. et al. (2017). 🎞️
RND Exploration by Random Network Distillation, Burda Y. et al. (OpenAI) (2018). 🎞️
Go-Explore Go-Explore: a New Approach for Hard-Exploration Problems, Ecoffet A. et al. (Uber) (2018). 🎞️

1.6 Multi-agent RL

MADDPG Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe R. et al. (2017). :octocat:
FTW Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, Jaderberg M. et al. (2018). 🎞️
MAPPO The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games, Yu C. et al. (2021). :octocat:

1.7 Safe Reinforcement Learning

A Comprehensive Survey on Safe Reinforcement Learning, García J., Fernández F. (2015).
CPO Constrained Policy Optimization, Achiam J., Held D., Tamar A., Abbeel P. (2017). :octocat:
Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning, Brunke L. et al. (2021). :octocat:

1.8 Transfer Learning and Meta-Learning

MAML Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn C., Abbeel P., Levine S. (2017). 🎞️
Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, Tan J. et al. (2018). 🎞️
Learning Dexterous In-Hand Manipulation, OpenAI (2018). 🎞️

1.9 Hierarchical RL

OC The Option-Critic Architecture, Bacon P-L., Harb J., Precup D. (2016).
FuNs FeUdal Networks for Hierarchical Reinforcement Learning, Vezhnevets A. et al. (2017).
DeepLoco DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning, Peng X. et al. (2017). 🎞️

1.10 Offline RL

CQL Conservative Q-Learning for Offline Reinforcement Learning, Kumar A. et al. (2020).
Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen L., Lu K. et al. (2021). :octocat:

Note

This section gives a selected overview of reinforcement learning approaches relevant to safe and optimal control. For the complete list of papers and the more detailed subsections, refer to the original survey document.
