Optimal Control

1 Optimal Control

This section covers fundamental approaches to optimal control, including dynamic programming, linear programming, tree-based planning, control theory, and model predictive control.

1.1 Dynamic Programming

(book) Dynamic Programming, Bellman R. (1957).
(book) Dynamic Programming and Optimal Control, Volumes 1 and 2, Bertsekas D. (1995).
(book) Markov Decision Processes - Discrete Stochastic Dynamic Programming, Puterman M. (1995).
An Upper Bound on the Loss from Approximate Optimal-Value Functions, Singh S., Yee R. (1994).
Stochastic optimization of sailing trajectories in an upwind regatta, Dalang R. et al. (2015).

1.2 Linear Programming

(book) Markov Decision Processes - Discrete Stochastic Dynamic Programming, Puterman M. (1995).
REPS Relative Entropy Policy Search, Peters J. et al. (2010).

1.3 Tree-Based Planning

ExpectiMinimax Optimal strategy in games with chance nodes, Melkó E., Nagy B. (2007).
Sparse sampling A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Kearns M. et al. (2002).
MCTS Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Coulom R. (2006).
UCT Bandit based Monte-Carlo Planning, Kocsis L., Szepesvári C. (2006).
Bandit Algorithms for Tree Search, Coquelin P-A., Munos R. (2007).
OPD Optimistic Planning for Deterministic Systems, Hren J., Munos R. (2008).
OLOP Open Loop Optimistic Planning, Bubeck S., Munos R. (2010).
SOOP Optimistic Planning for Continuous-Action Deterministic Systems, Buşoniu L. et al. (2011).
OPSS Optimistic planning for sparsely stochastic systems, Buşoniu L., Munos R., De Schutter B., Babuška R. (2011).
HOOT Sample-Based Planning for Continuous Action Markov Decision Processes, Mansley C., Weinstein A., Littman M. (2011).
HOLOP Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes, Weinstein A., Littman M. (2012).
BRUE Simple Regret Optimization in Online Planning for Markov Decision Processes, Feldman Z., Domshlak C. (2014).
LGP Logic-Geometric Programming: An Optimization-Based Approach to Combined Task and Motion Planning, Toussaint M. (2015).
AlphaGo Mastering the game of Go with deep neural networks and tree search, Silver D. et al. (2016).
AlphaGo Zero Mastering the game of Go without human knowledge, Silver D. et al. (2017).
AlphaZero Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver D. et al. (2017).
TrailBlazer Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Grill J. B., Valko M., Munos R. (2017).
MCTSnets Learning to search with MCTSnets, Guez A. et al. (2018).
ADI Solving the Rubik’s Cube Without Human Knowledge, McAleer S. et al. (2018).
OPC/SOPC Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values, Buşoniu L., Pall E., Munos R. (2018).
Real-time tree search with pessimistic scenarios: Winning the NeurIPS 2018 Pommerman Competition, Osogami T., Takahashi T. (2019).

1.4 Control Theory

(book) The Mathematical Theory of Optimal Processes, Pontryagin L. S., Boltyanskii V. G., Gamkrelidze R. V., Mishchenko E. F. (1962).
(book) Constrained Control and Estimation, Goodwin G. (2005).
PI² A Generalized Path Integral Control Approach to Reinforcement Learning, Theodorou E. et al. (2010).
PI²-CMA Path Integral Policy Improvement with Covariance Matrix Adaptation, Stulp F., Sigaud O. (2010).
iLQG A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, Todorov E. (2005).
iLQG+ Synthesis and stabilization of complex behaviors through online trajectory optimization, Tassa Y. (2012).

1.5 Model Predictive Control

(book) Model Predictive Control, Camacho E. (1995).
(book) Predictive Control With Constraints, Maciejowski J. M. (2002).
Linear Model Predictive Control for Lane Keeping and Obstacle Avoidance on Low Curvature Roads, Turri V. et al. (2013).
MPCC Optimization-based autonomous racing of 1:43 scale RC cars, Liniger A. et al. (2014).
MIQP Optimal trajectory planning for autonomous driving integrating logical constraints: An MIQP perspective, Qian X., Altché F., Bender P., Stiller C., de La Fortelle A. (2016).