Approximate Dynamic Programming

Subject tags: Computer Science · Programming Languages
Author
Warren B. Powell
Source
Journal: John Wiley & Sons, Inc. eBooks [Wiley]
Citations: 1328
Identifier
DOI: 10.1002/9781118029176
Abstract

Preface. Acknowledgments.
1. The challenges of dynamic programming. 1.1 A dynamic programming example: a shortest path problem. 1.2 The three curses of dimensionality. 1.3 Some real applications. 1.4 Problem classes. 1.5 The many dialects of dynamic programming. 1.6 What is new in this book? 1.7 Bibliographic notes.
2. Some illustrative models. 2.1 Deterministic problems. 2.2 Stochastic problems. 2.3 Information acquisition problems. 2.4 A simple modeling framework for dynamic programs. 2.5 Bibliographic notes. Problems.
3. Introduction to Markov decision processes. 3.1 The optimality equations. 3.2 Finite horizon problems. 3.3 Infinite horizon problems. 3.4 Value iteration. 3.5 Policy iteration. 3.6 Hybrid value-policy iteration. 3.7 The linear programming method for dynamic programs. 3.8 Monotone policies. 3.9 Why does it work? 3.10 Bibliographic notes. Problems.
4. Introduction to approximate dynamic programming. 4.1 The three curses of dimensionality (revisited). 4.2 The basic idea. 4.3 Sampling random variables. 4.4 ADP using the post-decision state variable. 4.5 Low-dimensional representations of value functions. 4.6 So just what is approximate dynamic programming? 4.7 Experimental issues. 4.8 Dynamic programming with missing or incomplete models. 4.9 Relationship to reinforcement learning. 4.10 But does it work? 4.11 Bibliographic notes. Problems.
5. Modeling dynamic programs. 5.1 Notational style. 5.2 Modeling time. 5.3 Modeling resources. 5.4 The states of our system. 5.5 Modeling decisions. 5.6 The exogenous information process. 5.7 The transition function. 5.8 The contribution function. 5.9 The objective function. 5.10 A measure-theoretic view of information. 5.11 Bibliographic notes. Problems.
6. Stochastic approximation methods. 6.1 A stochastic gradient algorithm. 6.2 Some stepsize recipes. 6.3 Stochastic stepsizes. 6.4 Computing bias and variance. 6.5 Optimal stepsizes. 6.6 Some experimental comparisons of stepsize formulas. 6.7 Convergence. 6.8 Why does it work? 6.9 Bibliographic notes. Problems.
7. Approximating value functions. 7.1 Approximation using aggregation. 7.2 Approximation methods using regression models. 7.3 Recursive methods for regression models. 7.4 Neural networks. 7.5 Batch processes. 7.6 Why does it work? 7.7 Bibliographic notes. Problems.
8. ADP for finite horizon problems. 8.1 Strategies for finite horizon problems. 8.2 Q-learning. 8.3 Temporal difference learning. 8.4 Policy iteration. 8.5 Monte Carlo value and policy iteration. 8.6 The actor-critic paradigm. 8.7 Bias in value function estimation. 8.8 State sampling strategies. 8.9 Starting and stopping. 8.10 A taxonomy of approximate dynamic programming strategies. 8.11 Why does it work? 8.12 Bibliographic notes. Problems.
9. Infinite horizon problems. 9.1 From finite to infinite horizon. 9.2 Algorithmic strategies. 9.3 Stepsizes for infinite horizon problems. 9.4 Error measures. 9.5 Direct ADP for online applications. 9.6 Finite horizon models for steady-state applications. 9.7 Why does it work? 9.8 Bibliographic notes. Problems.
10. Exploration vs. exploitation. 10.1 A learning exercise: the nomadic trucker. 10.2 Learning strategies. 10.3 A simple information acquisition problem. 10.4 Gittins indices and the information acquisition problem. 10.5 Variations. 10.6 The knowledge gradient algorithm. 10.7 Information acquisition in dynamic programming. 10.8 Bibliographic notes. Problems.
11. Value function approximations for special functions. 11.1 Value functions versus gradients. 11.2 Linear approximations. 11.3 Piecewise linear approximations. 11.4 The SHAPE algorithm. 11.5 Regression methods. 11.6 Cutting planes. 11.7 Why does it work? 11.8 Bibliographic notes. Problems.
12. Dynamic resource allocation. 12.1 An asset acquisition problem. 12.2 The blood management problem. 12.3 A portfolio optimization problem. 12.4 A general resource allocation problem. 12.5 A fleet management problem. 12.6 A driver management problem. 12.7 Bibliographic references. Problems.
13. Implementation challenges. 13.1 Will ADP work for your problem? 13.2 Designing an ADP algorithm for complex problems. 13.3 Debugging an ADP algorithm. 13.4 Convergence issues. 13.5 Modeling your problem. 13.6 Online vs. offline models. 13.7 If it works, patent it!
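To make two ideas from the table of contents concrete — exact value iteration (Section 3.4) and the basic ADP update, where the expectation is replaced by a sampled transition smoothed with a declining stepsize (Sections 4.2 and 6.2) — here is a minimal Python sketch. The three-state MDP, the function names, and the harmonic stepsize rule are illustrative assumptions, not material taken from the book.

```python
import random

# Hypothetical three-state, two-action MDP (toy data, not from the book).
# P[s][a] is a list of (probability, next_state, reward) outcomes.
GAMMA = 0.9
STATES = [0, 1, 2]
ACTIONS = [0, 1]
P = {
    0: {0: [(1.0, 1, 1.0)], 1: [(0.5, 1, 0.0), (0.5, 2, 2.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.5)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 0, 0.5)]},
}

def q_value(V, s, a):
    """One-step Bellman lookahead: E[r + gamma * V(s')] for action a in state s."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

def value_iteration(tol=1e-10):
    """Exact value iteration (Section 3.4): V(s) <- max_a E[r + gamma * V(s')]."""
    V = {s: 0.0 for s in STATES}
    while True:
        V_new = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        if max(abs(V_new[s] - V[s]) for s in STATES) < tol:
            return V_new
        V = V_new

def adp_forward_pass(n_iters=100_000, seed=1):
    """ADP flavor of the same computation (Sections 4.2, 6.2): step forward
    through sampled transitions and smooth the value estimates,
    Vbar(s) <- (1 - alpha_n) * Vbar(s) + alpha_n * vhat."""
    rng = random.Random(seed)
    Vbar = {s: 0.0 for s in STATES}
    visits = {s: 0 for s in STATES}
    s = 0
    for _ in range(n_iters):
        # Greedy decision against the current approximation.  A serious
        # implementation must also handle exploration (Chapter 10); this
        # toy problem is stochastic enough to keep visiting every state.
        a = max(ACTIONS, key=lambda act: q_value(Vbar, s, act))
        # Sample one transition outcome of (s, a).
        u, cum = rng.random(), 0.0
        for p, s2, r in P[s][a]:
            cum += p
            if u <= cum:
                break
        vhat = r + GAMMA * Vbar[s2]   # sampled observation of the value of s
        visits[s] += 1
        alpha = 1.0 / visits[s]       # harmonic stepsize, one Section 6.2 recipe
        Vbar[s] = (1 - alpha) * Vbar[s] + alpha * vhat
        s = s2
    return Vbar

if __name__ == "__main__":
    print("exact: ", value_iteration())
    print("approx:", adp_forward_pass())
```

Running the sketch shows the sampled, stepsize-smoothed estimates tracking the exact values closely. The contrast is the point of the book: value iteration loops over every state and takes an exact expectation, while the ADP pass only touches states it actually visits and only sees sampled outcomes, which is what makes the approach scale past the three curses of dimensionality.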