草莓干 posted on 2020-12-25 15:49:31

OpenAI Gym: A Beginner's Tutorial

This post was last edited by 草莓干 on 2020-12-25 15:49 <br /><br /><section style="width:100%;margin:0px auto;padding:0px;" data-width="100%"><section style="width: 100%;border:1px solid rgb(173,218,204);" data-width="100%"><section style="padding:3px;"><section data-bgless="spin" data-bglessp="120" data-bgopacity="10%" style="color: rgb(63,62,63);background:rgb(247,251,249);"><section class="135brush" style="font-size: 14px;text-align: justify;letter-spacing: 1px;line-height: 25px;padding:0.5em;"><p style="white-space: normal; line-height: 1.75em;">Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's? If this were then subjected to an appropriate course of education one would obtain the adult brain.</p><p style="white-space: normal; line-height: 1.75em; text-align: right;">&nbsp;— Alan Turing</p></section></section></section></section></section><h2 style="line-height: 1.75em;"></h2><p><br style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; overflow-wrap: break-word !important;"/></p><section style="margin-top: 10px;margin-bottom: 10px;text-align: center;font-size: 18px;box-sizing: border-box;"><section style="display: inline-block;border-radius: 6px;line-height: 1.6em;background-color: rgb(169, 218, 228);box-sizing: border-box;"><section style="border-radius: 6px 0px 0px 6px;display: inline-block;vertical-align: top;width: 1.6em;height: 1.6em;color: rgb(255, 255, 255);background-color: rgb(248, 205, 208);box-sizing: border-box;"><p style="box-sizing: border-box;">01</p></section><section style="padding-right: 10px;padding-left: 10px;display: inline-block;vertical-align: top;color: rgb(255, 255, 255);font-size: 15px;line-height: 2;letter-spacing: 2px;box-sizing: border-box;"><p style="box-sizing: border-box;"><strong>Introduction</strong></p></section></section></section><p style="line-height: 1.75em;"><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei;">Reinforcement learning&nbsp;(RL) is a subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn to achieve a goal in a complex, uncertain environment. Two features make reinforcement learning</span><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 
14px;">&nbsp;especially compelling:</span></p><p style="line-height: 1.75em;"><strong><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">1.&nbsp;RL is very general: it applies to any problem that involves making a sequence of decisions.</span></strong><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">&nbsp;Examples include training a robot to run and jump, setting product prices and managing inventory, and playing&nbsp;Atari games and board games.</span></p><p style="line-height: 1.75em;"><strong><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">2.&nbsp;RL already achieves strong results in many complex environments:</span></strong><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">&nbsp;AlphaGo, built with deep RL, is one example.</span></p><p style="line-height: 1.75em;"><br/></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Gym&nbsp;is a simulation toolkit for developing and comparing reinforcement learning algorithms. It:</span></p><ul class=" list-paddingleft-2" style="list-style-type: disc;"><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">makes no assumptions about the structure of your agent;</span></p></li><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">is compatible with common numerical computation libraries such as&nbsp;TensorFlow and Theano.</span></p></li></ul><p style="line-height: 1.75em;"><br/></p><p><strong><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">A minimal Gym example:&nbsp;CartPole-v0</span></strong></p><p style="text-align: center;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><br/></span></p><p><strong><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Result</span></strong></p><p style="text-align: center;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><br/></span></p><p style="line-height: 1.75em;"><span style="color: rgb(26, 26, 26); font-family: 微软雅黑; text-indent: 30px; background-color: rgb(255, 255, 255); font-size: 14px;">And with that, our first Hello World is officially up and running!</span></p><p><span 
style="color: rgb(26, 26, 26); font-family: 微软雅黑; font-size: 15px; text-indent: 30px; background-color: rgb(255, 255, 255);"><br/></span></p><p><span style="color: rgb(26, 26, 26); font-family: 微软雅黑; font-size: 15px; text-indent: 30px; background-color: rgb(255, 255, 255);"><br/></span></p><p><span style="color: rgb(26, 26, 26); font-family: 微软雅黑; font-size: 15px; text-indent: 30px; background-color: rgb(255, 255, 255);"></span></p><section style="margin-top: 10px;margin-bottom: 10px;text-align: center;font-size: 18px;box-sizing: border-box;"><section style="display: inline-block;border-radius: 6px;line-height: 1.6em;background-color: rgb(169, 218, 228);box-sizing: border-box;"><section style="border-radius: 6px 0px 0px 6px;display: inline-block;vertical-align: top;width: 1.6em;height: 1.6em;color: rgb(255, 255, 255);background-color: rgb(248, 205, 208);box-sizing: border-box;"><p style="box-sizing: border-box;">02</p></section><section style="padding-right: 10px;padding-left: 10px;display: inline-block;vertical-align: top;color: rgb(255, 255, 255);font-size: 15px;line-height: 2;letter-spacing: 2px;box-sizing: border-box;"><p style="box-sizing: border-box;"><strong style="box-sizing: border-box;">Observations</strong>&nbsp;</p></section></section></section><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">The first example used&nbsp;env.step()&nbsp;to advance the simulation one step at a time. In Gym,&nbsp;env.step()&nbsp;returns four values:</span><br/><span style="color: rgb(26, 26, 26); font-family: 微软雅黑; font-size: 15px; text-indent: 30px; background-color: rgb(255, 255, 255);"></span></p><ul class=" list-paddingleft-2" style="list-style-type: disc;"><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Observation (object): an environment-specific object describing what is observed after the step, for example pixel data from a camera, the angles of a robot's joints, or the current state of a board game;</span></p></li><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 
14px;">Reward (float): the amount of reward the agent received for the previous action. Its range varies from environment to environment, but the goal of reinforcement learning is always to maximize the total reward;</span></p></li><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Done (boolean): whether it is time to reset the environment with&nbsp;env.reset(). In most cases, when&nbsp;done&nbsp;is&nbsp;True, the current episode (or trial) has ended. For example, when the robot falls over or falls off the platform, the episode should be terminated and the environment reset;</span></p></li><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Info (dict): diagnostic information useful for debugging. Standard evaluations of an agent do not use info, so we will come back to it when it is actually needed.</span></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p></li></ul><p style="line-height: 1.75em;"><br/></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">In summary, this is the basic reinforcement learning loop: at each time step the agent takes an action, and the environment returns the observation and reward that resulted from that action. As a diagram:</span></p><p style="text-align: center;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span><br/></p><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><br/></span></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;">In a&nbsp;Gym simulation, every episode starts with a call to&nbsp;</span><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; background: rgb(246, 246, 246); 
box-sizing: border-box !important; overflow-wrap: break-word !important;">reset()</span><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;">, which returns the initial observation. The&nbsp;</span><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">done</span><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;flag then tells you when the episode is over and the next one should begin. In code:</span></span></p><p style="line-height: 1.75em; text-align: center;"><span style="font-size: 14px; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;"></span></p><p style="line-height: 1.75em;"><span style="font-size: 14px; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; font-family: 微软雅黑; color: 
rgb(26, 26, 26); letter-spacing: 0px; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;"><br/></span></p><p style="line-height: 1.75em;"><span style="background: rgb(255, 255, 255); margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0px; font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; box-sizing: border-box !important; overflow-wrap: break-word !important;">A screenshot of the simulation:</span></p><p style="line-height: 1.75em; text-align: center;"><span style="background: rgb(255, 255, 255); margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0px; font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; box-sizing: border-box !important; overflow-wrap: break-word !important;"></span></p><p style="line-height: 1.75em;"><span style="background: rgb(255, 255, 255); margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0px; font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; box-sizing: border-box !important; overflow-wrap: break-word !important;"><br/></span></p><p style="line-height: 1.75em;"><span style="background: rgb(255, 255, 255); margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0px; font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; box-sizing: border-box !important; overflow-wrap: break-word !important;"><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 34px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; font-size: 15px; background: rgb(255, 255, 255);">Before each&nbsp;</span><span style="margin: 0px; padding: 0px; max-width: 
100%; text-align: justify; text-indent: 34px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; font-size: 15px; background: rgb(246, 246, 246);">action</span><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 34px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; font-size: 15px; background: rgb(255, 255, 255);">, we print the most recent&nbsp;</span><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 34px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; font-size: 15px; background: rgb(246, 246, 246);">observation</span><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 34px; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0px; font-size: 15px; background: rgb(255, 255, 255);">, which produces a log like the following:</span></span></p><p style="text-align: center;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><br/></span></p><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><br/></span></p><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p><section style="margin-top: 10px;margin-bottom: 10px;text-align: center;font-size: 18px;box-sizing: border-box;"><section style="display: inline-block;border-radius: 6px;line-height: 1.6em;background-color: rgb(169, 218, 228);box-sizing: border-box;"><section style="border-radius: 6px 0px 0px 6px;display: inline-block;vertical-align: top;width: 1.6em;height: 1.6em;color: rgb(255, 255, 255);background-color: rgb(248, 205, 208);box-sizing: border-box;"><p style="box-sizing: border-box;">03</p></section><section style="padding-right: 10px;padding-left: 10px;display: inline-block;vertical-align: top;color: rgb(255, 255, 255);font-size: 15px;line-height: 2;letter-spacing: 2px;box-sizing: border-box;"><p style="box-sizing: border-box;"><strong 
style="box-sizing: border-box;">Spaces&nbsp;</strong></p></section></section></section><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">In the two examples so far, every action was sampled at random from the environment's action space. But what are these actions? Every Gym environment provides an action space,&nbsp;action_space, and an observation space,&nbsp;observation_space. Both are of type&nbsp;Space and describe the format and range of valid actions and observations. Here is a code example:</span></p><p style="text-align: center;"></p><p style="line-height: 1.75em;"><span style="color: rgb(26, 26, 26); font-family: 微软雅黑, Microsoft YaHei; font-size: 14px; letter-spacing: 0pt; background-color: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt;"><br/></span></p><p style="line-height: 1.75em;"><span style="color: rgb(26, 26, 26); font-family: 微软雅黑, Microsoft YaHei; font-size: 14px; letter-spacing: 0pt; background-color: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt;">The program's output shows that:</span></p><p style="line-height: 1.75em;"><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">·&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">action_space</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;is a&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); 
letter-spacing: 0pt; font-size: 11pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">Discrete</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;space. According to the&nbsp;</span><span style="background: rgb(255, 255, 255); color: rgb(51, 51, 51); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">discrete.py&nbsp;</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; font-family: 微软雅黑; box-sizing: border-box !important; overflow-wrap: break-word !important;">source, it represents the set&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">{0,1,...,n-1}</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;of&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">n</span><span 
style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;non-negative integers. In the&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">CartPole-v0</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;example, the action space is&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">{0,1}</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; font-family: 微软雅黑; box-sizing: border-box !important; overflow-wrap: break-word !important;">.</span></p><p style="line-height: 1.75em;"><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">·&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); 
letter-spacing: 0pt; font-size: 11pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">observation_space</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;is a&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">Box</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;space. According to the&nbsp;</span><span style="background: rgb(255, 255, 255); color: rgb(51, 51, 51); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">box.py&nbsp;</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px; margin: 0px; padding: 0px; max-width: 100%;">source, it represents an</span>&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; 
background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">n</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">-dimensional box, which is why the&nbsp;</span><span style="text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">observation</span><span style="background: rgb(255, 255, 255); text-align: justify; text-indent: 22.5pt; margin: 0px; padding: 0px; max-width: 100%; font-family: 微软雅黑; color: rgb(26, 26, 26); letter-spacing: 0pt; font-size: 11pt; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;printed in the previous section is an array of length 4 (here n = 4). Every element of the array has a lower and an upper bound.</span></p><p style="text-align: center;"></p><p><br/></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;">Knowing the definitions and ranges of the action and observation spaces lets you write more generic code. In many environments,&nbsp;</span><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word 
!important;">Box</span><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;and&nbsp;</span><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">Discrete</span><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;are the most common kinds of space, and every action the agent takes must lie within its action space. Code example:</span></span></p><p style="line-height: 1.75em; text-align: center;"><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;"></span></p><p style="line-height: 1.75em;"><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; 
text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;"><br/></span></p><p style="line-height: 1.75em;"><span style="margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; box-sizing: border-box !important; overflow-wrap: break-word !important;"><span style="font-size: 14px; text-align: justify; margin: 0px; padding: 0px; max-width: 100%; text-indent: 30px; background: rgb(255, 255, 255); color: rgb(26, 26, 26); letter-spacing: 0pt;">In the&nbsp;</span><span style="font-size: 14px; text-align: justify; margin: 0px; padding: 0px; max-width: 100%; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background: rgb(246, 246, 246);">CartPole-v0</span><span style="font-size: 14px; text-align: justify; margin: 0px; padding: 0px; max-width: 100%; text-indent: 30px; background: rgb(255, 255, 255); color: rgb(26, 26, 26); letter-spacing: 0pt;">&nbsp;example, the only possible actions are pushing the cart left or right, represented by&nbsp;</span><span style="font-size: 14px; text-align: justify; margin: 0px; padding: 0px; max-width: 100%; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background: rgb(246, 246, 246);">{0,1}</span><span style="font-size: 14px; text-align: justify; margin: 0px; padding: 0px; max-width: 100%; text-indent: 30px; background: rgb(255, 255, 255); color: rgb(26, 26, 26); letter-spacing: 0pt;">&nbsp;(0 = left, 1 = right).</span></span></p><p style="line-height: 1.75em;"><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;"><br/></span></p><p style="line-height: 1.75em;"><span style="font-size: 14px; font-family: 微软雅黑, 
Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;"><br/></span></p><p style="line-height: 1.75em;"><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;"><br/></span></p><p style="line-height: 1.75em;"><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;"></span></p><section style="margin-top: 10px;margin-bottom: 10px;text-align: center;font-size: 18px;box-sizing: border-box;"><section style="display: inline-block;border-radius: 6px;line-height: 1.6em;background-color: rgb(169, 218, 228);box-sizing: border-box;"><section style="border-radius: 6px 0px 0px 6px;display: inline-block;vertical-align: top;width: 1.6em;height: 1.6em;color: rgb(255, 255, 255);background-color: rgb(248, 205, 208);box-sizing: border-box;"><p style="box-sizing: border-box;">04</p></section><section style="padding-right: 
10px;padding-left: 10px;display: inline-block;vertical-align: top;color: rgb(255, 255, 255);font-size: 15px;line-height: 2;letter-spacing: 2px;box-sizing: border-box;"><p style="box-sizing: border-box;"><strong style="box-sizing: border-box;">Environments Available in Gym</strong> </p></section></section></section><p style="line-height: 1.75em;"><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;"></span></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Gym includes a wide range of classic environments, from simple to complex, including:</span></p><ul class=" list-paddingleft-2" style="list-style-type: disc;"><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Classic control and toy text: classic reinforcement learning examples, ideal for getting started;</span></p></li><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Algorithmic: tasks such as adding multi-digit numbers or reversing sequences, where the challenge is to learn the algorithm purely from examples. Difficulty can be scaled simply by varying the sequence length, so these tasks ramp up gently for newcomers;</span></p></li><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Atari: play Atari games with reinforcement learning. Gym integrates the Arcade Learning Environment, which has had a major influence on reinforcement learning research, and makes it easy to install;</span></p></li><li><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">2D and 3D robots: controlling robots in simulation within Gym, the part I have always been most interested in. These tasks require a third-party physics engine such as&nbsp;MuJoCo.</span></p></li></ul><p style="line-height: 1.75em; text-align: center;"></p><p style="line-height: 1.75em;"><br/></p><section style="margin-top: 10px;margin-bottom: 10px;text-align: center;font-size: 18px;box-sizing: border-box;"><section style="display: inline-block;border-radius: 
6px;line-height: 1.6em;background-color: rgb(169, 218, 228);box-sizing: border-box;"><section style="border-radius: 6px 0px 0px 6px;display: inline-block;vertical-align: top;width: 1.6em;height: 1.6em;color: rgb(255, 255, 255);background-color: rgb(248, 205, 208);box-sizing: border-box;"><p style="box-sizing: border-box;">05</p></section><section style="padding-right: 10px;padding-left: 10px;display: inline-block;vertical-align: top;color: rgb(255, 255, 255);font-size: 15px;line-height: 2;letter-spacing: 2px;box-sizing: border-box;"><p style="box-sizing: border-box;"><strong style="box-sizing: border-box;">The Registry</strong> </p></section></section></section><p style="line-height: 1.75em;"><span style="color: rgb(26, 26, 26); text-align: justify; text-indent: 30px; background-color: rgb(255, 255, 255); font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Gym is a large collection of reinforcement learning environments, all exposed to the user through a common interface. The code to list every available environment is as follows:</span><br/></p><p style="text-align: center;"></p><p><br/></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;">You can also add your own environments to the registry so that&nbsp;</span><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">gym.make()</span><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; 
color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;can create them. To do so, call&nbsp;</span><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background: rgb(246, 246, 246); box-sizing: border-box !important; overflow-wrap: break-word !important;">register</span><span style="font-family: 微软雅黑, Microsoft YaHei; margin: 0px; padding: 0px; max-width: 100%; text-align: justify; text-indent: 30px; color: rgb(26, 26, 26); letter-spacing: 0pt; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; box-sizing: border-box !important; overflow-wrap: break-word !important;">&nbsp;at load time, for example:</span></span></p><p style="text-align: center;"></p><p><br/></p><p><br/></p><section style="margin-top: 10px;margin-bottom: 10px;text-align: center;font-size: 18px;box-sizing: border-box;"><section style="display: inline-block;border-radius: 6px;line-height: 1.6em;background-color: rgb(169, 218, 228);box-sizing: border-box;"><section style="border-radius: 6px 0px 0px 6px;display: inline-block;vertical-align: top;width: 1.6em;height: 1.6em;color: rgb(255, 255, 255);background-color: rgb(248, 205, 208);box-sizing: border-box;"><p style="box-sizing: border-box;">06</p></section><section style="padding-right: 10px;padding-left: 10px;display: inline-block;vertical-align: top;color: rgb(255, 255, 255);font-size: 15px;line-height: 2;letter-spacing: 2px;box-sizing: border-box;"><p style="box-sizing: border-box;"><strong>Closing Remarks</strong>&nbsp;</p></section></section></section><p style="line-height: 
1.75em;"><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei;">Emmm... that wraps up my first set of notes on getting started with reinforcement learning. Most of it comes from the official documentation, plus a little of my own understanding. For documentation, I'd still recommend reading the official docs directly, since that is the original source.</span></p><p style="line-height: 1.75em;"><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei;"><br/></span></p><p style="line-height: 1.75em;"><strong><span style="font-size: 14px; font-family: 微软雅黑, Microsoft YaHei;">References:</span></strong><span style="font-family: 仿宋_GB2312; font-size: 14px;">&nbsp;</span></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">https://gym.openai.com/docs/</span></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">https://nndl.github.io/</span></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">https://github.com/openai/gym/blob/master/gym/spaces/discrete.py</span></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">https://github.com/openai/gym/blob/master/gym/spaces/box.py</span></p><p style="line-height: 1.75em;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">https://gym.openai.com/envs/#classic_control</span></p><p><br/></p><link rel="stylesheet" href="//bbs.lejurobot.com/source/plugin/wcn_editor/public/wcn_editor_fit.css?v134_kKx" id="wcn_editor_css"/>
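<p style="line-height: 1.75em;">The registry workflow from the Registry section (register an environment under an ID at startup, then build it by ID with gym.make()) can be illustrated without installing Gym. The following is a hypothetical, standard-library-only toy: ToyRegistry, ToySpec and CountdownEnv are illustrative names, not Gym's API. In Gym releases from around the time of this post, the official docs list environments with <code>from gym import envs; print(envs.registry.all())</code> and register custom ones with <code>gym.envs.registration.register(id=..., entry_point=...)</code>; check those calls against the version you have installed, since the registry API has changed across releases.</p>

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToySpec:
    """Hypothetical stand-in for an environment spec: an ID plus how to build it."""
    env_id: str
    entry_point: Callable[..., Any]
    kwargs: Dict[str, Any] = field(default_factory=dict)


class ToyRegistry:
    """Hypothetical minimal registry mapping environment IDs to specs (not Gym's class)."""

    def __init__(self) -> None:
        self._specs: Dict[str, ToySpec] = {}

    def register(self, env_id: str, entry_point: Callable[..., Any], **kwargs: Any) -> None:
        # Refuse duplicate IDs so one registration cannot silently shadow another.
        if env_id in self._specs:
            raise ValueError(f"Cannot re-register id: {env_id}")
        self._specs[env_id] = ToySpec(env_id, entry_point, kwargs)

    def make(self, env_id: str, **overrides: Any) -> Any:
        # Look up the spec and build a fresh environment instance from it.
        if env_id not in self._specs:
            raise KeyError(f"No registered env with id: {env_id}")
        spec = self._specs[env_id]
        return spec.entry_point(**{**spec.kwargs, **overrides})

    def all_ids(self) -> List[str]:
        # Toy analogue of listing every registered environment.
        return sorted(self._specs)


class CountdownEnv:
    """Hypothetical toy environment: each episode ends after `max_steps` steps."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int):
        # The action is ignored in this toy; returns (observation, reward, done, info),
        # the 4-tuple convention used by older Gym releases.
        self.t += 1
        done = self.t >= self.max_steps
        return self.t, 1.0, done, {}


registry = ToyRegistry()
# Two IDs can share one class with different constructor arguments.
registry.register("Countdown-v0", entry_point=CountdownEnv, max_steps=3)
registry.register("CountdownLong-v0", entry_point=CountdownEnv, max_steps=10)

print(registry.all_ids())  # ['Countdown-v0', 'CountdownLong-v0']

env = registry.make("Countdown-v0")
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, reward, done, info = env.step(0)
    total += reward
print(total)  # 3.0
```

<p style="line-height: 1.75em;">Keying the registry by ID means each ID names exactly one environment configuration, so user code only needs the string to get a ready-made environment, while the class and its constructor arguments stay in one place.</p>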