草莓干 posted on 2020-12-30 20:17:29

Pybullet Gym Source Code Walkthrough: The Biped Robot Model HumanoidPyBulletEnv-v0

<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">This article comes from Dr. Wang Song of Leju Robot: "Pybullet Gym Source Code Walkthrough: The Biped Robot Model HumanoidPyBulletEnv-v0"</span></p><p><br/></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">OpenAI Gym is currently the most widely used toolkit for reinforcement learning research, but its physics simulation environments use Mujoco, which is closed source and paid, and this has long drawn criticism. Pybullet-gym is an open-source implementation of the OpenAI Gym Mujoco environments that can replace Mujoco as the simulation environment for reinforcement learning. It wraps the Pybullet interface and plugs seamlessly into Gym.</span></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">For how to create a custom Gym environment, see the previous Geek Column article "OpenAI Gym Source Code Reading: Creating a Custom Reinforcement Learning Environment".</span></p><p><br/></p><section style="border-bottom: 1px solid #ddd;margin: 0 auto 10px;"><p class="135brush" data-brushtype="text" style="padding: 0px 5px 6px; border-bottom-width: 2px; border-bottom-style: solid; border-bottom-color: rgb(239, 112, 96); display: inline-block; margin: 0px 0px -1px; font-weight: normal; line-height: 1.1; font-size: 18px;"><strong><span style="font-family: 微软雅黑, Microsoft YaHei;">Example Code</span></strong></p></section><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Complete example code that uses the HumanoidPyBulletEnv-v0 model can be found under pybulletgym/examples/ </span><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">(https://github.com/benelot/pybullet-gym/tree/master/pybulletgym/examples).</span></p><p style="text-align: center;"></p><p><br/></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Importing the Pybullet-gym library in the example code is what registers the custom Pybullet Gym environments. </span><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Following the OpenAI Gym documentation, the test code below runs HumanoidPyBulletEnv-v0 with a random policy.</span></p><p style="text-align: center; line-height: normal; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 
14px;"></span></p><p style="text-align: center; line-height: normal; margin-bottom: 5px;"></p><p><br/></p><p><br/></p><p><br/></p><section style="border-bottom: 1px solid #ddd;margin: 0 auto 10px;"><p class="135brush" data-brushtype="text" style="padding: 0px 5px 6px; border-bottom-width: 2px; border-bottom-style: solid; border-bottom-color: rgb(239, 112, 96); display: inline-block; margin: 0px 0px -1px; font-weight: normal; line-height: 1.1; font-size: 18px;"><strong><span style="font-family: 微软雅黑, Microsoft YaHei;">Actions and Observations</span></strong></p></section><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">First, check the dimensions of the action space and the observation space of HumanoidPyBulletEnv-v0.</span></p><p style="text-align: center;"></p><p><br/></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">This shows that HumanoidPyBulletEnv-v0 has an action space of dimension 17 and an observation space of dimension 44. </span><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">The environment registration source shows that the entry-point class of HumanoidPyBulletEnv-v0 is HumanoidBulletEnv.</span></p><p style="text-align: center;"></p><p><br/></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">The parameters of HumanoidBulletEnv's __init__ show that the robot instance is built by Humanoid(). Following that trail leads to the detailed definition of the HumanoidBulletEnv action space dimension.</span></p><p style="text-align: center;"></p><p><br/></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">When step is executed, pybullet-gym controls the robot's motors with torques.</span></p><p style="text-align: center;"></p><p><br/></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Next, look at the definition of the observation space. The observation is returned by observation, reward, done, info = env.step(action), so the source file walker_base_env.py shows how the state is computed.</span></p><p 
style="line-height: 1.75em; margin-bottom: 15px; text-align: center;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p><p><br/></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">The state roughly covers the robot's height above the ground, the robot's Euler angles, the relative positions of the joints, and whether each foot is touching the ground. It has 44 dimensions in total. The detailed definition, and the reasons for defining it this way, are undocumented; see the discussion in the issue openai/gym/issues/585 (https://github.com/openai/gym/issues/585). It seems there is a reason OpenAI gets jokingly called CloseAI.</span></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">done is determined from the robot's height and pitch angle: it indicates whether the robot has fallen and therefore whether the episode is over.</span></p><p style="text-align: center;"></p><p><br/></p><p><br/></p><section style="border-bottom: 1px solid #ddd;margin: 0 auto 10px;"><p class="135brush" data-brushtype="text" style="padding: 0px 5px 6px; border-bottom-width: 2px; border-bottom-style: solid; border-bottom-color: rgb(239, 112, 96); display: inline-block; margin: 0px 0px -1px; font-weight: normal; line-height: 1.1; font-size: 18px;"><strong><span style="font-family: 微软雅黑, Microsoft YaHei;">Reward Function</span></strong></p></section><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">In a reinforcement learning problem, the design of the reward function matters most, because it directly determines whether the trained agent behaves as intended. The reward of HumanoidPyBulletEnv-v0 is made up of the following parts:</span><br/></p><p style="text-align: center;"></p><p><br/></p><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p><ul class=" list-paddingleft-2" style="padding: 0px 0px 0px 2.2em; max-width: 100%; color: rgb(51, 51, 51); font-family: -apple-system-font, BlinkMacSystemFont, Helvetica Neue, PingFang SC, Hiragino Sans GB, Microsoft YaHei UI, Microsoft YaHei, Arial, sans-serif; letter-spacing: 0.544px; text-align: justify; white-space: normal; background-color: rgb(255, 255, 255); box-sizing: border-box !important; overflow-wrap: break-word !important;"><li><p 
style="margin-top: 0px; margin-bottom: 0px; padding: 0px; max-width: 100%; clear: both; min-height: 1em; box-sizing: border-box !important; overflow-wrap: break-word !important;"><code style="margin: 0px 4px; padding: 3px 5px; max-width: 100%; box-sizing: border-box; overflow-wrap: break-word; font-family: Consolas, Monaco, courier, monospace; background: rgb(245, 245, 245); color: rgb(255, 64, 129); line-height: 1; font-size: 12px; border-width: 1px; border-style: solid; border-color: rgb(238, 238, 238); border-radius: 2px;">alive</code>:&nbsp;<span style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; overflow-wrap: break-word !important; font-size: 15px;">whether the robot is still standing or has fallen</span></p></li><li><p style="margin-top: 0px; margin-bottom: 0px; padding: 0px; max-width: 100%; clear: both; min-height: 1em; box-sizing: border-box !important; overflow-wrap: break-word !important;"><code style="margin: 0px 4px; padding: 3px 5px; max-width: 100%; box-sizing: border-box; overflow-wrap: break-word; font-family: Consolas, Monaco, courier, monospace; background: rgb(245, 245, 245); color: rgb(255, 64, 129); line-height: 1; font-size: 12px; border-width: 1px; border-style: solid; border-color: rgb(238, 238, 238); border-radius: 2px;">progress</code>:&nbsp;<span style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; overflow-wrap: break-word !important; font-size: 15px;">progress toward the target, taken as the difference in potential between consecutive steps (in effect the forward speed)</span></p></li><li><p style="margin-top: 0px; margin-bottom: 0px; padding: 0px; max-width: 100%; clear: both; min-height: 1em; box-sizing: border-box !important; overflow-wrap: break-word !important;"><code style="margin: 0px 4px; padding: 3px 5px; 
max-width: 100%; box-sizing: border-box; overflow-wrap: break-word; font-family: Consolas, Monaco, courier, monospace; background: rgb(245, 245, 245); color: rgb(255, 64, 129); line-height: 1; font-size: 12px; border-width: 1px; border-style: solid; border-color: rgb(238, 238, 238); border-radius: 2px;">electricity_cost</code>:&nbsp;<span style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; overflow-wrap: break-word !important; font-size: 15px;">the energy cost of control, computed from the torques and the motor speeds</span></p></li><li><p style="margin-top: 0px; margin-bottom: 0px; padding: 0px; max-width: 100%; clear: both; min-height: 1em; box-sizing: border-box !important; overflow-wrap: break-word !important;"><code style="margin: 0px 4px; padding: 3px 5px; max-width: 100%; box-sizing: border-box; overflow-wrap: break-word; font-family: Consolas, Monaco, courier, monospace; background: rgb(245, 245, 245); color: rgb(255, 64, 129); line-height: 1; font-size: 12px; border-width: 1px; border-style: solid; border-color: rgb(238, 238, 238); border-radius: 2px;">joints_at_limit_cost</code>:&nbsp;<span style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; overflow-wrap: break-word !important; font-size: 15px;">whether joints are stuck at their limits</span></p></li><li><p style="margin-top: 0px; margin-bottom: 0px; padding: 0px; max-width: 100%; clear: both; min-height: 1em; box-sizing: border-box !important; overflow-wrap: break-word !important;"><code style="margin: 0px 4px; padding: 3px 5px; max-width: 100%; box-sizing: border-box; overflow-wrap: break-word; font-family: Consolas, Monaco, courier, monospace; background: rgb(245, 245, 245); color: rgb(255, 64, 129); line-height: 1; 
font-size: 12px; border-width: 1px; border-style: solid; border-color: rgb(238, 238, 238); border-radius: 2px;">feet_collision_cost</code>:&nbsp;<span style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; overflow-wrap: break-word !important; font-size: 15px;">foot collision detection</span></p></li></ul><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span><br/></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">The Pybullet-Gym code logic is clear, but because the Humanoid environment was ported from roboschool, many details of the agent code have no documentation to refer to. The official stance is that users only need to care about which reinforcement learning algorithm to train the agent with, not about how the agent is implemented.</span><br/></p><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Training a custom Biped Robot Walk, however, requires digging into Gym's low-level implementation and studying the details of the state, the actions, and the reward function.</span></p><p><br/></p><section style="border-bottom: 1px solid #ddd;margin: 0 auto 10px;"><p class="135brush" data-brushtype="text" style="padding: 0px 5px 6px; border-bottom-width: 2px; border-bottom-style: solid; border-bottom-color: rgb(239, 112, 96); display: inline-block; margin: 0px 0px -1px; font-weight: normal; line-height: 1.1; font-size: 18px;"><strong><span style="font-family: 微软雅黑, Microsoft YaHei;">Addendum</span></strong></p></section><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Humanoid V1 Wiki page (https://github.com/openai/gym/wiki/Humanoid-V1#observation)</span><br/></p><p><br/></p><section style="border-bottom: 1px solid #ddd;margin: 0 auto 10px;"><p class="135brush" data-brushtype="text" style="padding: 0px 5px 6px; border-bottom-width: 2px; border-bottom-style: solid; border-bottom-color: rgb(239, 112, 96); display: inline-block; margin: 0px 0px -1px; font-weight: normal; line-height: 1.1; font-size: 18px;"><strong><span style="font-family: 微软雅黑, Microsoft YaHei;">Reference Links</span></strong></p></section><p style="margin-bottom: 5px; 
line-height: normal;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">https://github.com/openai/gym</span><br/></p><p style="margin-bottom: 5px; line-height: normal;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">https://github.com/benelot/pybullet-gym</span></p><p style="margin-bottom: 5px; line-height: normal;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">OpenAI Gym Source Code Reading: Creating a Custom Reinforcement Learning Environment</span></p><p style="margin-bottom: 5px; line-height: normal;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Pitch, roll, and yaw in Euler angles (https://blog.csdn.net/guyubit/article/details/52995676)</span></p><p><br/></p><p><br/></p><link rel="stylesheet" href="//bbs.lejurobot.com/source/plugin/wcn_editor/public/wcn_editor_fit.css?v134_kKx" id="wcn_editor_css"/>
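<p>As a rough companion to the observation and reward discussion above, here is a minimal, dependency-free Python sketch of how a 44-dimensional observation might break down and how the five reward terms could be summed into one step reward. The function names, the 8 + 2 x 17 + 2 split, the choice to end the episode when the alive term is negative, and every coefficient value are illustrative assumptions rather than code copied from pybullet-gym, so check them against the repository before relying on them.</p>

```python
"""Sketch of a HumanoidPyBulletEnv-v0 style observation size and step reward.

Everything here is a hypothetical illustration: names, layout, and
coefficients are assumptions, not values taken from pybullet-gym.
"""


def observation_size(num_joints=17, num_feet=2, num_body_values=8):
    # Assumed layout: a few body-level values (height, heading, velocity,
    # roll, pitch), then a relative position and a speed per joint,
    # then one ground-contact flag per foot.
    return num_body_values + 2 * num_joints + num_feet


def step_reward(alive, potential_new, potential_old, torques, joint_speeds,
                joints_at_limit, electricity_coef=-2.0, stall_coef=-0.1,
                limit_coef=-0.1, feet_collision_cost=0.0):
    """Return (total_reward, list_of_terms) for one simulated step."""
    n = len(torques)
    # progress: change in potential between consecutive steps.
    progress = potential_new - potential_old
    # electricity_cost: mean |torque * speed| plus a mean squared-torque term.
    electricity = electricity_coef * sum(
        abs(t * s) for t, s in zip(torques, joint_speeds)) / n
    electricity += stall_coef * sum(t * t for t in torques) / n
    # joints_at_limit_cost: penalty per joint sitting at its limit.
    limit = limit_coef * joints_at_limit
    terms = [alive, progress, electricity, limit, feet_collision_cost]
    return sum(terms), terms


def is_done(alive):
    # Assumption: the episode ends once the alive term turns negative.
    return alive < 0


if __name__ == "__main__":
    print("observation size:", observation_size())
    total, terms = step_reward(
        alive=2.0, potential_new=-9.0, potential_old=-10.0,
        torques=[1.0, -1.0], joint_speeds=[0.5, 0.5], joints_at_limit=1)
    print("reward terms:", terms)
    print("total reward:", total, "done:", is_done(terms[0]))
```

<p>Keeping the individual terms alongside the total makes it easier to see which part of the reward dominates when tuning a custom biped environment.</p>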