草莓干 posted on 2020-12-25 17:24:57

Reading the OpenAI Gym Source: Creating a Custom Reinforcement Learning Environment

Last edited by 草莓干 on 2020-12-25 17:24 <br /><br />
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Source: "Reading the OpenAI Gym Source: Creating a Custom Reinforcement Learning Environment" by Dr. Wang Song (王松) of Leju Robotics (乐聚机器人)</span></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><br/></span></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p>
<section class="layout" style="border:0;margin:2em auto 0; padding: 0.5em 0;white-space: normal;border: none;border-top: 1px solid #ccc;display: block; font-size: 1em; font-family: inherit; font-style: normal;font-weight: inherit; text-decoration: inherit; color: rgb(166, 166, 166);"><section style="margin-top: -1.2em;text-align: center;text-align: center; padding: 0; border: none; line-height: 1.4;"><span class="135brush" data-brushtype="text" style="background-color:#0F0F19; border-color:#B7B8B8; color:#FFFFFF; font-family:inherit; font-size:1em; font-style:normal; font-weight:inherit; padding:8px 23px; text-align:center; text-decoration:inherit"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Introduction to Gym</span></span> </section></section>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><br/></span></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Gym (https://gym.openai.com/) is a toolkit for developing reinforcement learning algorithms. It ships with a collection of built-in environments (https://gym.openai.com/docs/#environments) that a reinforcement learning algorithm can be trained to solve.</span><br/><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p>
<p style="line-height: 1.75em; margin-bottom: 15px; text-align: center;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p>
<p><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">For example, the CartPole-v0 environment is called as follows:</span></p>
<p style="text-align: center;"></p>
<p><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">A Gym simulation consists of the following steps:</span></p>
<ul class=" list-paddingleft-2" style="list-style-type: disc;">
<li><p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Create the environment: gym.make('CartPole-v0')</span></p></li>
<li><p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Initialize the environment: env.reset() puts the reinforcement learning environment into its initial state</span></p></li>
<li><p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Advance one step: env.step(action) takes an action and returns the environment's feedback</span></p></li>
<li><p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Render the current state: env.render()</span></p></li>
</ul>
<p><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Although Gym ships with many built-in environments, training on a reinforcement learning problem of your own requires creating a custom environment.</span></p>
<p><br/></p>
<section class="layout" style="border:0;margin:2em auto 0; padding: 0.5em 0;white-space: normal;border: none;border-top: 1px solid #ccc;display: block; font-size: 1em; font-family: inherit; font-style: normal;font-weight: inherit; text-decoration: inherit; color: rgb(166, 166, 166);"><section style="margin-top: -1.2em;text-align: center;text-align: center; padding: 0; border: none; line-height: 1.4;"><span class="135brush" data-brushtype="text" style="background-color:#0F0F19; border-color:#B7B8B8; color:#FFFFFF; font-family:inherit; font-size:1em; font-style:normal; font-weight:inherit; padding:8px 23px; text-align:center; text-decoration:inherit"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Source Code Walkthrough</span></span> </section></section>
<p style="line-height: 1.75em; margin-bottom: 15px;"><br/></p>
<p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Following the main calls from the previous section: CartPoleEnv (https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py) inherits from the base class gym.Env (https://github.com/openai/gym/blob/master/gym/core.py), which defines the main API methods:</span></p>
<ul class=" list-paddingleft-2" style="list-style-type: disc;">
<li><p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">step</span></p></li>
<li><p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">reset</span></p></li>
<li><p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">render</span></p></li>
<li><p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">close</span></p></li>
<li><p style="line-height: 1.75em; margin-bottom: 5px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">seed</span></p></li>
</ul>
<p><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">An environment class must be registered before it can be used. Gym registers its built-in environments in gym/envs/__init__.py (https://github.com/openai/gym/blob/master/gym/envs/__init__.py), giving each one an id, an entry point such as gym.envs.classic_control:CartPoleEnv, and other parameters.</span></p>
<p style="text-align: center;"></p>
<p><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">gym/envs/registration.py (https://github.com/openai/gym/blob/master/gym/envs/registration.py#L150) instantiates a single global registry = EnvRegistry()</span></p>
<p style="text-align: center;"></p>
<p><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">The same file, gym/envs/registration.py (https://github.com/openai/gym/blob/master/gym/envs/registration.py), instantiates the environment env from its entry_point</span></p>
<p style="text-align: center;"></p>
<p><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">To summarize: to make a custom environment available, register an id in your own package and point it at the path of your custom Env class</span></p>
<p style="text-align: center;"></p>
<p><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Calling gym.make('custom-env-name') then creates the custom environment</span></p>
<p><br/></p>
<p><br/></p>
<section class="layout" style="border:0;margin:2em auto 0; padding: 0.5em 0;white-space: normal;border: none;border-top: 1px solid #ccc;display: block; font-size: 1em; font-family: inherit; font-style: normal;font-weight: inherit; text-decoration: inherit; color: rgb(166, 166, 166);"><section style="margin-top: -1.2em;text-align: center;text-align: center; padding: 0; border: none; line-height: 1.4;"><span class="135brush" data-brushtype="text" style="background-color:#0F0F19; border-color:#B7B8B8; color:#FFFFFF; font-family:inherit; font-size:1em; font-style:normal; font-weight:inherit; padding:8px 23px; text-align:center; text-decoration:inherit"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Creating a Custom Environment</span></span> </section></section>
<p style="line-height: 1.75em; margin-bottom: 15px;"><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">As the registration flow above shows, adding a custom environment does not require modifying Gym's source code; creating a separate Python package is enough. The directory layout is as follows:</span></p>
<p style="line-height: 1.75em; margin-bottom: 15px; text-align: center;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p>
<p><br/></p>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">For easier debugging, install the custom package in editable mode with pip install -e . In the test code, importing the package is what registers the custom environment with Gym.</span></p>
<p style="text-align: center;"></p>
<p><br/></p>
<p><br/></p>
<section class="layout" style="border:0;margin:2em auto 0; padding: 0.5em 0;white-space: normal;border: none;border-top: 1px solid #ccc;display: block; font-size: 1em; font-family: inherit; font-style: normal;font-weight: inherit; text-decoration: inherit; color: rgb(166, 166, 166);"><section style="margin-top: -1.2em;text-align: center;text-align: center; padding: 0; border: none; line-height: 1.4;"><span class="135brush" data-brushtype="text" style="background-color:#0F0F19; border-color:#B7B8B8; color:#FFFFFF; font-family:inherit; font-size:1em; font-style:normal; font-weight:inherit; padding:8px 23px; text-align:center; text-decoration:inherit"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Reference Code for Custom Environment Modules</span></span> </section></section>
<ul class=" list-paddingleft-2" style="list-style-type: disc;">
<li><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">apoddar573/Tic-Tac-Toe-Gym_Environment (https://github.com/wangshub/Tic-Tac-Toe-Gym_Environment)</span></p></li>
<li><p><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">PyBullet Gymperium (https://github.com/benelot/pybullet-gym)</span><br/></p></li>
</ul>
<p><br/></p>
<p><br/></p>
<section class="layout" style="border:0;margin:2em auto 0; padding: 0.5em 0;white-space: normal;border: none;border-top: 1px solid #ccc;display: block; font-size: 1em; font-family: inherit; font-style: normal;font-weight: inherit; text-decoration: inherit; color: rgb(166, 166, 166);"><section style="margin-top: -1.2em;text-align: center;text-align: center; padding: 0; border: none; line-height: 1.4;"><span class="135brush" data-brushtype="text" style="background-color:#0F0F19; border-color:#B7B8B8; color:#FFFFFF; font-family:inherit; font-size:1em; font-style:normal; font-weight:inherit; padding:8px 23px; text-align:center; text-decoration:inherit"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"><span style="color: rgb(255, 255, 255); font-family: 微软雅黑, Microsoft YaHei; font-size: 14px; text-align: center; background-color: rgb(15, 15, 25);">References</span></span></span></section></section>
<p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;"></span></p>
<ul class=" list-paddingleft-2" style="list-style-type: disc;">
<li><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Tic-Tac-Toe-Gym_Environment (https://github.com/apoddar573/Tic-Tac-Toe-Gym_Environment)</span></p></li>
<li><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">Create custom gym environments from scratch - A stock market example (https://towardsdatascience.com/creating-a-custom-openai-gym-environment-for-stock-trading-be532be3910e)</span></p></li>
<li><p style="line-height: 1.75em; margin-bottom: 15px;"><span style="font-family: 微软雅黑, Microsoft YaHei; font-size: 14px;">pybullet-gym (https://github.com/benelot/pybullet-gym)</span></p></li>
</ul>
<p><br/></p>
<link rel="stylesheet" href="//bbs.lejurobot.com/source/plugin/wcn_editor/public/wcn_editor_fit.css?v134_kKx" id="wcn_editor_css"/>
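The registration flow from the source walkthrough (register an id with an entry point, keep the specs in one global registry, instantiate on make) can be pictured with a small stand-alone sketch. This is not Gym's code: ToyRegistry, CountdownEnv, and the Countdown-v0 id are hypothetical names made up for illustration, the keyword handling is simplified, and only the standard library is used so the sketch runs without Gym installed. The 4-tuple returned by step is an assumption that matches Gym releases from the time of this post.

```python
import importlib


def load(entry_point):
    """Resolve a 'package.module:ClassName' string to the object it names."""
    module_name, attr_name = entry_point.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


class ToyRegistry:
    """Simplified stand-in for a registry: maps an id to an entry point."""

    def __init__(self):
        self.specs = {}

    def register(self, id, entry_point, **kwargs):
        if id in self.specs:
            raise ValueError(f"Cannot re-register id: {id}")
        self.specs[id] = (entry_point, kwargs)

    def make(self, id):
        entry_point, kwargs = self.specs[id]
        # The entry point is either a callable or a 'module:Class' string.
        cls = entry_point if callable(entry_point) else load(entry_point)
        return cls(**kwargs)


class CountdownEnv:
    """Hypothetical environment exposing reset and step (assumed 4-tuple return)."""

    def __init__(self, start=3):
        self.start = start
        self.remaining = start

    def reset(self):
        self.remaining = self.start
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        done = self.remaining <= 0
        reward = 1.0 if done else 0.0
        return self.remaining, reward, done, {}


registry = ToyRegistry()  # a single module-level instance shared by all callers
registry.register(id="Countdown-v0", entry_point=CountdownEnv, start=2)

env = registry.make("Countdown-v0")
observation = env.reset()
done = False
steps = 0
while not done:
    observation, reward, done, info = env.step(action=None)
    steps += 1
print(steps, observation, reward)  # prints: 2 0 1.0
```

In a real package the equivalent call would be register from gym.envs.registration, placed in the package's __init__.py so that importing the package performs the registration, with the entry point written as a 'package.module:ClassName' string.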