
Releases: opendilab/DI-engine

v0.4.2

07 Sep 18:11

API Change

  1. config will be deep-copied by default in the compile_config function (see the sketch below)
  2. after calling compile_config, the current repo's git log and git diff information will be saved in the exp_name directory
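
A minimal sketch of the new behaviour, borrowing the cartpole DQN config from dizoo (exact import paths and signature may vary across versions):

    from ding.config import compile_config
    from dizoo.classic_control.cartpole.config.cartpole_dqn_config import \
        cartpole_dqn_config, cartpole_dqn_create_config

    # compile_config now deep-copies the input config by default, so the
    # original dict is left untouched; a git log/diff snapshot of the repo
    # is also written under the exp_name directory.
    cfg = compile_config(cartpole_dqn_config, create_cfg=cartpole_dqn_create_config, auto=True)
    assert cfg is not cartpole_dqn_config  # no more in-place modification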

Env

  1. add rocket env (#449)
  2. update pettingzoo env and improve related performance (#457)
  3. add mario env demo (#443)
  4. add MAPPO multi-agent config (#464)
  5. add mountain car (discrete action) environment (#452)
  6. fix multi-agent mujoco gym compatibility bug
  7. fix gfootball env save_replay variable init bug

Algorithm

  1. add IBC (Implicit Behaviour Cloning) algorithm (#401)
  2. add BCO (Behaviour Cloning from Observation) algorithm (#270)
  3. add continuous PPOPG algorithm (#414)
  4. add PER in CollaQ (#472)
  5. add activation option in QMIX and CollaQ

Enhancement

  1. update ctx to dataclass (#467)

Fix

  1. base_env FinalMeta bug with gym 0.25.0-0.25.1
  2. config in-place modification bug
  3. ding CLI no-argument problem
  4. import errors after running setup.py (jinja2, markupsafe)
  5. conda py3.6 and cross-platform build bug

Style

  1. add project state and datetime in log dir (#455)
  2. polish notes for q-learning model (#427)
  3. revision to mujoco dockerfile and validation (#474)
  4. add dockerfile for cityflow env
  5. polish default output log format

Contributors: @PaParaZz1 @ZHZisZZ @zjowowen @song2181 @zerlinwang @i-am-tc @hiha3456 @nighood @kxzxvbk @Weiyuhong-1998 @RobinC94

v0.4.1

14 Aug 09:14

API Change

  1. upgrade Python version from 3.6-3.8 to 3.7-3.9
  2. upgrade gym version from 0.20.0 to 0.25.0; many env ids need updating accordingly (e.g., Pendulum-v0 to Pendulum-v1) (#434) (see the example below)
  3. upgrade torch version from 1.10.0 to 1.12.0
  4. upgrade mujoco bin from 2.0.0 to 2.1.0
  5. add new task pipeline demo (DDPG/TD3/D4PG/C51/QRDQN/IQN/SQIL/TREX/PDQN) (#374, #380, #384, #407)
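
A quick illustration of the env id change, using plain gym rather than anything DI-engine-specific:

    import gym

    # With gym upgraded from 0.20.0 to 0.25.0, many env ids bump a version
    # suffix, e.g. Pendulum-v0 (old) becomes Pendulum-v1 (new).
    env = gym.make('Pendulum-v1')
    obs = env.reset()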

Env (dizoo)

  1. add gym anytrading env (#424)
  2. add board games env (tictactoe, gomoku, chess) (#356)
  3. add sokoban env (#397) (#429)
  4. add BC and DQN demo for gfootball (#418) (#423)
  5. add discrete pendulum env (#395)

Algorithm

  1. add STEVE model-based algorithm (#363)
  2. add PLR algorithm (#408)
  3. plugin ST-DIM into PPO (#379)

Enhancement

  1. add final result saving in training pipeline

Fix

  1. random policy randomness bug
  2. action_space seed compatibility bug
  3. discard message sent by self in redis mq (#354)
  4. remove pace controller (#400)
  5. import error in serial_pipeline_trex (#410)
  6. unittest hang and fail bug (#413)
  7. DREX collect data bug
  8. remove unused import cv2
  9. ding CLI env/policy option bug

Style

  1. add buffer api description (#371)
  2. polish VAE comments (#404)
  3. unittest for FQF (#412)
  4. add metaworld dockerfile (#432)
  5. remove opencv requirement in default setting
  6. update long description in setup.py

New Repo

  1. InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer
  2. awesome-decision-transformer: A curated list of Decision Transformer resources
  3. awesome-exploration-RL: A curated list of awesome exploration RL resources

Contributors: @PaParaZz1 @zjowowen @sailxjx @puyuan1996 @ZHZisZZ @lixl-st @Cloud-Pku @Weiyuhong-1998 @karroyan @kxzxvbk @song2181 @nighood @zhangpaipai @Hcnaeg

v0.4.0

21 Jun 12:37

API Change

  1. refactor DI-engine doc and update doc links (English doc | Chinese doc)
  2. refactor default logging lib and add the DI-toolkit (ditk) requirement (install via pip install DI-toolkit)

Env (dizoo)

  1. add all MAPPO/MASAC configs in SMAC (#310) (SOTA results on SMAC!)
  2. add dmc2gym env (#344) (#360)
  3. remove DI-star requirements of dizoo/smac, use official pysc2 (#302)
  4. add latest GAIL mujoco config (#298)
  5. polish procgen env (#311)
  6. add MBPO ant and humanoid config for mbpo (#314)
  7. fix slime volley env obs space bug in agent_vs_agent mode
  8. fix smac env obs space bug
  9. fix import path error in lunarlander (#362)

Algorithm

  1. add Decision Transformer algorithm (#327) (#364)
  2. add on-policy PPG algorithm (#312)
  3. add DDPPO & add model-based SAC with lambda-return algorithm (#332)
  4. add infoNCE loss and ST-DIM algorithm (#326)
  5. add FQF distributional RL algorithm (#274)
  6. add continuous BC algorithm (#318)
  7. add pure policy gradient PPO algorithm (#382)
  8. add SQIL + SAC algorithm (#348)
  9. polish NGU and related modules (#283) (#343) (#353)
  10. add marl distributional td loss (#331)

Enhancement

  1. add new worker middleware (#236) (new DRL programming model and pipeline example)
  2. refactor model-based RL pipeline (ding/world_model) (#332)
  3. refactor logging system in the whole DI-engine (#316)
  4. add env supervisor design (#330)
  5. support async reset for envpool env manager (#250)
  6. add log videos to tensorboard (#320)
  7. refactor impala cnn encoder interface (#378)

Fix

  1. env save replay bug
  2. transformer mask inplace operation bug
  3. transition_with_policy_data bug in SAC and PPG

Style

  1. add dockerfile for ding:hpc image (#337)
  2. bump mpire to 2.3.5, which handles default processes more elegantly (#306)
  3. use FORMAT_DIR instead of ./ding (#309)
  4. update quickstart colab link (#347)
  5. polish comments in ding/model/common (#315)
  6. update mujoco docker download path (#386)
  7. fix protobuf new version compatibility bug
  8. fix torch1.8.0 torch.div compatibility bug
  9. update doc links in readme
  10. add outline in readme and update wechat image
  11. update head image and refactor docker dir

Contributors: @PaParaZz1 @sailxjx @puyuan1996 @ZHZisZZ @Will-Nie @zjowowen @HansBug @zerlinwang @Weiyuhong-1998 @davide97l @hiha3456 @LuciusMos @kxzxvbk @lixl-st @zhangpaipai @song2181 @karroyan

v0.3.1

23 Apr 08:19

API Change

  1. Replace gym.wrappers.Monitor with gym.wrappers.RecordVideo to save video replays (see the sketch below)
  2. Replace policy/il.py with policy/bc.py and update the relevant serial_pipeline and unittest
  3. Polish all the configurations in dizoo according to our new config guideline
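
A minimal sketch of change 1 using the plain gym API (video_folder is an arbitrary example path):

    import gym

    # gym.wrappers.Monitor is deprecated in newer gym; RecordVideo is the
    # replacement used to save video replays.
    env = gym.wrappers.RecordVideo(gym.make('CartPole-v1'), video_folder='./videos')
    obs = env.reset()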

Env (dizoo)

  1. polish and standardize dizoo config (#252) (#255) (#249) (#246) (#262) (#261) (#266) (#273) (#263) (#280) (#259) (#286) (#277) (#290) (#289) (#299)
  2. add GRF academic env and config (#281)
  3. update env interface of GRF (#258)
  4. update D4RL offline RL env and config (#285)
  5. polish PomdpAtariEnv (#254)

Algorithm

  1. DREX Inverse RL algorithm (#218)

Feature

  1. separate mq and parallel modules, add redis (#247)
  2. rename env variables; fix attach_to parameter (#244)
  3. env implementation check (#275)
  4. adjust and set the max column number of tabulate in log (#296)
  5. speed up GTrXL forward method + GRU unittest (#253) (#292)
  6. add drop_extra option for sample collect

Fix

  1. add act_scale in DingEnvWrapper; fix envpool env manager (#245)
  2. auto_reset=False and env_ref bug in env manager (#248)
  3. data type and deepcopy bug in RND (#288)
  4. share_memory bug and multi_mujoco env (#279)
  5. some bugs in GTrXL (#276)
  6. update gym_vector_env_manager and add more unittest (#241)
  7. mdpolicy random collect bug (#293)
  8. gym.wrapper save video replay bug
  9. collect abnormal step format bug and add unittest

Test

  1. add buffer benchmark & socket test (#284)

Style

  1. upgrade mpire (#251)
  2. add GRF(google research football) docker (#256)
  3. update policy and gail comment

Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @davide97l @hiha3456 @zjowowen @Weiyuhong-1998 @LuciusMos @kxzxvbk @lixl-st @YinminZhang @song2181 @Hcnaeg @norman26625 @jayyoung0802 @RobinC94 @HansBug

v0.3.0

24 Mar 08:59

API Change

  1. add new BaseEnv definition
  2. modify the return value of the eval method in the InteractionSerialEvaluator class from Tuple[bool, float] to Tuple[bool, dict]
  3. switch the default logger to the rich logger; set the environment variable ENABLE_RICH_LOGGING=False to disable it (see the sketch after this list)
  4. add train_iter and env_step argument in ding CLI.
    • you can use them like ding -m serial -c pendulum_sac_config.py -s 0 --train-iter 1e3
  5. remove default n_sample/n_episode value in policy default config.
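
For change 3, a minimal sketch that disables the rich logger from Python instead of the shell (equivalent to export ENABLE_RICH_LOGGING=False; assumes the variable is read at import time):

    import os

    # Set this before any DI-engine import to fall back to the plain logger;
    # the rich logger is enabled by default.
    os.environ['ENABLE_RICH_LOGGING'] = 'False'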

Env (dizoo)

  1. add bitfilp HER DQN benchmark (#192) (#193) (#197)
  2. add slime volley league training demo (#229)

Algorithm

  1. Gated TransformXL (GTrXL) algorithm (#136)
  2. TD3 + VAE(HyAR) latent action algorithm (#152)
  3. stochastic dueling network (#234)
  4. use log prob instead of using prob in ACER (#186)

Feature

  1. support envpool env manager (#228)
  2. add league main and other improvements in new framework (#177) (#214)
  3. add pace controller middleware in new framework (#198)
  4. add auto recover option in new framework (#242)
  5. add k8s parser in new framework (#243)
  6. support async event handler and logger (#213)
  7. add grad norm calculator (#205)
  8. add gym vector env manager (#147)
  9. add train_iter and env_step in serial pipeline (#212)
  10. add rich logger handler (#219) (#223) (#232)
  11. add naive lr_scheduler demo

Refactor

  1. new BaseEnv and DingEnvWrapper (#171) (#231) (#240) (English env doc | Chinese env doc)

Polish

Improve configurations in dizoo and add more algorithm benchmarks (doc example: English | Chinese):

  1. MAPPO and MASAC smac config (#209) (#239)
  2. QMIX smac config (#175)
  3. R2D2 atari config (#181)
  4. A2C atari config (#189)
  5. GAIL box2d and mujoco config (#188)
  6. ACER atari config (#180)
  7. SQIL atari config (#230)
  8. TREX atari/mujoco config
  9. IMPALA atari config
  10. MBPO/D4PG mujoco config

Fix

  1. random_collect compatible to episode collector (#190)
  2. remove default n_sample/n_episode value in policy config (#185)
  3. PDQN model bug on gpu device (#220)
  4. TREX algorithm CLI bug (#182)
  5. DQfD JE computation bug and move to AdamW optimizer (#191)
  6. pytest problem for parallel middleware (#211)
  7. mujoco numpy compatibility bug
  8. markupsafe 2.1.0 bug
  9. framework parallel module network emit bug
  10. mpire bug and disable algotest in py3.8
  11. lunarlander env import and env_id bug
  12. icm unittest repeat name bug
  13. buffer thruput close bug

Test

  1. resnet unittest (#199)
  2. SAC/SQN unittest (#207)
  3. CQL/R2D3/GAIL unittest (#201)
  4. NGU td unittest (#210)
  5. model wrapper unittest (#215)
  6. MAQAC model unittest (#226)

Style

  1. add doc docker (#221) (LaTeX support)

Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @Weiyuhong-1998 @davide97l @zjowowen @LuciusMos @kxzxvbk @Hcnaeg @jayyoung0802 @simonat2011 @jiaruonan

v0.2.3

04 Jan 06:43

API Change

  1. move actor_head_type to action_space (relevant to DDPG/TD3/SAC)
  2. add multiple seeds in CLI: ding -m serial -c cartpole_dqn_config.py -s 0 -s 1 -s 2
  3. add new replay buffer that separates algorithm and storage (see the sketch below)
  4. add new main pipeline for the async/parallel framework tutorial
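
A minimal sketch of the storage-only buffer; the import path below is the one used in recent DI-engine releases and may differ in v0.2.3:

    from ding.data import DequeBuffer  # path in recent releases

    # The new buffer is pure storage: push raw data in, sample it back out;
    # algorithm-specific logic (priority, etc.) lives outside the buffer.
    buffer = DequeBuffer(size=100)
    for i in range(10):
        buffer.push({'obs': i})
    batch = buffer.sample(4)  # a list of BufferedData entries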

Env (dizoo)

  1. add multi-agent mujoco env (#146)
  2. add delay reward mujoco env (#145)
  3. fix port conflict in gym_soccer (#139)

Algorithm

  1. MASAC algorithm (#112)
  2. TREX IRL algorithm (#119) (#144)
  3. H-PPO hybrid action space algorithm (#140)
  4. residual link in R2D2 (#150)
  5. gumbel softmax (#169)
  6. move actor_head_type to action_space field

Feature

  1. new main pipeline and async/parallel framework (#142) (#166) (#168)
  2. refactor buffer, separate algorithm and storage (#129)
  3. CLI in new pipeline (ditask) (#160)
  4. add multiprocess tblogger, fix circular reference problem (#156)
  5. add multiple seed cli
  6. polish eps_greedy_multinomial_sample in model_wrapper (#154)

Fix

  1. R2D3 abs priority problem (#158) (#161)
  2. multi-discrete action space policies random action bug (#167)
  3. doc generate bug with enum_tools (#155)

Style

  1. more comments about R2D2 (#149)
  2. add doc about how to migrate a new env
  3. add doc about the env tutorial in dizoo
  4. add conda auto release (#148)
  5. update Chinese doc link
  6. update kaggle tutorial link

New Repo

  1. awesome-model-based-RL: A curated list of awesome Model-Based RL resources
  2. DI-smartcross: Decision AI in Traffic Light Control

Contributors: @PaParaZz1 @sailxjx @puyuan1996 @Will-Nie @Weiyuhong-1998 @LikeJulia @RobinC94 @LuciusMos @mingzhang96 @shgqmrf15 @zjowowen

v0.2.2

03 Dec 14:23

Env (dizoo)

  1. apple key to door treasure env (#128)
  2. bsuite memory benchmark (#138)
  3. polish atari impala config

Algorithm

  1. Guided Cost IRL algorithm (#57)
  2. ICM exploration algorithm (#41)
  3. MP-DQN hybrid action space algorithm (#131)
  4. add loss statistics and polish r2d3 pong config (#126)

Enhancement

  1. add renew env mechanism in env manager and update timeout mechanism (#127) (#134)

Fix

  1. async subprocess env manager reset bug (#137)
  2. keepdims name bug in model wrapper
  3. on-policy ppo value norm bug
  4. GAE and RND unittest bug
  5. hidden state wrapper h tensor compatibility
  6. naive buffer auto config create bug

Style

  1. add supporters list

New Repo Feature

  1. treevalue speed benchmark

Contributors: @PaParaZz1 @puyuan1996 @RobinC94 @LikeJulia @Will-Nie @Weiyuhong-1998 @timothijoe @davide97l @lichuminglcm @YinminZhang

v0.2.1

22 Nov 08:15

API Change

  1. remove torch in all envs (numpy arrays are the basic data format in envs; see the sketch after this list)
  2. remove on_policy field in all the config
  3. change eval_freq from 50 to 1000
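
A minimal sketch of the numpy-only env interface, illustrated with the current DingEnvWrapper API (which may postdate this release):

    import gym
    import numpy as np
    from ding.envs import DingEnvWrapper

    # Envs now consume and produce numpy arrays end to end; no torch tensors.
    env = DingEnvWrapper(gym.make('CartPole-v1'))
    obs = env.reset()
    assert isinstance(obs, np.ndarray)
    timestep = env.step(np.array(0))  # action as a numpy scalar, not a torch tensor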

Tutorial and Doc

  1. env tutorial (English / Chinese)

Env (dizoo)

  1. gym-hybrid env (#86)
  2. gym-soccer (HFO) env (#94)
  3. Go-Bigger env baseline (#95)
  4. SAC and PPO config for bipedalwalker env (#121)

Algorithm

  1. DQfD Imitation Learning algorithm (#48) (#98)
  2. TD3BC offline RL algorithm (#88)
  3. MBPO model-based RL algorithm (#113)
  4. PADDPG hybrid action space algorithm (#109)
  5. PDQN hybrid action space algorithm (#118)
  6. fix R2D2 bugs and produce benchmark, add naive NGU (#40)
  7. self-play training demo in slime_volley env (#23)
  8. add example of GAIL entry + config for mujoco (#114)

Enhancement

  1. enable an arbitrary number of policies in the serial sample collector
  2. add torch DataParallel for single machine multi-GPU
  3. add registry force_overwrite argument
  4. add naive buffer periodic thruput seconds argument

Fix

  1. target model wrapper hard reset bug
  2. fix learn state_dict target model bug
  3. ppo bugs and update atari ppo offpolicy config (#108)
  4. pyyaml version bug (#99)
  5. small fix on bsuite environment (#117)
  6. discrete cql unittest bug
  7. release workflow bug
  8. base policy model state_dict overlap bug
  9. remove on_policy option in dizoo config and entry
  10. remove torch in env

Test

  1. add pure docker setting test (#103)
  2. add unittest for dataset and evaluator (#107)
  3. add unittest for on-policy algorithm (#92)
  4. add unittest for ppo and td (MARL case) (#89)

Style

  1. gym version == 0.20.0
  2. torch version >= 1.1.0, <= 1.10.0
  3. ale-py == 0.7.0

Contributors: @PaParaZz1 @puyuan1996 @Will-Nie @YinminZhang @Weiyuhong-1998 @LikeJulia @sailxjx @davide97l @jayyoung0802 @lichuminglcm @yifan123 @RobinC94 @zjowowen

v0.2.0

30 Sep 15:00

API Change

  1. SampleCollector renamed to SampleSerialCollector (new import names sketched below)
  2. EpisodeCollector renamed to EpisodeSerialCollector
  3. BaseSerialEvaluator renamed to InteractionSerialEvaluator
  4. ZerglingCollector renamed to ZerglingParallelCollector
  5. OneVsOneCollector renamed to MarineParallelCollector
  6. AdvancedBuffer registry name changed from priority to advanced
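
How the renames look at import time (a sketch; whether the old names remain as aliases depends on the version):

    # New names exported from ding.worker after the renaming:
    from ding.worker import (
        SampleSerialCollector,       # formerly SampleCollector
        EpisodeSerialCollector,      # formerly EpisodeCollector
        InteractionSerialEvaluator,  # formerly BaseSerialEvaluator
    )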

Env (dizoo)

  1. overcooked env (#20)
  2. procgen env (#26)
  3. modified predator env (#30)
  4. d4rl env (#37)
  5. imagenet dataset (#27)
  6. bsuite env (#58)
  7. move atari_py to ale-py

Algorithm

  1. SQIL algorithm (#25) (#44)
  2. CQL algorithm (discrete/continuous) (#37) (#68)
  3. MAPPO algorithm (#62)
  4. WQMIX algorithm (#24)
  5. D4PG algorithm (#76)
  6. update multi-discrete policy (dqn, ppo, rainbow) (#51) (#72)

Enhancement

  1. image classification supervised training pipeline (#27)
  2. add force_reproducibility option in subprocess env manager
  3. add/delete/restart replicas via cli for k8s
  4. add league metric (trueskill and elo) (#22)
  5. add tb in naive buffer and modify tb in advanced buffer (#39)
  6. add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)
  7. add hyper-parameter scheduler module (#38)
  8. add plot function (#59)

Fix

  1. acer weight bug and update atari result (#21)
  2. mappo nan bug and dict obs cannot unsqueeze bug (#54)
  3. r2d2 hidden state and obs pre-processing bug (#36) (#52)
  4. ppo bug when use dual_clip and adv > 0
  5. qmix double_q hidden state bug
  6. spawn context problem in interaction unittest (#69)
  7. formatted config no eval bug (#53)
  8. catch statements that would never succeed, and a system proxy bug (#71) (#79)
  9. lunarlander config polish
  10. c51 head dimension mismatch bug
  11. mujoco config typo bug
  12. ppg atari config multi buffer bug
  13. max use and priority update special branch bug in advanced_buffer

Style

  1. add docker deploy in github workflow (#70) (#78) (#80)
  2. support PyTorch 1.9.0
  3. add algo/env list in README
  4. rename advanced_buffer register name to advanced

Contributors: @PaParaZz1 @YinminZhang @Will-Nie @puyuan1996 @Weiyuhong-1998 @HansBug @sailxjx @simonat2011 @konnase @RobinC94 @LikeJulia @LuciusMos @jayyoung0802 @yifan123 @davide97l @garyzhang99

v0.1.1

02 Aug 17:48

API Change

  1. Specify the exp_name field in config to control the output directory for logs and files (see the sketch below)
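
A minimal sketch of the exp_name field in a dizoo-style config (env/policy contents elided):

    from easydict import EasyDict

    # exp_name determines where logs, checkpoints and the compiled config
    # are written, e.g. ./cartpole_dqn_seed0/ here.
    cartpole_dqn_config = EasyDict(dict(
        exp_name='cartpole_dqn_seed0',
        env=dict(),     # env fields elided
        policy=dict(),  # policy fields elided
    ))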

Env (dizoo)

  1. selfplay/league demo (#12)
  2. pybullet env (#16)
  3. minigrid env (#13)
  4. atari enduro config (#11)

Algorithm

  1. on-policy PPO (#9)
  2. ACER algorithm (#14)

Enhancement

  1. polish experiment directory structure (#10)
  2. split doc to new repo (#4)

Fix

  1. atari env info action space bug
  2. env manager retry wrapper raise exception info bug
  3. dist entry disable-flask-log typo

Style

  1. codestyle optimization by lgtm (#7)
  2. code/comment statistics badge
  3. polish github CI workflow

Contributors: @PaParaZz1 @HansBug @YinminZhang @simonat2011