feature(davide): Implementation of D4PG #76

davide97l · 2021-09-27T07:46:08Z

Description

This PR implements the D4PG algorithm (https://arxiv.org/abs/1804.08617). It currently supports n-step returns, experience replay, and a categorical value distribution.
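For readers unfamiliar with the two ingredients named in the description, here is a minimal sketch of an n-step return and of the categorical projection used to build a distributional target. It is not the code in ding/policy/d4pg.py: the function names `n_step_return` and `project_categorical` are hypothetical, it uses plain Python lists instead of tensors, and it assumes the caller passes a discount that is already `gamma ** n` (set to 0 for terminal transitions).

```python
import math
from typing import List


def n_step_return(rewards: List[float], gamma: float) -> float:
    """Discounted sum of the first n rewards: sum_k gamma^k * r_k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def project_categorical(
    next_probs: List[float],
    reward: float,
    discount: float,
    v_min: float,
    v_max: float,
) -> List[float]:
    """Project the distribution of (reward + discount * z) onto a fixed support.

    The support has len(next_probs) evenly spaced atoms in [v_min, v_max].
    """
    n_atoms = len(next_probs)
    delta = (v_max - v_min) / (n_atoms - 1)
    target = [0.0] * n_atoms
    for j, p in enumerate(next_probs):
        z_j = v_min + j * delta
        # Shift and scale the atom, then clip it to the support range.
        tz = min(max(reward + discount * z_j, v_min), v_max)
        # Fractional index of the shifted atom, guarded against rounding error.
        b = min(max((tz - v_min) / delta, 0.0), float(n_atoms - 1))
        lower, upper = int(math.floor(b)), int(math.ceil(b))
        if lower == upper:
            # The shifted atom lands exactly on a support atom.
            target[lower] += p
        else:
            # Split the mass between the two neighbouring atoms.
            target[lower] += p * (upper - b)
            target[upper] += p * (b - lower)
    return target


# Three rewards of 1.0 with gamma = 0.5 give 1 + 0.5 + 0.25.
g = n_step_return([1.0, 1.0, 1.0], 0.5)  # 1.75

# Support {-1, 0, 1}; all mass on z = 1 is shrunk to 0.5 and split evenly.
target = project_categorical([0.0, 0.0, 1.0], reward=0.0, discount=0.5, v_min=-1.0, v_max=1.0)
# target == [0.0, 0.5, 0.5]
```

In a D4PG-style update the projected distribution would serve as the target for a cross-entropy loss against the critic's predicted distribution; that step is omitted here.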

codecov · 2021-09-27T08:03:01Z

Codecov Report

Merging #76 (bad2cac) into main (206186f) will increase coverage by 0.15%.
The diff coverage is 96.04%.

❗ Current head bad2cac differs from pull request most recent head 797c368. Consider uploading reports for the commit 797c368 to get more accurate results

@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
+ Coverage   88.91%   89.06%   +0.15%     
==========================================
  Files         357      359       +2     
  Lines       26579    26582       +3     
==========================================
+ Hits        23632    23676      +44     
+ Misses       2947     2906      -41

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `89.06% <96.04%> (+0.15%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| ding/entry/tests/test_serial_entry.py | `79.31% <77.77%> (+0.02%)` | ⬆️ |
| ding/model/template/tests/test_qac_dist.py | `95.00% <95.00%> (ø)` | |
| ding/policy/d4pg.py | `96.38% <96.38%> (ø)` | |
| ding/model/template/__init__.py | `100.00% <100.00%> (ø)` | |
| ding/model/template/qac_dist.py | `100.00% <100.00%> (ø)` | |
| ding/policy/__init__.py | `100.00% <100.00%> (ø)` | |
| ding/policy/command_mode_policy_instance.py | `96.72% <100.00%> (+0.02%)` | ⬆️ |
| ding/utils/data/dataset.py | `55.35% <0.00%> (-13.80%)` | ⬇️ |
| ding/rl_utils/td.py | `88.88% <0.00%> (-0.36%)` | ⬇️ |
| ding/entry/application_entry.py | `85.48% <0.00%> (-0.24%)` | ⬇️ |

... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 206186f...797c368.

davide97l · 2021-09-29T08:19:41Z

Training reward on Pendulum

PaParaZz1

This PR can be merged after solving these problems.

ding/model/template/qac_dist.py

PaParaZz1 · 2021-09-30T06:04:32Z

dizoo/classic_control/pendulum/entry/pendulum_d4pg_main.py

+    # Set up envs for collection and evaluation
+    collector_env_num, evaluator_env_num = cfg.env.collector_env_num, cfg.env.evaluator_env_num
+    collector_env = BaseEnvManager(
+        env_fn=[lambda: PendulumEnv(cfg.env) for _ in range(collector_env_num)], cfg=cfg.env.manager


Do you know the difference between a lambda expression and a partial function when constructing multiple env functions, especially in cases where each env has its own unique cfg?
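The difference the reviewer refers to is Python's late binding of closures. The lambda in the snippet above ignores the loop variable, so it is unaffected, but the issue appears as soon as each env function needs its own cfg. A minimal sketch follows; `DummyEnv` and the `seed` key are hypothetical stand-ins, not part of DI-engine.

```python
from functools import partial


class DummyEnv:
    """Hypothetical stand-in for an environment class such as PendulumEnv."""

    def __init__(self, cfg: dict) -> None:
        self.cfg = cfg


# One distinct config per environment.
cfgs = [{"seed": i} for i in range(3)]

# Lambda: `c` is looked up when the function is CALLED, so every
# function sees the last value the loop variable took.
lambda_fns = [lambda: DummyEnv(c) for c in cfgs]

# partial: the argument is bound when the partial object is CREATED.
partial_fns = [partial(DummyEnv, c) for c in cfgs]

# A default argument also captures the current value at definition time.
default_fns = [lambda c=c: DummyEnv(c) for c in cfgs]

lambda_seeds = [fn().cfg["seed"] for fn in lambda_fns]    # [2, 2, 2]
partial_seeds = [fn().cfg["seed"] for fn in partial_fns]  # [0, 1, 2]
default_seeds = [fn().cfg["seed"] for fn in default_fns]  # [0, 1, 2]
```

With a shared cfg, as in this entry file, both forms behave the same; `partial` (or a default argument) is the safer habit once per-env configs are introduced.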

dizoo/classic_control/pendulum/entry/pendulum_d4pg_main.py

ding/policy/d4pg.py

davide97l and others added 7 commits September 18, 2021 10:53

added experience replay and n-step

05a56a5

implementing distributional q value

c969cb4

added distributional q-value

e464793

added overview in qac_dist and d4pg

d3d8880

Merge branch 'opendilab:main' into dev-d4pg

b8d18cb

derived D4PG from DDPG

c9792f1

Merge branch 'dev-d4pg' of https://github.com/davide97l/DI-engine into dev-d4pg

28d21b6

PaParaZz1 added the algo (Add new algorithm or improve old one) and serial (Serial training related) labels Sep 27, 2021

PaParaZz1 changed the title ~~Implementation of D4PG~~ WIP: feature(davide): Implementation of D4PG Sep 27, 2021

davide97l added 4 commits September 29, 2021 09:53

fixed a bug when action shape >1

19a4873

benchmark D4PG mujoco + minor fixs

0e4c277

- entry for DDPG mujoco
- entry for D4PG mujoco
- config for D4PG mujoco
- fixed style D4PG code
- unittests for QAC distributional

Merge branch 'opendilab:main' into dev-d4pg

e675045

formatted code

59149b8

Merge branch 'opendilab:main' into dev-d4pg

179ba47

PaParaZz1 approved these changes Sep 30, 2021

PaParaZz1 changed the title ~~WIP: feature(davide): Implementation of D4PG~~ feature(davide): Implementation of D4PG Sep 30, 2021

davide97l added 3 commits September 30, 2021 16:51

minor updates (read description)

a2e62f2

- added d4pg seria_entry test
- updated comments in QACDIST
- added d4pg in commander register
- added q_value in d4pg return dict
- added priority update in d4pg entry
- added assertion in QACDIST

Merge branch 'opendilab:main' into dev-d4pg

9c34bf9

Merge branch 'dev-d4pg' of https://github.com/davide97l/DI-engine into dev-d4pg

bad2cac

PaParaZz1 force-pushed the main branch 2 times, most recently from 7b1baa2 to 3f33b9d Compare September 30, 2021 09:40

Merge branch 'main' into dev-d4pg

797c368

PaParaZz1 merged commit 16a89c3 into opendilab:main Sep 30, 2021
