feature(davide): Implementation of D4PG #76

Merged 16 commits on Sep 30, 2021

davide97l

Description

This PR implements the D4PG algorithm (https://arxiv.org/abs/1804.08617). It currently supports n-step returns, experience replay, and a categorical value distribution.
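
As a rough illustration of two of the pieces listed above, the sketch below computes an n-step discounted return and the scalar Q-value implied by a categorical distribution over a fixed support. The function names (`n_step_return`, `make_support`, `expected_q`) and the parameters `v_min`, `v_max`, `n_atom` are assumptions made for this example; they are not taken from `ding/policy/d4pg.py` or `ding/model/template/qac_dist.py`, whose actual implementation may differ.

```python
# Hypothetical helpers for illustration only; not the DI-engine API.

def n_step_return(rewards, gamma):
    """Discounted sum of the first n rewards: sum_k gamma^k * r_k."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


def make_support(v_min, v_max, n_atom):
    """Evenly spaced atoms z_i on the interval [v_min, v_max]."""
    delta = (v_max - v_min) / (n_atom - 1)
    return [v_min + i * delta for i in range(n_atom)]


def expected_q(probs, support):
    """Scalar Q-value as the mean of the categorical distribution."""
    return sum(p * z for p, z in zip(probs, support))


support = make_support(-10.0, 10.0, 5)
q = expected_q([0.0, 0.0, 0.5, 0.5, 0.0], support)
g = n_step_return([1.0, 1.0, 1.0], 0.5)
```

In a distributional critic of this kind, the network outputs the probabilities over the atoms, and the mean is only taken when a scalar Q-value is needed, for example for logging or for the actor's objective.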


codecov bot commented Sep 27, 2021

Codecov Report

Merging #76 (bad2cac) into main (206186f) will increase coverage by 0.15%.
The diff coverage is 96.04%.

❗ Current head bad2cac differs from pull request most recent head 797c368. Consider uploading reports for the commit 797c368 to get more accurate results

@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
+ Coverage   88.91%   89.06%   +0.15%     
==========================================
  Files         357      359       +2     
  Lines       26579    26582       +3     
==========================================
+ Hits        23632    23676      +44     
+ Misses       2947     2906      -41     

| Flag | Coverage Δ |
|---|---|
| unittests | 89.06% <96.04%> (+0.15%) ⬆️ |


| Impacted Files | Coverage Δ |
|---|---|
| ding/entry/tests/test_serial_entry.py | 79.31% <77.77%> (+0.02%) ⬆️ |
| ding/model/template/tests/test_qac_dist.py | 95.00% <95.00%> (ø) |
| ding/policy/d4pg.py | 96.38% <96.38%> (ø) |
| ding/model/template/__init__.py | 100.00% <100.00%> (ø) |
| ding/model/template/qac_dist.py | 100.00% <100.00%> (ø) |
| ding/policy/__init__.py | 100.00% <100.00%> (ø) |
| ding/policy/command_mode_policy_instance.py | 96.72% <100.00%> (+0.02%) ⬆️ |
| ding/utils/data/dataset.py | 55.35% <0.00%> (-13.80%) ⬇️ |
| ding/rl_utils/td.py | 88.88% <0.00%> (-0.36%) ⬇️ |
| ding/entry/application_entry.py | 85.48% <0.00%> (-0.24%) ⬇️ |

... and 6 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 206186f...797c368.

PaParaZz1 added the labels "algo" (Add new algorithm or improve old one) and "serial" (Serial training related) on Sep 27, 2021
PaParaZz1 changed the title from "Implementation of D4PG" to "WIP: feature(davide): Implementation of D4PG" on Sep 27, 2021
-entry for DDPG mujoco
-entry for D4PG mujoco
-config for D4PG mujoco
-fixed style D4PG code
-unittests for QAC distributional
@davide97l

[Image: training reward on Pendulum]


@PaParaZz1 PaParaZz1 left a comment

This PR can be merged after solving these problems.

# Set up envs for collection and evaluation
collector_env_num, evaluator_env_num = cfg.env.collector_env_num, cfg.env.evaluator_env_num
collector_env = BaseEnvManager(
env_fn=[lambda: PendulumEnv(cfg.env) for _ in range(collector_env_num)], cfg=cfg.env.manager

Do you know the difference between a lambda expression and a partial function when creating multiple env functions, especially when each env has its own unique cfg?
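
To make the distinction behind this question concrete, here is a minimal sketch of Python's late-binding closures versus `functools.partial`. The `make_env` function and the `cfgs` list are hypothetical stand-ins for an environment constructor and per-env configs; they are not part of DI-engine.

```python
from functools import partial


def make_env(cfg):
    # Hypothetical stand-in for an environment constructor such as PendulumEnv(cfg).
    return {"seed": cfg["seed"]}


# Hypothetical per-env configs, one per environment.
cfgs = [{"seed": 0}, {"seed": 1}, {"seed": 2}]

# Late binding: each lambda looks up `c` only when it is called,
# by which time the comprehension has finished and `c` is the last cfg.
lambda_fns = [lambda: make_env(c) for c in cfgs]
lambda_seeds = [fn()["seed"] for fn in lambda_fns]

# Early binding: partial stores the current value of `c` when it is created.
partial_fns = [partial(make_env, c) for c in cfgs]
partial_seeds = [fn()["seed"] for fn in partial_fns]

print(lambda_seeds)   # [2, 2, 2]
print(partial_seeds)  # [0, 1, 2]
```

In the snippet under review every env is built from the same `cfg.env` and the lambda does not refer to the loop variable, so both forms behave identically there. The difference only appears once the lambda body refers to a per-env value from the loop.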

PaParaZz1 changed the title from "WIP: feature(davide): Implementation of D4PG" to "feature(davide): Implementation of D4PG" on Sep 30, 2021
-added d4pg serial_entry test
-updated comments in QACDIST
-added d4pg in commander register
-added q_value in d4pg return dict
-added priority update in d4pg entry
-added assertion in QACDIST
@PaParaZz1 PaParaZz1 merged commit 16a89c3 into opendilab:main Sep 30, 2021
puyuan1996 pushed a commit to puyuan1996/DI-engine that referenced this pull request Dec 14, 2021
* added experience replay and n-step

* implementing distributional q value

* added distributional q-value

* added overview in qac_dist and d4pg

* derived D4PG from DDPG

* fixed a bug when action shape >1

* benchmark D4PG mujoco + minor fixes

-entry for DDPG mujoco
-entry for D4PG mujoco
-config for D4PG mujoco
-fixed style D4PG code
-unittests for QAC distributional

* formatted code

* minor updates (read description)

-added d4pg serial_entry test
-updated comments in QACDIST
-added d4pg in commander register
-added q_value in d4pg return dict
-added priority update in d4pg entry
-added assertion in QACDIST
puyuan1996 pushed a commit to puyuan1996/DI-engine that referenced this pull request Apr 18, 2022