feature(pu): add td3_vae algorithm #152

puyuan1996 · 2021-12-15T11:22:45Z

Description

Add the td3_vae algorithm: train a VAE encoder-decoder that encodes the original continuous action into a latent action space, and test it in the lunarlander-cont environment.
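For readers unfamiliar with the latent-action idea, here is a minimal, dependency-free sketch of the two VAE building blocks that the commit messages in this PR mention: sampling a latent `z` with the reparameterization trick followed by `tanh`, and the KL term against a standard normal prior. The function names, dimensions, and numbers are hypothetical illustrations, not the code in `ding/model/template/vae.py`.

```python
import math
import random


def sample_latent_action(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, then squashed by tanh.

    Hypothetical helper: `mu` and `log_var` stand in for encoder outputs.
    """
    z = []
    for m, lv in zip(mu, log_var):
        eps = rng.gauss(0.0, 1.0)        # eps ~ N(0, 1)
        sigma = math.exp(0.5 * lv)       # log-variance -> standard deviation
        z.append(math.tanh(m + sigma * eps))
    return z


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


rng = random.Random(0)                   # fixed seed for repeatability
z = sample_latent_action([0.3, -1.2], [0.0, -2.0], rng)
kl_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])
```

The `tanh` keeps every latent coordinate bounded, which matches the range a TD3 actor with a `tanh` output head would produce; whether the actual policy relies on that property is an assumption here.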

Related Issue

TODO

Check List

merge the latest version source branch/repo, and resolve all the conflicts
pass style check
pass all the tests

codecov · 2021-12-15T11:40:08Z

Codecov Report

Merging #152 (11cac8f) into main (118cc67) will decrease coverage by 0.35%.
The diff coverage is 35.77%.

@@            Coverage Diff             @@
##             main     #152      +/-   ##
==========================================
- Coverage   84.73%   84.37%   -0.36%     
==========================================
  Files         430      436       +6     
  Lines       33019    33619     +600     
==========================================
+ Hits        27979    28367     +388     
- Misses       5040     5252     +212
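The percentages in the table can be reproduced from the hit and line counts. The short check below assumes the displayed values are truncated (not rounded) to two decimals; that assumption is inferred from the numbers in this report, not taken from Codecov documentation. Under it, the "0.35%" in the summary sentence and the "-0.36%" in the table both follow from the same data.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits, lines):
    """Coverage as a percentage: covered lines over total lines."""
    return Decimal(hits) * 100 / Decimal(lines)


def truncate2(value):
    """Truncate (do not round) to two decimal places."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(27979, 33019)   # main
head = coverage_pct(28367, 33619)   # this PR

base_shown = truncate2(base)
head_shown = truncate2(head)
exact_drop = truncate2(base - head)          # truncating the exact difference
shown_drop = base_shown - head_shown         # difference of the displayed values

print(base_shown, head_shown, exact_drop, shown_drop)
```

Hits plus misses equals the line count on both sides (27979 + 5040 = 33019 and 28367 + 5252 = 33619), so the table is internally consistent.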

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `84.37% <35.77%> (-0.36%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| ding/policy/il.py | `77.77% <ø> (ø)` | |
| ding/entry/serial_entry_td3_vae.py | `13.00% <13.00%> (ø)` | |
| ding/policy/td3_vae.py | `13.27% <13.27%> (ø)` | |
| ding/model/template/vae.py | `93.25% <93.25%> (ø)` | |
| ding/entry/__init__.py | `100.00% <100.00%> (ø)` | |
| ding/model/template/__init__.py | `100.00% <100.00%> (ø)` | |
| ding/model/template/tests/test_vae.py | `100.00% <100.00%> (ø)` | |
| ding/policy/__init__.py | `100.00% <100.00%> (ø)` | |
| ding/policy/command_mode_policy_instance.py | `94.18% <100.00%> (+0.13%)` | ⬆️ |
| ding/utils/type_helper.py | `100.00% <100.00%> (ø)` | |

... and 13 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 118cc67...11cac8f.

ding/model/template/vae.py

ding/worker/collector/sample_serial_collector.py

dizoo/box2d/lunarlander/config/lunarlander_cont_ddpg_config.py

dizoo/box2d/lunarlander/envs/lunarlander_env.py

dizoo/smac/config/smac_5m6m_masac_config.py

* feature(pu): add td3_vae
* fix(pu): fix log typo
* polish(pu): vae and rl update alternately
* polish(pu):polish td3_vae config
* test(pu): delete noise and change the data for updating vae
* fix(pu): use latent action relabel
* polish(pu):polish td3_vae config
* feature(pu): add lunarlander continuous env(ci skip)
* polish(pu):polish config
* feature(pu): add ddpg lunarlander_cont config
* feature(pu): add noise in original action in collect phase and add representation shift correction
* fix(pu): using decode_with_obs to use the obs_encoding generating from current obs
* feature(pu): representaion shift correction for each transition
* feature(pu): add latent space constraint and tanh operation to sample z in vae
* polish(pu): polish config
* polish(pu): polish as review
* polish(pu):polish td3_vae config
* polish(pu): polish vae structure, use add not concat between the embeddings of obs and action, use tanh after sample z and after the reconstruction_action head
* polish(pu):polish kl weight and prediction weight
* polish(pu):polish td3_vae using the best setting
* style(pu): yapf format
* polish(pu):polish config
* fix(pu): fix bug when collector_env_num>1, the self._traj_buffer is not empty and will leave over the data in random collect phase
* polish(pu): update the current best config
* polish(pu): polish config
* polish(pu):polish as review
* polish(pu):revert unwanted changes
* polish(pu): polish vae obs encoding
* polish(pu): return dict-type result and update comment format in vae
* feature(pu): add save vae_model state_dict
* polish(pu): add test_vae and yapf format

Co-authored-by: niuyazhe <niuyazhe@sensetime.com>

* style(internlm): fix lint error
* feat(utils/logger.py): support uniscale logger
* fix(utils/logger.py): fix import circular error
* feat(train.py): support dashboard metric panel and fix ci train config
* fix(ci_scripts/train/slurm_train.sh): fix ci train error
* fix(ci_scripts/train/torchrun.sh): fix ci train error
* fix(ci_scripts/train): restore ci update
* fix(config.json): delete alert webhook
* feat(train.py): optimize func init logger
* feat(config.json): delete config.json

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>

puyuan1996 and others added 10 commits December 15, 2021 18:35

feature(pu): add td3_vae

39c7e2a

feature(pu): add td3_vae

f53806f

fix(pu): fix log typo

4d24068

polish(pu): vae and rl update alternately

4fca50f

polish(pu): vae and rl update alternately

58810c4

polish(pu):polish td3_vae config

5112584

test(pu): delete noise and change the data for updating vae

18f86f2

fix(pu): use latent action relabel

ae0ae93

polish(pu):polish td3_vae config

0e5fa1b

feature(pu): add lunarlander continuous env(ci skip)

5ad42c0

PaParaZz1 requested changes Dec 15, 2021

ding/model/template/vae.py (outdated, resolved)

ding/model/template/vae.py (outdated, resolved)

puyuan1996 added 18 commits December 16, 2021 20:03

polish(pu):polish config

3dbce39

feature(pu): add ddpg lunarlander_cont config

17c9f04

feature(pu): add noise in original action in collect phase and add representation shift correction

fdf2102

fix(pu): using decode_with_obs to use the obs_encoding generating from current obs

2cfc411

feature(pu): representaion shift correction for each transition

9212967

feature(pu): add latent space constraint and tanh operation to sample z in vae

ec3a361

polish(pu): polish config

b93a380

polish(pu): polish as review

b65eb2d

polish(pu):polish td3_vae config

9e6de54

polish(pu): polish vae structure, use add not concat between the embeddings of obs and action, use tanh after sample z and after the reconstruction_action head

9dd84dd

polish(pu):polish kl weight and prediction weight

3f7e213

polish(pu):polish td3_vae using the best setting

c7d85c9

style(pu): yapf format

96ea362

polish(pu):polish config

6ca7764

fix(pu): fix bug when collector_env_num>1, the self._traj_buffer is not empty and will leave over the data in random collect phase

70328aa

polish(pu): update the current best config

293e5c3

polish(pu): polish config

938fc92

polish(pu): polish config

18ca5a8

PaParaZz1 requested changes Dec 30, 2021

puyuan1996 added 2 commits December 30, 2021 15:07

polish(pu):polish as review

5d5eb37

polish(pu): polish as review

89b9c18

puyuan1996 changed the title ~~WIP: feature(pu): add td3_vae algorithm~~ feature(pu): add td3_vae algorithm Dec 30, 2021

puyuan1996 added 4 commits December 30, 2021 15:27

polish(pu):revert unwanted changes

13dd23e

polish(pu): polish vae obs encoding

eb53edb

polish(pu): return dict-type result and update comment format in vae

4d69b2b

feature(pu): add save vae_model state_dict

799606a

PaParaZz1 added this to the Environment Generalization milestone Dec 30, 2021

PaParaZz1 added the algo Add new algorithm or improve old one label Dec 30, 2021

PaParaZz1 force-pushed the main branch 2 times, most recently from ee876e0 to c6947cd, January 4, 2022 06:27

PaParaZz1 approved these changes Jan 4, 2022

polish(pu): add test_vae and yapf format

11cac8f

PaParaZz1 merged commit b21b598 into main Jan 5, 2022

PaParaZz1 deleted the dev-td3-vae branch January 5, 2022 06:37
