feature(davide): Implementation of D4PG #76
Codecov Report
@@            Coverage Diff             @@
##             main      #76      +/-   ##
==========================================
+ Coverage   88.91%   89.06%   +0.15%
==========================================
  Files         357      359       +2
  Lines       26579    26582       +3
==========================================
+ Hits        23632    23676      +44
+ Misses       2947     2906      -41
Flags with carried forward coverage won't be shown.
- entry for DDPG MuJoCo
- entry for D4PG MuJoCo
- config for D4PG MuJoCo
- fixed style of D4PG code
- unit tests for QAC distributional
This PR can be merged after solving these problems.
# Set up envs for collection and evaluation
collector_env_num, evaluator_env_num = cfg.env.collector_env_num, cfg.env.evaluator_env_num
collector_env = BaseEnvManager(
    env_fn=[lambda: PendulumEnv(cfg.env) for _ in range(collector_env_num)], cfg=cfg.env.manager
)
Do you know the difference between a lambda expression and a partial function (`functools.partial`) when creating multiple env functions, especially when each env has its own unique cfg?
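A small illustration of the issue the reviewer raises. In the snippet above every env receives the same `cfg.env`, so the lambda is harmless there; the problem appears once each env gets its own config. A Python lambda created in a loop or comprehension looks up the loop variable when it is *called*, while `functools.partial` stores the argument when it is *created*. `DummyEnv` and `env_cfgs` are hypothetical stand-ins, not DI-engine classes.

```python
from functools import partial


class DummyEnv:
    """Hypothetical stand-in for an env class such as PendulumEnv."""

    def __init__(self, cfg):
        self.cfg = cfg


# Hypothetical per-env configs: each env should get its own seed.
env_cfgs = [{'seed': i} for i in range(3)]

# Late binding: each lambda looks up `c` only when it is called, i.e. after
# the comprehension has finished, so every factory sees the last config.
lambda_fns = [lambda: DummyEnv(c) for c in env_cfgs]

# partial stores the argument at creation time, one config per factory.
partial_fns = [partial(DummyEnv, c) for c in env_cfgs]

# A lambda with a default argument also binds the value at creation time.
default_arg_fns = [lambda c=c: DummyEnv(c) for c in env_cfgs]

lambda_seeds = [fn().cfg['seed'] for fn in lambda_fns]
partial_seeds = [fn().cfg['seed'] for fn in partial_fns]
default_arg_seeds = [fn().cfg['seed'] for fn in default_arg_fns]
print(lambda_seeds)       # [2, 2, 2]
print(partial_seeds)      # [0, 1, 2]
print(default_arg_seeds)  # [0, 1, 2]
```

A further difference: a `partial` wrapping a module-level class can be pickled by the standard `pickle` module, while a lambda cannot. Whether that matters depends on how a given env manager ships factories to worker processes.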
- added d4pg serial_entry test
- updated comments in QACDIST
- added d4pg in commander register
- added q_value in d4pg return dict
- added priority update in d4pg entry
- added assertion in QACDIST
Branch updated from 7b1baa2 to 3f33b9d.
* added experience replay and n-step
* implementing distributional q value
* added distributional q-value
* added overview in qac_dist and d4pg
* derived D4PG from DDPG
* fixed a bug when action shape >1
* benchmark D4PG mujoco + minor fixes
  - entry for DDPG mujoco
  - entry for D4PG mujoco
  - config for D4PG mujoco
  - fixed style D4PG code
  - unittests for QAC distributional
* formatted code
* minor updates (read description)
  - added d4pg serial_entry test
  - updated comments in QACDIST
  - added d4pg in commander register
  - added q_value in d4pg return dict
  - added priority update in d4pg entry
  - added assertion in QACDIST
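The commit list mentions a priority update in the D4PG entry. With a categorical critic there is no scalar TD error, so one common choice, assumed here and not taken from this PR, is to train on the per-sample cross-entropy between the projected target distribution and the predicted one, and to reuse that per-sample value as the replay priority. The function name and the `eps` floor below are hypothetical.

```python
import numpy as np


def distributional_loss_and_priority(pred_logits, target_probs, eps=1e-8):
    """Per-sample cross-entropy between a projected target distribution and
    the critic's predicted categorical distribution (hypothetical helper).

    pred_logits:  (B, N) unnormalised critic outputs over N atoms
    target_probs: (B, N) projected target distribution, rows sum to 1
    Returns the mean loss and a (B,) array of replay priorities.
    """
    # Numerically stable log-softmax over the atom dimension.
    shifted = pred_logits - pred_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Cross-entropy per sample: -sum_i p_target(i) * log p_pred(i).
    per_sample = -(target_probs * log_probs).sum(axis=1)

    # Assumption: the per-sample loss magnitude (plus a small floor so no
    # transition gets zero probability) serves as the new priority.
    priority = np.abs(per_sample) + eps
    return per_sample.mean(), priority
```

The priorities would then be written back to the prioritized replay buffer for the sampled indices; how that write-back is wired up in this PR's entry is not shown here.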
Description
This PR implements D4PG (Distributed Distributional Deterministic Policy Gradients): https://arxiv.org/abs/1804.08617. It currently supports n-step returns, experience replay, and a categorical value distribution.
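D4PG's categorical critic represents Z(s, a) as probabilities over a fixed support of atoms between `v_min` and `v_max`, and builds its target the same way as C51 (Bellemare et al., 2017), but with an n-step return: each target atom is shifted to R + gamma^n * z and its mass is split between the two nearest support points. The following is a minimal NumPy sketch under stated assumptions, not the code in this PR: the function name and argument layout are hypothetical, and rewards after an episode terminates are assumed to be zero-padded.

```python
import numpy as np


def project_n_step_distribution(next_probs, rewards, dones, gamma, v_min, v_max):
    """Project the n-step distributional target onto a fixed categorical support.

    next_probs: (B, N) target-critic probabilities at s_{t+n}
    rewards:    (B, n) rewards r_t ... r_{t+n-1} (assumed zero after termination)
    dones:      (B,)   1.0 if the episode ended within the n steps, else 0.0
    """
    batch, n_atoms = next_probs.shape
    n_step = rewards.shape[1]
    support = np.linspace(v_min, v_max, n_atoms)
    delta_z = (v_max - v_min) / (n_atoms - 1)

    # Discounted n-step return: sum_k gamma^k * r_{t+k}.
    n_step_return = (rewards * gamma ** np.arange(n_step)).sum(axis=1)

    # Shift each atom: Tz = R + gamma^n * z, bootstrapping only if not done.
    tz = n_step_return[:, None] + (1.0 - dones[:, None]) * gamma ** n_step * support[None, :]
    tz = np.clip(tz, v_min, v_max)

    # Fractional position of each shifted atom on the fixed support; indices
    # are clipped to guard against floating-point overshoot at the edges.
    b = (tz - v_min) / delta_z
    lower = np.clip(np.floor(b).astype(int), 0, n_atoms - 1)
    upper = np.clip(np.ceil(b).astype(int), 0, n_atoms - 1)

    target = np.zeros(next_probs.shape, dtype=float)
    for i in range(batch):
        for j in range(n_atoms):
            p = next_probs[i, j]
            if lower[i, j] == upper[i, j]:
                # Shifted atom lands exactly on a support point.
                target[i, lower[i, j]] += p
            else:
                # Split mass linearly between the two neighbouring atoms.
                target[i, lower[i, j]] += p * (upper[i, j] - b[i, j])
                target[i, upper[i, j]] += p * (b[i, j] - lower[i, j])
    return target
```

For example, with support [-2, -1, 0, 1, 2], a terminal transition with reward 0.5 moves all probability mass to 0.5, which is split evenly between the atoms at 0 and 1. The double loop is kept for readability; a batched implementation would typically scatter the masses with a vectorized index-add instead.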