
polish(nyz): polish dqn and ppo comments #732

Merged: 16 commits from dev-policy-comments into main on Oct 31, 2023
Conversation

PaParaZz1 (Member)

Description

Related Issue

TODO

Check List

  • merge the latest version of the source branch/repo and resolve all conflicts
  • pass style check
  • pass all the tests

@PaParaZz1 PaParaZz1 added the doc Documentation label Sep 20, 2023
@PaParaZz1 PaParaZz1 changed the title polish(nyz) polish dqn and ppo comments polish(nyz): polish dqn and ppo comments Oct 8, 2023
ding/policy/base_policy.py (resolved)
ding/policy/base_policy.py (outdated, resolved)
ding/policy/base_policy.py (resolved)
ding/policy/base_policy.py (outdated, resolved)
ding/policy/base_policy.py (resolved)
"""
# Data preprocessing operations, such as stacking data and moving it from CPU to the CUDA device
data = default_preprocess_learn(
Collaborator:

Perhaps the data preprocessing here could be explained in more detail, specifying which operations it performs.

Member Author:

The detailed comments inside will be updated later.
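In the meantime, for readers wondering what this preprocessing step typically covers, here is a minimal sketch. The helper name and exact behavior are illustrative assumptions, not DI-engine's actual `default_preprocess_learn` implementation: it collates a list of per-transition dicts into batched tensors and optionally moves them from CPU to the CUDA device.

```python
# Hypothetical sketch of learn-data preprocessing; not the actual
# default_preprocess_learn implementation in DI-engine.
from typing import Any, Dict, List

import torch


def preprocess_learn_sketch(data: List[Dict[str, Any]], cuda: bool = False) -> Dict[str, torch.Tensor]:
    # Stack each field across the batch: e.g. 'obs' becomes a (B, *obs_shape) tensor.
    batch = {
        key: torch.stack([torch.as_tensor(d[key]) for d in data], dim=0)
        for key in data[0]
    }
    if cuda and torch.cuda.is_available():
        # Move all tensors to the GPU for training.
        batch = {k: v.cuda() for k, v in batch.items()}
    return batch
```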

ding/policy/dqn.py (resolved)
Before: R2D2 proposes that several tricks should be used to improve upon DRQN, namely some recurrent experience replay tricks such as burn-in.
After: R2D2 proposes that several tricks should be used to improve upon DRQN, namely some recurrent experience replay tricks and the burn-in mechanism for off-policy training.
Collaborator:
The R2D2 policy class is inspired by the paper "Recurrent Experience Replay in Distributed Reinforcement Learning". R2D2 suggests the incorporation of several enhancements over DRQN, specifically the application of novel recurrent experience replay strategies and the implementation of a burn-in mechanism for off-policy training.
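To make the burn-in mechanism concrete, here is a hedged sketch (function and argument names are illustrative, not DI-engine's R2D2 API): the first `burn_in` steps of each replayed sequence are run through the RNN without gradients, purely to recover a reasonable hidden state, and only the remaining steps are unrolled for the training loss.

```python
import torch
import torch.nn as nn


def burn_in_forward(rnn: nn.LSTM, seq: torch.Tensor, burn_in: int) -> torch.Tensor:
    """seq: (T, B, input_dim) replayed observation sequence."""
    with torch.no_grad():
        # Burn-in segment: warm up the recurrent state from stored observations,
        # mitigating the stale-hidden-state problem of off-policy replay.
        _, hidden = rnn(seq[:burn_in])
    # Training segment: gradients flow only through these unrolled steps.
    train_out, _ = rnn(seq[burn_in:], hidden)
    return train_out
```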

ding/policy/r2d2.py (resolved)
ding/policy/r2d2.py (resolved)
ding/policy/r2d2.py (resolved)
ding/policy/r2d2.py (resolved)
ding/policy/r2d2.py (resolved)
ding/policy/r2d2.py (resolved)
ding/policy/r2d2.py (outdated, resolved)
ding/policy/dqn.py (resolved)
Before: - data (:obj:`List[Dict[str, Any]]`): The trajectory data (a list of transitions), each element is the same format as the return value of ``self._process_transition`` method.
After: - transitions (:obj:`List[Dict[str, Any]]`): The trajectory data (a list of transitions), each element is the same format as the return value of ``self._process_transition`` method.
Returns:
Collaborator:
The trajectory data, which is a list of transitions. Each element is in the same format as the return value of the self._process_transition method.

And the user can customize this data processing procedure by overriding these two methods and the collector itself.
- samples (:obj:`List[Dict[str, Any]]`): The processed train samples, each element is in a similar format to the input transitions, but may contain more data for training, such as nstep reward and target obs.
"""
Collaborator:
The processed training samples. Each element is similar in format to the input transitions, but may contain additional data for training, such as n-step reward and target observations.
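As a concrete illustration of such processing, here is a minimal sketch (a hypothetical helper, not DI-engine's actual ``_get_train_sample`` internals) that augments each transition with an n-step discounted reward and the matching target observation n steps ahead, stopping accumulation at episode boundaries.

```python
# Hypothetical sketch of n-step sample construction; field names assume
# transitions shaped like {'obs', 'action', 'reward', 'next_obs', 'done'}.
from typing import Any, Dict, List


def make_nstep_samples(transitions: List[Dict[str, Any]], nstep: int, gamma: float) -> List[Dict[str, Any]]:
    samples = []
    for i, t in enumerate(transitions):
        reward, discount, j = 0.0, 1.0, i
        for k in range(min(nstep, len(transitions) - i)):
            j = i + k
            reward += discount * transitions[j]['reward']
            discount *= gamma
            if transitions[j]['done']:
                break  # do not accumulate rewards across episode boundaries
        sample = dict(t)
        sample['reward'] = reward                        # n-step discounted return
        sample['next_obs'] = transitions[j]['next_obs']  # target obs, n steps ahead
        samples.append(sample)
    return samples
```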

@PaParaZz1 PaParaZz1 merged commit 111bf24 into main Oct 31, 2023
31 of 40 checks passed
@PaParaZz1 PaParaZz1 deleted the dev-policy-comments branch October 31, 2023 08:36