How do I find good values for the “reward_scale” parameters?
For example, dist_reward_scale and rot_reward_scale in the following expression:
rewards = dist_reward_scale * dist_reward + rot_reward_scale * rot_reward \
+ around_handle_reward_scale * around_handle_reward + open_reward_scale * open_reward \
+ finger_dist_reward_scale * finger_dist_reward - \
action_penalty_scale * action_penalty
How should the value of each “reward_scale” be calculated?
I am not completely sure, but in my view the relative values of the different reward scales are what matter. The values should be tuned based on experience and simulation results.
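One way to look at the relative values is to compare how much each scaled term contributes to the total reward on average. The snippet below is only a rough sketch: the helper name `contribution_shares` and all numbers are made up for illustration, and the mean unscaled term values are assumed to come from your own simulation logs.

```python
def contribution_shares(scales, mean_terms):
    """Return each term's share of the total absolute weighted reward.

    scales:     dict of term name -> reward scale
    mean_terms: dict of term name -> mean unscaled term value from rollouts
    """
    weighted = {name: scales[name] * mean_terms[name] for name in scales}
    total = sum(abs(v) for v in weighted.values())
    if total == 0:
        return {name: 0.0 for name in weighted}
    return {name: abs(v) / total for name, v in weighted.items()}


# Hypothetical numbers, not taken from any real task configuration.
scales = {"dist": 2.0, "rot": 0.5, "action_penalty": 0.01}
mean_terms = {"dist": 0.5, "rot": 1.0, "action_penalty": 50.0}

shares = contribution_shares(scales, mean_terms)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of total weighted reward")
```

If one term accounts for nearly all of the total, the other scales have little influence on what the policy learns, which is a hint about which scale to adjust first.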
“reward_scale” is very important for maximizing rewards. With 2 or 3 scales, they can be determined by trial and error, but with 6 scales as in the expression above, I think a more systematic method will be required.
It is generally decided by trial and error. For the Franka task, we did quite a bit of parameter tuning with the various reward scales.
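With many scales, the trial and error can at least be automated. The following is a minimal sketch of a random search, not the procedure used for the Franka task. The function `random_search`, the search range, and the toy `evaluate` function are all assumptions for illustration. In practice `evaluate` would train a policy with the given scales and return a task metric that does not depend on the scales themselves (for example, a success rate), since the scaled reward is not comparable across different scale settings.

```python
import math
import random


def random_search(evaluate, names, low, high, n_trials, seed=0):
    """Sample each scale log-uniformly in [low, high] and keep the best set."""
    rng = random.Random(seed)
    log_low, log_high = math.log10(low), math.log10(high)
    best_scales, best_score = None, float("-inf")
    for _ in range(n_trials):
        scales = {n: 10 ** rng.uniform(log_low, log_high) for n in names}
        score = evaluate(scales)
        if score > best_score:
            best_scales, best_score = scales, score
    return best_scales, best_score


# Toy stand-in for "train a policy and measure task success" (hypothetical).
# It simply scores how close the sampled scales are to a made-up target.
TARGET = {"dist": 2.0, "rot": 0.5}


def evaluate(scales):
    return -sum(
        (math.log10(scales[n]) - math.log10(TARGET[n])) ** 2 for n in TARGET
    )


best_scales, best_score = random_search(
    evaluate, names=list(TARGET), low=0.01, high=10.0, n_trials=200
)
print(best_scales, best_score)
```

Log-uniform sampling is used here because reward scales often differ by orders of magnitude. Each trial means a full training run, so the number of trials is limited by the compute you have.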