I am not completely sure, but in my opinion the relative values of the different reward_scales are what matter. They should be tuned based on experience and simulation results.
"reward_scale" is very important for maximizing rewards. With 2 or 3 scales, they can be determined by trial and error, but with 6 scales as described above, I think a more systematic search method will be required.
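One simple systematic option is random search. Below is a minimal sketch, assuming (following the first comment) that mostly the ratios between scales matter, so the first scale is fixed to 1.0 and the other five are sampled log-uniformly. The term names, the [0.01, 100] range, and the `evaluate` callback (which would run a training/simulation with the given scales and return a score) are all hypothetical placeholders, not part of any existing API.

```python
import math
import random

# Hypothetical names for the six reward terms; replace with the real ones.
TERM_NAMES = ["term_a", "term_b", "term_c", "term_d", "term_e", "term_f"]


def sample_scales(rng, low=1e-2, high=1e2):
    """Sample one candidate set of reward scales.

    The first scale is fixed to 1.0 so only the ratios are searched;
    the remaining scales are drawn log-uniformly from [low, high].
    """
    scales = {TERM_NAMES[0]: 1.0}
    for name in TERM_NAMES[1:]:
        scales[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
    return scales


def random_search(evaluate, n_trials=50, seed=0):
    """Return (best_score, best_scales) over n_trials random candidates.

    `evaluate` is a hypothetical callback: it takes a dict of scales,
    runs a simulation/training, and returns a score (higher is better).
    """
    rng = random.Random(seed)
    best_score, best_scales = -math.inf, None
    for _ in range(n_trials):
        scales = sample_scales(rng)
        score = evaluate(scales)
        if score > best_score:
            best_score, best_scales = score, scales
    return best_score, best_scales


if __name__ == "__main__":
    # Toy stand-in for a real simulation: pretends every scale should be 1.0
    # and penalizes the log-distance from it.
    def toy_evaluate(scales):
        return -sum(abs(math.log(s)) for s in scales.values())

    score, scales = random_search(toy_evaluate, n_trials=200)
    print(round(score, 3), {k: round(v, 3) for k, v in scales.items()})
```

Log-uniform sampling is used because reward scales often need to span several orders of magnitude; with 6 scales, random search usually covers the space better than a grid. A Bayesian optimizer (e.g. Optuna) could replace the random sampler if each simulation is expensive.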