I tried out different envs from the official OmniIsaacGymEnvs repo and was curious about using SAC there instead of the more commonly applied PPO.
When comparing the Ant and Humanoid envs using the standard configs for both PPO and SAC, PPO outperforms SAC by far. In absolute numbers, SAC barely converges to a result that would be reasonable for inference.
This happens even when increasing the maximum iterations and number of parallel envs for SAC.
Is this because SAC generally performs this badly on these tasks, should the hyperparameters be adjusted, or could there be bugs in the rl_games implementation of SAC?