Recently I managed to train neural networks to balance a double pendulum (including the swing-up) using a naive and very simple evolutionary algorithm. I now want to compare training speed and results with more modern and robust RL algorithms, such as those provided with Isaac Lab, on the exact same task. I modified the cartpole example to turn it into a double pendulum and adjusted the reward terms accordingly.
However, after a few hours of training, no solution was found (using rl_games and skrl with 8192 envs).
I then tried an easier setup with very low gravity and high rotational damping on the poles, but that was also unsuccessful.
I am currently unsure whether the problem comes from:

- the reward function
- the setup
- the task itself being too challenging
- or if I just need to increase the number of envs or training time
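For context, my reward follows roughly the pattern below (a minimal sketch — the variable names and weights here are illustrative, not my exact configuration): an upright bonus for both poles plus penalties on cart drift and velocities.

```python
import numpy as np

def swing_up_reward(cart_pos, pole1_angle, pole2_angle,
                    cart_vel, pole1_vel, pole2_vel):
    """Shaped reward for double-pendulum swing-up.

    Angles are measured from upright (0 = upright, pi = hanging down).
    All weights are illustrative placeholders.
    """
    # Upright term: cos(angle) is +1 when a pole is upright, -1 when hanging.
    upright = np.cos(pole1_angle) + np.cos(pole2_angle)
    # Penalize cart drift from the center of the rail.
    pos_penalty = 0.05 * cart_pos ** 2
    # Penalize fast motion to encourage settling into a stable balance.
    vel_penalty = 0.005 * (cart_vel ** 2 + pole1_vel ** 2 + pole2_vel ** 2)
    return upright - pos_penalty - vel_penalty
```

With this shaping the fully upright, motionless state scores +2 and the hanging-down state scores -2, so the gradient toward swing-up exists but may still be hard for the policy to discover without exploration bonuses.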