I wonder: if I only have 500 elements,
and each element needs maybe 1000 loop iterations,
is that going to speed up if I use a GPU?
To answer that, you must consider the specific nature of the program and the complexity of the inner loops. Is it one of the so-called embarrassingly parallel tasks? Then you may indeed speed up the application. Even so, the first thing I would consider is how often I need to run the program. If I were only going to run it once, I would not spend the time and effort to write it in CUDA when I could much more easily write a sequential version. If the program is using all your CPU time, then yes, perhaps I would.
500 threads won’t saturate your GPU.
Now, if you could go through those loops in any order and could have half a million threads (500 elements × 1000 iterations), each doing one iteration, then yes, this could work well.
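
To make the "half a million threads" idea concrete, here is a rough sketch of the index arithmetic only, written in plain Python so it runs anywhere. It assumes the 1000 iterations per element are independent of each other, which the question does not confirm. The names `work_item` and `launch_blocks` and the block size of 256 are made up for illustration; in a real CUDA kernel the flat index would typically come from `blockIdx.x * blockDim.x + threadIdx.x`.

```python
NUM_ELEMENTS = 500         # from the question
LOOPS_PER_ELEMENT = 1000   # "maybe 1000" in the question

def work_item(flat_index, loops_per_element=LOOPS_PER_ELEMENT):
    """Map one flat thread index to an (element, iteration) pair."""
    return divmod(flat_index, loops_per_element)

def launch_blocks(total_items, threads_per_block=256):
    """Blocks needed to cover total_items (ceiling division).

    threads_per_block=256 is an assumed, commonly used value,
    not something taken from the thread.
    """
    return (total_items + threads_per_block - 1) // threads_per_block

total = NUM_ELEMENTS * LOOPS_PER_ELEMENT

print(total)                       # 500000 independent work items
print(work_item(0))                # (0, 0)
print(work_item(total - 1))        # (499, 999)
print(launch_blocks(total))        # 1954 blocks of 256 threads
print(launch_blocks(NUM_ELEMENTS)) # 2 blocks if only one thread per element
```

With one thread per element you would launch only two blocks of that size, whereas the flattened version launches about two thousand, which is the difference between the two cases described above. If each iteration depends on the previous one, this flattening does not apply.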