Is it possible to parallelize a dependent for loop in CUDA?
(By dependent I mean that the iterations have to execute in order: i == 0, then i == 1, …)
for (int i = 0; i < 10; i++)
{
    somecode…
}
If each CUDA thread executes one iteration, then the order of execution is nondeterministic, right?
Is there a technique for enforcing the order, or some sort of lock?
Or can this only be run in the following manner?
for (int i = 0; i < 10; i++)
{
    cuda_parallelization_this_part(inside the loop)
}
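To make the second form concrete, here is a minimal sketch of what I have in mind. It relies on the fact that kernels launched into the same stream (here the default stream) run in launch order, so iteration i starts only after iteration i-1 has finished, while the threads inside each launch work in parallel. The names step_kernel, run_steps_gpu and run_steps_cpu, and the neighbour-sum update itself, are made up for illustration and are not my real loop body. Error checking is left out to keep it short.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// One iteration of the dependent loop (hypothetical update rule).
// Each output element reads its two neighbours from the previous iteration,
// so iteration i needs the complete result of iteration i-1.
__global__ void step_kernel(const int *in, int *out, int n, int i)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= n) return;
    out[j] = in[(j + n - 1) % n] + in[(j + 1) % n] + i;
}

// Plain sequential version of the same loop, used only as a reference.
std::vector<int> run_steps_cpu(std::vector<int> cur, int steps)
{
    int n = (int)cur.size();
    std::vector<int> next(n);
    for (int i = 0; i < steps; i++) {
        for (int j = 0; j < n; j++)
            next[j] = cur[(j + n - 1) % n] + cur[(j + 1) % n] + i;
        std::swap(cur, next);
    }
    return cur;
}

// The outer loop stays sequential on the host; only the work inside
// one iteration is spread across CUDA threads.
std::vector<int> run_steps_gpu(std::vector<int> data, int steps)
{
    assert(steps >= 0);
    int n = (int)data.size();
    if (n == 0) return data;

    size_t bytes = n * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, data.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int i = 0; i < steps; i++) {
        // Launches in the same stream execute in order, one after another.
        step_kernel<<<blocks, threads>>>(d_in, d_out, n, i);
        std::swap(d_in, d_out);  // output of this step is input of the next
    }

    // This blocking copy on the default stream waits for all launched kernels.
    cudaMemcpy(data.data(), d_in, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return data;
}
```

Calling run_steps_gpu(v, 10) corresponds to the 10-iteration loop above, and the result should match run_steps_cpu(v, 10). The two buffers are swapped each iteration so that no thread overwrites a value another thread still has to read.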
Thanks…