Hi guys, I have a big sequential for loop, and the only dependency between iterations is in the bottom part of the loop body.
    // independent computation here
    a[i] = a[i-1] op a[i];  // dependency
Before the dependency line, all steps are independent of the other iterations.
Now my question is this: if I want to implement this on a GPU, I would like all the independent steps to be executed in parallel. Then, the moment the first iteration is over, I want to pass its a[i-1] value on to the successive threads handling the successive iterations. I don't know if this is a good idea, and if it is, what is the best way to do it?
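To make the loop structure concrete, here is a minimal CPU-side sketch in C++ of one way the loop could be split into two phases. It rests on several assumptions that are not in the post: `independent_work` is a hypothetical stand-in for the independent computation, `op` is taken to be integer `+`, and the independent part is assumed to produce the `a[i]` that feeds the dependency line. Under those assumptions the dependency line is an inclusive scan (prefix sum), which only parallelizes well if `op` is associative.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the "independent computation" of iteration i.
int independent_work(int x) { return x * 2; }

// Original shape: independent part and dependency fused in one sequential loop.
std::vector<int> fused_loop(std::vector<int> a) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = independent_work(a[i]);          // independent of other iterations
        if (i > 0) a[i] = a[i - 1] + a[i];      // dependency (op assumed to be +)
    }
    return a;
}

// Split shape: phase 1 has no cross-iteration dependency, phase 2 is a scan.
std::vector<int> split_loop(std::vector<int> a) {
    // Phase 1: every element can be processed on its own
    // (on a GPU this could be one thread per index i).
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = independent_work(a[i]);
    }
    // Phase 2: a[i] = a[i-1] op a[i] over the whole array is an inclusive scan.
    // std::partial_sum runs it sequentially here.
    std::partial_sum(a.begin(), a.end(), a.begin(), std::plus<int>());
    return a;
}
```

Both functions return the same array for the same input, so the split does not change the result under the stated assumptions. On a GPU, phase 1 could become an ordinary kernel, and phase 2 could use a library scan such as `thrust::inclusive_scan` rather than hand-passing `a[i-1]` from thread to thread. Whether this fits depends on the real `op` and on how expensive the independent part is.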