Good Afternoon.
I have the following situation:
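```cuda
uint32_t const gid = blockIdx.x * blockDim.x + threadIdx.x;
uint32_t const iter_id = threadIdx.x % 16;
/*
 * preparing
 * for the main loop
 */
__syncthreads();
for (uint32_t h = 0; h < 16; h++)
{
    uint32_t mix[32];
    fill_mix(gid);  // helper not shown; presumably initialises mix[] for this thread
    for (uint32_t m = 0; m < 64; m++)
    {
        for (uint32_t i = 0; i < 12; i++)
        {
            // random_generator() is not shown; reduce its result to a valid index into mix[32]
            uint32_t const r = random_generator() % 32;
            uint32_t const offset = mix[r] % 4096;      // index into my_piece_mem
            uint32_t const data = my_piece_mem[offset];
            mix[r] = (mix[r] * 33) + data;              // updated mix[r] is read by later iterations
        }
    }
}
```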
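For reference, one pass of the `i` loop can be modelled as plain C++ host code that keeps each `data` value in a 12-element array as it is produced. This is a minimal sketch under assumptions: the indices that `random_generator()` would return are passed in precomputed as `idx`, `my_piece_mem` is a 4096-entry array `mem`, and the names `inner_pass`, `kMixSize`, `kMemSize` and `kSteps` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t kMixSize = 32;    // size of mix[] in the question
constexpr uint32_t kMemSize = 4096;  // assumed size of my_piece_mem
constexpr uint32_t kSteps = 12;      // iterations of the i loop

// Runs the 12 steps of the i loop in order on one thread's mix[] and
// returns every data value in the order it was produced.
std::array<uint32_t, kSteps> inner_pass(std::array<uint32_t, kMixSize>& mix,
                                        const std::array<uint32_t, kMemSize>& mem,
                                        const std::array<uint32_t, kSteps>& idx)
{
    std::array<uint32_t, kSteps> data_vals{};
    for (uint32_t i = 0; i < kSteps; i++)
    {
        uint32_t const r = idx[i];
        assert(r < kMixSize && "index into mix[] must be below 32");
        uint32_t const offset = mix[r] % kMemSize;
        uint32_t const data = mem[offset];
        mix[r] = mix[r] * 33 + data;  // later steps may read this updated value
        data_vals[i] = data;          // keep the i-th data value for later use
    }
    return data_vals;
}
```

As written, step `i` may read a `mix[r]` that an earlier step has just updated, so each step can depend on the ones before it.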
I would like to parallelize the loop over `i` to make the computation faster.
I also need the 12 values of `data` computed in that loop for further computations.
How can I parallelize the loop over `i`, and how can I collect the 12 values of `data` when they are computed in parallel?
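One way to reason about this is to ask when the 12 steps are independent. Each step reads and then writes `mix[r]`, so a step whose index repeats an earlier one must see that earlier write. If all 12 indices are distinct, every step reads only the original `mix` values, and the steps could in principle be handed to 12 threads, each writing its `data` into its own slot of an array (for example in shared memory). The sketch below is plain C++ host code under hypothetical assumptions: the indices are generated first (which assumes `random_generator()` does not depend on `mix`), `my_piece_mem` is a 4096-entry array `mem`, and the names `sequential`, `from_snapshot` and `all_distinct` are made up for illustration. It compares the original order with evaluating every step from one snapshot of `mix`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t kMixSize = 32;    // size of mix[] in the question
constexpr uint32_t kMemSize = 4096;  // assumed size of my_piece_mem
constexpr uint32_t kSteps = 12;      // iterations of the i loop

using Mix = std::array<uint32_t, kMixSize>;
using Mem = std::array<uint32_t, kMemSize>;
using Idx = std::array<uint32_t, kSteps>;
using Data = std::array<uint32_t, kSteps>;

// True if no mix[] slot is used twice; uses one bit per slot (needs kMixSize <= 32).
bool all_distinct(const Idx& idx)
{
    uint32_t seen = 0;
    for (uint32_t r : idx)
    {
        assert(r < kMixSize);
        if (seen & (1u << r)) return false;
        seen |= 1u << r;
    }
    return true;
}

// Original order: step i sees every write made by steps 0..i-1.
Data sequential(Mix& mix, const Mem& mem, const Idx& idx)
{
    Data out{};
    for (uint32_t i = 0; i < kSteps; i++)
    {
        uint32_t const r = idx[i];
        out[i] = mem[mix[r] % kMemSize];
        mix[r] = mix[r] * 33 + out[i];
    }
    return out;
}

// Snapshot order: every step reads the same copy of mix, as 12 threads
// would if each handled one i without seeing the others' writes.
Data from_snapshot(Mix& mix, const Mem& mem, const Idx& idx)
{
    Mix const snap = mix;
    Data out{};
    for (uint32_t i = 0; i < kSteps; i++)  // each i could be its own thread
        out[i] = mem[snap[idx[i]] % kMemSize];
    for (uint32_t i = 0; i < kSteps; i++)  // updates applied afterwards (serially here)
        mix[idx[i]] = mix[idx[i]] * 33 + out[i];
    return out;
}
```

With distinct indices both functions produce the same `data` values and the same final `mix`; with a repeated index they do not, so a kernel along these lines would likely need to detect repeats and serialize those steps, or restructure the algorithm.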