How to parallelize a loop

Good afternoon.
I have the following situation:

uint32_t const gid = blockIdx.x * blockDim.x + threadIdx.x;
const uint32_t iter_id = threadIdx.x % 16;
// ... preparing for the main loop ...
for (uint32_t h = 0; h < 16; h++) {
    uint32_t mix[32];
    for (uint32_t m = 0; m < 64; m++) {
        for (uint32_t i = 0; i < 12; i++) {
            uint32_t rand = random_generator();    // used as an index into mix
            uint32_t offset = mix[rand] % 4096;
            uint32_t data = my_piece_mem[offset];
            mix[rand] = (mix[rand] * 33) + data;   // read-modify-write of mix
        }
    }
}

I would like to parallelize the loop over the variable i to make my computations faster.
I also need the 12 values of the variable data computed in the loop over i for further computations.

How can I parallelize this loop over i, and how can I get the 12 values of data once they have been computed in parallel?

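This inner loop does not appear to be parallelizable in a thread-parallel way, because you’re modifying mix at random locations as you iterate: a later iteration may read an element that an earlier iteration has just written. Hence there is a data dependency between iterations.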
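
To make the dependency concrete, here is a minimal sketch of what happens when two iterations land on the same element of mix. The helper names (step, two_steps_serial, two_steps_stale, fill_demo_mem) and the memory contents are made up for illustration; only the arithmetic is taken from the loop in the question.

```cuda
#include <cstdint>
#include <cstdio>

// One update of a mix element, same arithmetic as the inner loop in the question.
__host__ __device__ inline uint32_t step(uint32_t mix_val, const uint32_t* mem) {
    uint32_t offset = mix_val % 4096;
    return mix_val * 33u + mem[offset];
}

// Hypothetical memory contents, only so the example has something to read.
void fill_demo_mem(uint32_t* mem) {
    for (uint32_t k = 0; k < 4096; k++) mem[k] = k + 1;
}

// What the serial loop does when two iterations pick the same index:
// the second offset is computed from the value the first one wrote.
uint32_t two_steps_serial(uint32_t start, const uint32_t* mem) {
    uint32_t v = step(start, mem);
    return step(v, mem);
}

// What two naive parallel threads would do: both read the old value,
// and whichever writes last overwrites the other's result.
uint32_t two_steps_stale(uint32_t start, const uint32_t* mem) {
    uint32_t a = step(start, mem);
    uint32_t b = step(start, mem);
    (void)a;  // the first thread's write is lost
    return b;
}

int main() {
    static uint32_t mem[4096];
    fill_demo_mem(mem);
    unsigned serial = two_steps_serial(7u, mem);
    unsigned stale = two_steps_stale(7u, mem);
    printf("serial = %u, stale = %u, %s\n", serial, stale,
           serial == stale ? "same" : "different");
    return 0;
}
```

With an index that really is random, any two of the 12 iterations may collide like this, so the iterations cannot simply be handed to separate threads.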

OK. But would this inner loop be parallelizable if it did not modify mix at random locations?
If so, how could I parallelize it in that case?
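
If the inner loop only reads mix, the 12 iterations no longer depend on each other, and one possible approach is to let 12 threads of a block each do one iteration and leave their value of data in shared memory. The following is only a sketch under that assumption. The kernel name gather12, the idx array (standing in for the results of random_generator(), reduced modulo 32 so it stays inside mix), and the test data are hypothetical and not from the original code.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr uint32_t kLoads = 12;       // iterations of the loop over i
constexpr uint32_t kMixWords = 32;    // size of mix in the question
constexpr uint32_t kMemWords = 4096;  // size implied by "% 4096"

// Launch with one block of at least kLoads threads.
__global__ void gather12(const uint32_t* mix_in, const uint32_t* idx,
                         const uint32_t* mem, uint32_t* data_out) {
    __shared__ uint32_t mix[kMixWords];
    __shared__ uint32_t data[kLoads];
    const uint32_t t = threadIdx.x;

    // Stage mix in shared memory; it is read-only from here on.
    for (uint32_t k = t; k < kMixWords; k += blockDim.x) mix[k] = mix_in[k];
    __syncthreads();

    // Thread t plays the role of loop iteration i == t.
    if (t < kLoads) {
        const uint32_t offset = mix[idx[t] % kMixWords] % kMemWords;
        data[t] = mem[offset];
    }
    __syncthreads();

    // After the barrier every thread in the block can read all 12 values.
    // Here they are just copied out so the host can check them.
    if (t < kLoads) data_out[t] = data[t];
}

// Runs the kernel on made-up inputs and compares against a serial loop.
bool gather_matches_reference() {
    std::vector<uint32_t> mix(kMixWords), idx(kLoads), mem(kMemWords), out(kLoads, 0);
    for (uint32_t k = 0; k < kMixWords; k++) mix[k] = 2654435761u * (k + 1);
    for (uint32_t k = 0; k < kLoads; k++) idx[k] = (7u * k + 3u) % kMixWords;
    for (uint32_t k = 0; k < kMemWords; k++) mem[k] = k ^ 0x5a5au;

    uint32_t *d_mix = nullptr, *d_idx = nullptr, *d_mem = nullptr, *d_out = nullptr;
    const size_t w = sizeof(uint32_t);
    bool ok = true;
    ok &= cudaMalloc(&d_mix, kMixWords * w) == cudaSuccess;
    ok &= cudaMalloc(&d_idx, kLoads * w) == cudaSuccess;
    ok &= cudaMalloc(&d_mem, kMemWords * w) == cudaSuccess;
    ok &= cudaMalloc(&d_out, kLoads * w) == cudaSuccess;
    ok &= cudaMemcpy(d_mix, mix.data(), kMixWords * w, cudaMemcpyHostToDevice) == cudaSuccess;
    ok &= cudaMemcpy(d_idx, idx.data(), kLoads * w, cudaMemcpyHostToDevice) == cudaSuccess;
    ok &= cudaMemcpy(d_mem, mem.data(), kMemWords * w, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        gather12<<<1, 32>>>(d_mix, d_idx, d_mem, d_out);
        ok &= cudaDeviceSynchronize() == cudaSuccess;
        ok &= cudaMemcpy(out.data(), d_out, kLoads * w, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_mix); cudaFree(d_idx); cudaFree(d_mem); cudaFree(d_out);

    // Serial reference: the same read-only loop over i.
    for (uint32_t i = 0; i < kLoads; i++) {
        const uint32_t expected = mem[mix[idx[i] % kMixWords] % kMemWords];
        ok &= (out[i] == expected);
    }
    return ok;
}

int main() {
    printf("%s\n", gather_matches_reference() ? "match" : "mismatch");
    return 0;
}
```

Whether this is actually faster is a separate question: 12 loads is very little work per step, and the barriers have a cost. If mix still has to be updated, that update would need to be a separate step after the 12 values have been gathered, which changes the result compared with the original loop.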