Hi, I’ve been running into a weird problem and I was hoping someone can help me.
I’m not too fancy with words but here goes…
I’m calculating the indices for a cubic volume: in a __global__ void kernel, I calculate index.x, index.y, index.z for each point of the volume, along with the value at that point.
For example, when index[0].x = 0, index[0].y = 0, and index[0].z = 0, then value[0] = 5, and so on.
Each component of index[n] goes from 0 to 63, so the cubic volume has 64*64*64 elements.
The trouble is that the number of elements in int3 *index is different from the number of elements in float *value.
So a rough version of the code would look something like this:
dim3 blocks(16,16,16);
dim3 threads(4,4,4);
kernel<<<blocks, threads>>>(index, values);
where:
__global__ void kernel(int3 *index, float *value)
{
    // global 3D thread coordinates
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    int z = threadIdx.z + blockIdx.z * blockDim.z;
    // linear thread id over the whole grid
    int n = x + y * blockDim.x * gridDim.x +
            z * blockDim.x * gridDim.x * blockDim.y * gridDim.y;
    index[n].x = some calculation for values between 0 and 63;
    index[n].y = some calculation for values between 0 and 63;
    index[n].z = some calculation for values between 0 and 63;
    float cubic_value = some other calculation that has the same number of elements as n, and calculated from index[n];
    // flatten the computed 3D index into a position in value
    int total_index = index[n].z * (64*64) + index[n].y * (64) + index[n].x;
    value[total_index] = cubic_value;
}
I can’t seem to get the code to work properly. It compiles, runs, and gets most of the values right, but every single time I compile and run it with exactly the same input (I did not change a thing), the values come out differently. I know I’m doing something wrong, and this is probably not the best way to code it, but I can’t seem to figure out a different way to do it. I can cudaMemcpy total_index and cubic_value to host memory and write a loop in C++ to solve the problem.
For example:
for (int a = 0; a < N; a++)
{
value[ total_index[a] ] = cubic_value[a];
}
But is there a way to solve it in CUDA without having to copy the values to host memory? I have to do this in a fairly large loop, and N is > 1 million points.
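To make the question concrete, here is a minimal sketch of what I imagine a device-side version of that loop would look like. It is only a sketch under assumptions: the names scatter_kernel and scatter_on_device, the n_src and n_dst parameters, and the 256-thread block size are made up for illustration, and it assumes total_index and cubic_value have been stored as full arrays with one entry per source point. As far as I understand, a scatter like this only gives repeatable results if every total_index[a] is unique. If two threads target the same element of value, the order of the writes is not defined, and any element that is never written keeps whatever was in memory unless value is cleared first.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Device-side equivalent of the host loop: one thread per source element.
// Deterministic only if all destinations in total_index are distinct.
__global__ void scatter_kernel(const int *total_index, const float *cubic_value,
                               float *value, int n_src, int n_dst)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= n_src) return;              // guard the last, partially filled block
    int dst = total_index[a];
    if (dst >= 0 && dst < n_dst)         // skip destinations outside value
        value[dst] = cubic_value[a];
}

// Hypothetical host wrapper, only here so the kernel can be tried in isolation.
// Assumes src has at least as many entries as idx. Error checking is omitted.
std::vector<float> scatter_on_device(const std::vector<int> &idx,
                                     const std::vector<float> &src, int n_dst)
{
    int n_src = (int)idx.size();
    std::vector<float> out(n_dst, 0.0f);
    if (n_src == 0 || n_dst == 0) return out;   // nothing to launch

    int *d_idx = nullptr;
    float *d_src = nullptr;
    float *d_val = nullptr;
    cudaMalloc((void **)&d_idx, n_src * sizeof(int));
    cudaMalloc((void **)&d_src, n_src * sizeof(float));
    cudaMalloc((void **)&d_val, n_dst * sizeof(float));

    cudaMemcpy(d_idx, idx.data(), n_src * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_src, src.data(), n_src * sizeof(float), cudaMemcpyHostToDevice);
    // Clear the destination so elements no thread writes are 0, not garbage.
    cudaMemset(d_val, 0, n_dst * sizeof(float));

    int threads = 256;
    int blocks = (n_src + threads - 1) / threads;   // round up to cover n_src
    scatter_kernel<<<blocks, threads>>>(d_idx, d_src, d_val, n_src, n_dst);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_val, n_dst * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_idx);
    cudaFree(d_src);
    cudaFree(d_val);
    return out;
}
```

Calling scatter_on_device({2, 0, 3}, {1.5f, 2.5f, 3.5f}, 5) should give back {2.5, 0, 1.5, 3.5, 0}. The wrapper copies data back and forth only so the sketch can be checked on its own; in my real loop the arrays would stay in device memory and only the kernel launch would be needed.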
Thanks a ton!!!