Hello,
I have some C++ code that I want to port to CUDA. It looks like this:
for (*ihour = 0; *ihour < 24; (*ihour)++) {
    for (*i = 0; *i < 15000; (*i)++) {
        ....
        code omitted
        ...
    }
}
I want to parallelize it, as efficiently as possible of course.
I am not yet familiar with blocks and grids, but this is how I have set up the indices so far:
*ihour = blockIdx.x * blockDim.x + threadIdx.x;
*i = blockIdx.x * blockDim.x + threadIdx.x;
and my kernel launch:
dim3 dimGrid(1500,1), dimBlock(10,24);
prog<<<dimGrid,dimBlock>>>
Is this correct at all?
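For comparison, here is a minimal sketch of one way the two loop indices could map onto a 2D launch, assuming every (ihour, i) iteration is independent of the others. The kernel body, the `out` buffer, the constants `kHours` and `kItems`, and the helper `run_prog_check` are hypothetical placeholders, since the real loop body is omitted above. It differs from the setup above in two ways: ihour is taken from the y dimension rather than x, and a bounds check guards any thread that falls outside the loop ranges.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kHours = 24;     // outer loop bound from the post
constexpr int kItems = 15000;  // inner loop bound from the post

// Hypothetical kernel: one thread per (ihour, i) pair.
// The real loop body is omitted in the post, so each thread only
// records which pair it handled.
__global__ void prog(int *out)
{
    int i     = blockIdx.x * blockDim.x + threadIdx.x;  // x dimension -> i
    int ihour = blockIdx.y * blockDim.y + threadIdx.y;  // y dimension -> ihour

    // Guard in case the grid covers more threads than loop iterations.
    if (i < kItems && ihour < kHours) {
        out[ihour * kItems + i] = ihour * kItems + i;   // placeholder work
    }
}

// Launches the kernel and returns the number of (ihour, i) pairs that
// were not written correctly, or -1 if a CUDA call failed.
int run_prog_check()
{
    const size_t n = static_cast<size_t>(kHours) * kItems;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_out, 0xFF, n * sizeof(int));  // every int starts as -1

    dim3 dimBlock(10, 24);                     // 240 threads per block
    dim3 dimGrid((kItems + dimBlock.x - 1) / dimBlock.x,   // 1500 blocks in x
                 (kHours + dimBlock.y - 1) / dimBlock.y);  // 1 block in y
    prog<<<dimGrid, dimBlock>>>(d_out);

    std::vector<int> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (size_t k = 0; k < n; ++k) {
        if (h_out[k] != static_cast<int>(k)) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int mismatches = run_prog_check();
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Threads in a block are grouped into warps of 32 with the x index varying fastest, so a block shape such as (32, 8) with the same ceiling-division grid may perform better than (10, 24). That is something to measure rather than assume.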
Thanks, CUDA people.