I have my threads defined such that they are sequential over multiple blocks.
My problem is that only the first 512 threads enter a loop (for/while). The others seem to vanish.
This is a huge problem for me because I need all the threads to enter the loop.
I will show you what I mean. This is how I define my thread IDs to make them sequential:
/* Thread indices within the block */
int x = threadIdx.x;
int y = threadIdx.y;
int z = threadIdx.z;

/* Block dimensions (how many threads per block) */
int bdx = blockDim.x;
int bdy = blockDim.y;
int bdz = blockDim.z;

/* Block indices within the grid */
int bx = blockIdx.x;
int by = blockIdx.y;

/* Grid dimension (how many blocks along x) */
int gdx = gridDim.x;

/* Flattened, sequential thread ID */
int tid = by*gdx*(bdx*bdy*bdz) + bx*(bdx*bdy*bdz) + z*(bdx*bdy) + bdx*y + x;
Now this works perfectly! All threads get sequential numbering. I can prove that this works by inserting the following code into the kernel (I'm using emulation mode to debug):
if (tid == 0) printf("-------HERE ARE THE THREADS-----\n");
printf("|%i", tid);
__syncthreads();
if (tid == 0) printf("\n");
} /* closes the kernel */
If the kernel is given the configuration <<<(2,2),512>>>, so that we can expect 2048 threads to be created (2*2*512 = 2048),
then the output is:
|0|1|2|3|4|5|6|7|8|9|10|11…all the way to…2042|2043|2044|2045
GREAT!! This is what it is supposed to do.
Now put that same code snippet inside a loop (within the kernel). It seems as though only the first 512 threads enter the loop. The output is:
|0|1|2|3|4|5|6|7|8|9|10|11…all the way to…506|507|508|509|510|511
Can anyone help me? This is a huge problem for me.