Can't the thread ID be assigned to a variable?

I wrote the code like this:

__global__
void initBlob(const unsigned char *in, unsigned char *out, size_t *indexTab, int total2D)
{
	size_t i = blockIdx.x * blockDim.x + threadIdx.x;

	if(i < total2D)
	{
		out[i] = in[i];
		if(255 == in[i])
		{
			indexTab[i] = i;
			printf("index[%d] = %d ",i, indexTab[i]);
		}
	}
}

But the value of indexTab[i] is always zero. What's wrong here? Please help me.

What happens if you do:

printf("index[%ld] = %ld ",i, indexTab[i]);

instead of:

printf("index[%d] = %d ",i, indexTab[i]);

size_t is a 64-bit type on 64-bit platforms.
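To spell out the mismatch: %d tells printf to read a 32-bit int, but each size_t argument occupies 64 bits, so the values are read from the wrong place. Note that %ld only helps where long is 64 bits, which is true on 64-bit Linux but not on 64-bit Windows. A minimal sketch of a more portable variant, assuming you cast to unsigned long long and print with %llu (the names printIndex and runPrintIndex are made up for this illustration):

```cuda
#include <cstdio>

// Hypothetical kernel: each thread stores and prints its global index.
__global__ void printIndex(size_t *indexTab, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        indexTab[i] = i;
        // The casts make the argument width match %llu on every platform.
        printf("index[%llu] = %llu\n",
               (unsigned long long)i, (unsigned long long)indexTab[i]);
    }
}

// Hypothetical host helper: returns 0 if every CUDA call succeeded.
int runPrintIndex(size_t n)
{
    size_t *d_it = nullptr;
    if (cudaMalloc(&d_it, n * sizeof(size_t)) != cudaSuccess) return 1;
    printIndex<<<1, (unsigned int)n>>>(d_it, n);
    // Synchronizing also flushes the device-side printf buffer.
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(d_it);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    return runPrintIndex(4);
}
```

unsigned long long is at least 64 bits on every platform nvcc targets, so the casts do not lose information.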

Thanks for your answer, but the result is the same after testing your suggestion. Normally, the value of each element of indexTab should be different. But when I write code like this

size_t numV = std::count(index_tab.begin(), index_tab.end(), 255);

the value of numV is 1875, which is weird. The value of indexTab[i] in the local watch window is right, but it is wrong after I copy it to the host variable, and the value printed on the screen is also incorrect.

I guess you’ll need to provide code that somebody else could compile and run to see the issue, then. I built a simple test case around what you have shown, and the issue was fixed when I made that change to printf.

Could you post your simple code for me? I am the only person here learning CUDA, and only my machine can compile it. Thanks a lot.

Here is a worked example. If I compile with -DBROKEN (which uses your original code) the output is all zero. If I don’t compile with -DBROKEN (therefore using the fix) then the output is 0,1,2,3 i.e. the thread ID:

$ cat t916.cu
#include <stdio.h>
#define DS 4
__global__
void initBlob(const unsigned char *in, unsigned char *out, size_t *indexTab, int total2D)
{
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;

        if(i < total2D)
        {
                out[i] = in[i];
                if(255 == in[i])
                {
                        indexTab[i] = i;
#ifdef BROKEN
                        printf("index[%d] = %d ",i, indexTab[i]);
#else
                        printf("index[%ld] = %ld ",i, indexTab[i]);
#endif
                }
        }
}

int main(){

  unsigned char *d_in, *d_out;
  size_t *d_it;
  cudaMalloc(&d_in, DS*sizeof(unsigned char));
  cudaMalloc(&d_out, DS*sizeof(unsigned char));
  cudaMalloc(&d_it, DS*sizeof(size_t));
  cudaMemset(d_in, 255, DS*sizeof(unsigned char));
  initBlob<<<1,DS>>>(d_in, d_out, d_it, DS);
  cudaDeviceSynchronize();
}

$ nvcc t916.cu -o t916
$ ./t916
index[0] = 0 index[1] = 1 index[2] = 2 index[3] = 3 $
$ nvcc -DBROKEN t916.cu -o t916
$ ./t916
index[0] = 0 index[1] = 0 index[2] = 0 index[3] = 0 $
$

As you say, the simple code runs correctly on my computer. Maybe the problem lies elsewhere in my code. Let me try again. Thanks.

Oh, I’ve fixed it. I just built a new project, copied the original code into it, and added the following line:

SAFE_CALL(cudaMemset(d_indexTab, 0, total2D * sizeof(size_t)), "Cuda alloc Failed");

Then it worked.
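A plausible reason the cudaMemset helped: cudaMalloc does not initialize the memory it returns, and the kernel writes indexTab[i] only where in[i] is 255, so every other entry keeps whatever happened to be there. A small sketch of the whole round trip, assuming a made-up input and using a hypothetical CHECK macro and runInitBlob helper in place of the poster's SAFE_CALL and host code:

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the SAFE_CALL macro used in the thread.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t e_ = (call);                                 \
        if (e_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(e_));                     \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

__global__ void initBlob(const unsigned char *in, unsigned char *out,
                         size_t *indexTab, size_t total2D)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < total2D) {
        out[i] = in[i];
        if (255 == in[i])
            indexTab[i] = i;  // entries with in[i] != 255 are never written
    }
}

// Hypothetical host helper: runs the kernel and returns indexTab.
std::vector<size_t> runInitBlob(const std::vector<unsigned char> &h_in)
{
    const size_t n = h_in.size();
    std::vector<size_t> h_indexTab(n);
    if (n == 0) return h_indexTab;

    unsigned char *d_in = nullptr, *d_out = nullptr;
    size_t *d_indexTab = nullptr;
    CHECK(cudaMalloc(&d_in, n));
    CHECK(cudaMalloc(&d_out, n));
    CHECK(cudaMalloc(&d_indexTab, n * sizeof(size_t)));
    CHECK(cudaMemcpy(d_in, h_in.data(), n, cudaMemcpyHostToDevice));

    // Give the entries the kernel skips a defined value.
    CHECK(cudaMemset(d_indexTab, 0, n * sizeof(size_t)));

    const unsigned int threads = 256;
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);
    initBlob<<<blocks, threads>>>(d_in, d_out, d_indexTab, n);
    CHECK(cudaGetLastError());       // launch errors
    CHECK(cudaDeviceSynchronize());  // execution errors

    CHECK(cudaMemcpy(h_indexTab.data(), d_indexTab, n * sizeof(size_t),
                     cudaMemcpyDeviceToHost));
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_indexTab);
    return h_indexTab;
}

int main()
{
    // Made-up input: only positions 1 and 3 hold 255.
    std::vector<unsigned char> in = {0, 255, 0, 255};
    std::vector<size_t> tab = runInitBlob(in);
    for (size_t i = 0; i < tab.size(); ++i)
        printf("indexTab[%llu] = %llu\n",
               (unsigned long long)i, (unsigned long long)tab[i]);
    return 0;
}
```

With the memset in place, the entries at positions 0 and 2 read back as 0. Without it, their contents are unspecified, which could account for unexpected values showing up on the host side.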