I am used to using size_t for variables such as indices into large arrays, or other quantities that could hold large integer values, in other programming languages such as C.
After doing some experiments with CUDA I found that if I use size_t indx = blockIdx.x*blockDim.x + threadIdx.x; I get bad results (for memory elements, whether in global or shared memory), while if I change to int indx = blockIdx.x*blockDim.x + threadIdx.x; the results come out correct.
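Roughly, the pattern looks like this (a simplified sketch, not my actual code; the kernel and buffer names are placeholders):

__global__ void copyKernel(const float *in, float *out, size_t n)
{
    // variant that gives bad results in my tests:
    size_t indx = blockIdx.x*blockDim.x + threadIdx.x;
    // variant that works:
    // int indx = blockIdx.x*blockDim.x + threadIdx.x;
    if (indx < n)
        out[indx] = in[indx];
}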
Am I correct to conclude that I should not use size_t in CUDA?
Can you show minimal, self-contained repro code that demonstrates the issue? Also, please include the exact nvcc command line used to compile it.
There is no reason why size_t shouldn’t be fully functional; however, you may want to avoid it for performance reasons. GPUs are 32-bit machines, so 64-bit integer arithmetic must be emulated. size_t basically maps to unsigned long long int on any supported 64-bit platform.
In most contexts, enumerating data objects with int or unsigned int will suffice.
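As an illustration of that recommendation, here is a minimal, compilable sketch (the kernel, buffer, and sizes are made up for this example) that keeps the per-thread index in unsigned int and uses size_t only for the byte count passed to cudaMalloc:

#include <cstdio>

__global__ void scaleKernel(float *data, unsigned int n, float s)
{
    // 32-bit index arithmetic, which the GPU handles natively
    unsigned int indx = blockIdx.x*blockDim.x + threadIdx.x;
    if (indx < n)
        data[indx] *= s;
}

int main()
{
    // on a 64-bit platform, size_t is an 8-byte unsigned type,
    // the same size as unsigned long long int
    printf("sizeof(size_t) = %llu\n", (unsigned long long)sizeof(size_t));

    const unsigned int n = 1u << 20;
    float *d_data = 0;
    cudaMalloc(&d_data, (size_t)n * sizeof(float)); // byte counts stay size_t
    scaleKernel<<<(n + 255)/256, 256>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}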
I am using CUDA 9.2 on Windows. I don’t know whether %zu is supported by device-side printf; please check the documentation. I changed all instances of the %zu format specifier to %llu. The output of the two program variants matches on my system (a Kepler-based Quadro K420), and cuda-memcheck has no complaints.
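For reference, this is the kind of substitution I mean, shown on a made-up kernel rather than the posted repro:

#include <cstdio>

__global__ void printIndexKernel()
{
    size_t indx = blockIdx.x*blockDim.x + threadIdx.x;
    // instead of printf("indx = %zu\n", indx), use a specifier that
    // device-side printf is known to handle, with a matching cast:
    printf("indx = %llu\n", (unsigned long long)indx);
}

int main()
{
    printIndexKernel<<<1, 4>>>();
    cudaDeviceSynchronize();   // flush device-side printf output
    return 0;
}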