I am used to using size_t for variables such as indices into large arrays, or other quantities that could hold large integer values, in other programming languages such as C.
After doing some experiments with CUDA I found that if I use
size_t indx = blockIdx.x*blockDim.x + threadIdx.x;
I get bad results (for memory elements, whether in global or shared memory), while if I change it to
int indx = blockIdx.x*blockDim.x + threadIdx.x;
it works correctly.
Am I correct to conclude that I should not use size_t in CUDA?
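For context, a minimal sketch of the kind of kernel where the index is used (the kernel and array names here are illustrative, not the actual attached code):

```
__global__ void copy_kernel(float *dst, const float *src, int n)
{
    // the index definition in question; declaring indx as size_t
    // instead of int is what triggered the bad results
    int indx = blockIdx.x*blockDim.x + threadIdx.x;
    if (indx < n)
        dst[indx] = src[indx];
}
```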
Can you show a minimal, self-contained repro code that demonstrates the issue? Also, include the exact nvcc command line used to compile the code.
There is no reason why size_t shouldn't be fully functional; however, you may want to avoid it for performance reasons. GPUs are 32-bit machines, and 64-bit integer arithmetic must therefore be emulated. size_t basically maps to unsigned long long int on any supported 64-bit platform. In most contexts, enumerating data objects with unsigned int will suffice.
Thanks for your reply. I am running on 64-bit Linux (Fedora), so based on your answer I guess this may be the reason, given that the GPU runs 32-bit.
At any rate, per your request, I attach two files. One, called "size_t.txt", shows the problem; the other, called "int.txt", works well. It all comes down to changing the index definition to int to make it work. I am not sure why (I am still learning CUDA).
size_t.txt (8.8 KB)
int.txt (8.2 KB)
I am using CUDA 9.2 on Windows. I don't know whether %zu is supported by device-side printf; please check the documentation. I changed all instances of the format specifier to %llu. The output of the two program variants matches on my system (Kepler-based Quadro K420), and cuda-memcheck has no complaints.
I can indeed see that your choice of %llu solves this problem. So I guess the problem was with the choice of **printf** format specifier on the GPU; printf on the host has no problem with %zu.
Thanks for your solution.