I’ve distilled an unusual problem in my code to the following simple kernel. I’ve only been programming CUDA for just under a year so I’m almost certain that I’m doing something wrong, but I cannot find it.
#include <stdio.h>
// nvcc -ccbin g++ -g -m64 -std=c++11 -gencode arch=compute_30,code=sm_30 -o Weird Weird.cu
__global__ void TestKernel( unsigned count )
{
int N = threadIdx.x + ( blockIdx.x * blockDim.x );
if( N < count )
{
int id1 = floor((sqrtf( 8*N + 1 ) - 1)/2);
int id2 = N - (id1*(id1 + 1))/2;
++id1;
if( id2 == -1 )
{
printf( "N = %8d -- id1 = %4d -- id2 = %4d\n" , N , id1 , id2 );
}
}
}
int main()
{
unsigned numBlocks = 12200; // Less than maximum 65535
unsigned numThreads = 1024;
unsigned numIterations = 12492501; // Less than 2^32 - 1
TestKernel<<<numBlocks,numThreads>>>( numIterations );
cudaDeviceSynchronize();
if( cudaGetLastError() != cudaSuccess )
{
printf( "Kernel Error\n" );
fflush(stdout);
}
}
Considering the formulae of id1 and id2 both ought to be positive. Now here’s the weird bit
N = 10619135 -- id1 = 4608 -- id2 = -1
N = 10623744 -- id1 = 4609 -- id2 = -1
N = 10628354 -- id1 = 4610 -- id2 = -1
N = 10632965 -- id1 = 4611 -- id2 = -1
N = 10637577 -- id1 = 4612 -- id2 = -1
N = 10642190 -- id1 = 4613 -- id2 = -1
(... 385 more ...)
Hand calculations below show that id2 ought not to be -1
N = 10619135 id1 = 4608 id2 = 4607
N = 10623744 id1 = 4609 id2 = 4608
So why is id2 occasionally -1??? … Is there some sort of overflow, are my kernel parameters wrong. I’ve checked and although they are large they seem to be valid, to the best of my limited understanding.
My GPU is a Quadro K1000M and I’m using CUDA 8, as my GPU cannot use CUDA 9. (I’m making plans to get my hands on a better GPU soon.)