Simple kernel ocassionally gives very very unusual results

I’ve distilled an unusual problem in my code to the following simple kernel. I’ve only been programming CUDA for just under a year so I’m almost certain that I’m doing something wrong, but I cannot find it.

#include <stdio.h>

// nvcc -ccbin g++ -g -m64 -std=c++11 -gencode arch=compute_30,code=sm_30 -o Weird Weird.cu

__global__ void TestKernel( unsigned count )
{
	int N = threadIdx.x + ( blockIdx.x * blockDim.x );
	if( N < count )
	{
            int id1 = floor((sqrtf( 8*N + 1 ) - 1)/2);
            int id2 = N - (id1*(id1 + 1))/2;
            ++id1;

            if( id2 == -1 )
            {           
                printf( "N = %8d -- id1 = %4d -- id2 = %4d\n" , N , id1 , id2 );
            }
	}
}

int main()
{
	unsigned numBlocks = 12200;		// Less than maximum 65535
	unsigned numThreads = 1024;
	unsigned numIterations = 12492501;	// Less than 2^32 - 1

	TestKernel<<<numBlocks,numThreads>>>( numIterations );

	cudaDeviceSynchronize();
	if( cudaGetLastError() != cudaSuccess )
	{
		printf( "Kernel Error\n" );
		fflush(stdout);
	}
}

Considering the formulae of id1 and id2 both ought to be positive. Now here’s the weird bit

N = 10619135 -- id1 = 4608 -- id2 =   -1
N = 10623744 -- id1 = 4609 -- id2 =   -1
N = 10628354 -- id1 = 4610 -- id2 =   -1
N = 10632965 -- id1 = 4611 -- id2 =   -1
N = 10637577 -- id1 = 4612 -- id2 =   -1
N = 10642190 -- id1 = 4613 -- id2 =   -1
(... 385 more ...)

Hand calculations below show that id2 ought not to be -1

N = 10619135   id1 = 4608     id2 = 4607
N = 10623744   id1 = 4609     id2 = 4608

So why is id2 occasionally -1??? … Is there some sort of overflow, are my kernel parameters wrong. I’ve checked and although they are large they seem to be valid, to the best of my limited understanding.

My GPU is a Quadro K1000M and I’m using CUDA 8, as my GPU cannot use CUDA 9. (I’m making plans to get my hands on a better GPU soon.)

I boldly predict the unexpected results will go away if you change line 10 to this:

The problem you are encountering would appear be due to the limited precision of the IEEE-754 single precision representation. For large arguments you will lose too much information in this computation when using ‘float’.

It should be instructive to print all intermediate results for one of the “failing” cases.

When I do hand calculations for N = 10619135 and id1 = 4608 I get that id2 = -1.

4608 + 1 = 4609
4608 * 4609 = 21238272
21238272 / 2 = 10619136
10619135 - 10619136 = -1

You are right on both counts! THANKS!!!

int id1 = floor((sqrt((double)( 8*N + 1 )) - 1)/2);

Does the trick.

Here’s are the code changes for printing the intermediate values in the previous implementation

auto x = sqrtf( 8*N + 1 );
int id1 = floor((x - 1)/2);
auto y = (id1*(id1 + 1))/2;
int id2 = N - y;

And the intermediate values

N = 10623744 -- x = 9219.000000 -- y = 10623745 -- id1 = 4610 -- id2 =   -1
N = 10619135 -- x = 9217.000000 -- y = 10619136 -- id1 = 4609 -- id2 =   -1
N = 10628354 -- x = 9221.000000 -- y = 10628355 -- id1 = 4611 -- id2 =   -1
N = 10632965 -- x = 9223.000000 -- y = 10632966 -- id1 = 4612 -- id2 =   -1
N = 10637577 -- x = 9225.000000 -- y = 10637578 -- id1 = 4613 -- id2 =   -1
N = 10642190 -- x = 9227.000000 -- y = 10642191 -- id1 = 4614 -- id2 =   -1
... (lots more) ...

Note that y is always N + 1, which explains why id2 is -1.

The moral of the story. Its never a compiler bug.

My hand calculations were done in python