Simple kernel ocassionally gives very very unusual results

Olumide · February 26, 2018, 9:44pm

I’ve distilled an unusual problem in my code to the following simple kernel. I’ve only been programming CUDA for just under a year so I’m almost certain that I’m doing something wrong, but I cannot find it.

#include <stdio.h>

// nvcc -ccbin g++ -g -m64 -std=c++11 -gencode arch=compute_30,code=sm_30 -o Weird Weird.cu

__global__ void TestKernel( unsigned count )
{
	int N = threadIdx.x + ( blockIdx.x * blockDim.x );
	if( N < count )
	{
            int id1 = floor((sqrtf( 8*N + 1 ) - 1)/2);
            int id2 = N - (id1*(id1 + 1))/2;
            ++id1;

            if( id2 == -1 )
            {           
                printf( "N = %8d -- id1 = %4d -- id2 = %4d\n" , N , id1 , id2 );
            }
	}
}

int main()
{
	unsigned numBlocks = 12200;		// Less than maximum 65535
	unsigned numThreads = 1024;
	unsigned numIterations = 12492501;	// Less than 2^32 - 1

	TestKernel<<<numBlocks,numThreads>>>( numIterations );

	cudaDeviceSynchronize();
	if( cudaGetLastError() != cudaSuccess )
	{
		printf( "Kernel Error\n" );
		fflush(stdout);
	}
}

Considering the formulae of id1 and id2 both ought to be positive. Now here’s the weird bit

N = 10619135 -- id1 = 4608 -- id2 =   -1
N = 10623744 -- id1 = 4609 -- id2 =   -1
N = 10628354 -- id1 = 4610 -- id2 =   -1
N = 10632965 -- id1 = 4611 -- id2 =   -1
N = 10637577 -- id1 = 4612 -- id2 =   -1
N = 10642190 -- id1 = 4613 -- id2 =   -1
(... 385 more ...)

Hand calculations below show that id2 ought not to be -1

N = 10619135   id1 = 4608     id2 = 4607
N = 10623744   id1 = 4609     id2 = 4608

So why is id2 occasionally -1??? … Is there some sort of overflow, are my kernel parameters wrong. I’ve checked and although they are large they seem to be valid, to the best of my limited understanding.

My GPU is a Quadro K1000M and I’m using CUDA 8, as my GPU cannot use CUDA 9. (I’m making plans to get my hands on a better GPU soon.)

njuffa · February 26, 2018, 10:18pm

I boldly predict the unexpected results will go away if you change line 10 to this:

The problem you are encountering would appear be due to the limited precision of the IEEE-754 single precision representation. For large arguments you will lose too much information in this computation when using ‘float’.

It should be instructive to print all intermediate results for one of the “failing” cases.

tryabin · February 26, 2018, 10:33pm

When I do hand calculations for N = 10619135 and id1 = 4608 I get that id2 = -1.

4608 + 1 = 4609
4608 * 4609 = 21238272
21238272 / 2 = 10619136
10619135 - 10619136 = -1

Olumide · February 26, 2018, 10:50pm

You are right on both counts! THANKS!!!

int id1 = floor((sqrt((double)( 8*N + 1 )) - 1)/2);

Does the trick.

Here’s are the code changes for printing the intermediate values in the previous implementation

auto x = sqrtf( 8*N + 1 );
int id1 = floor((x - 1)/2);
auto y = (id1*(id1 + 1))/2;
int id2 = N - y;

And the intermediate values

N = 10623744 -- x = 9219.000000 -- y = 10623745 -- id1 = 4610 -- id2 =   -1
N = 10619135 -- x = 9217.000000 -- y = 10619136 -- id1 = 4609 -- id2 =   -1
N = 10628354 -- x = 9221.000000 -- y = 10628355 -- id1 = 4611 -- id2 =   -1
N = 10632965 -- x = 9223.000000 -- y = 10632966 -- id1 = 4612 -- id2 =   -1
N = 10637577 -- x = 9225.000000 -- y = 10637578 -- id1 = 4613 -- id2 =   -1
N = 10642190 -- x = 9227.000000 -- y = 10642191 -- id1 = 4614 -- id2 =   -1
... (lots more) ...

Note that y is always N + 1, which explains why id2 is -1.

The moral of the story. Its never a compiler bug.

Olumide · February 26, 2018, 10:55pm

My hand calculations were done in python

Topic		Replies	Views
thread id repetitions leading to problems issues with problem sizes over 16777216 CUDA Programming and Performance	4	833	July 24, 2011
Unable to get correct square root results in kernel Legacy PGI Compilers	2	1560	April 5, 2018
more iterations = worse outcome Pi CUDA Programming and Performance	4	2276	February 20, 2011
kernel launched but give wrong results strange problem small instance size works well, bigger someti CUDA Programming and Performance	6	1827	March 27, 2009
kernel index bug? CUDA Programming and Performance	2	565	October 21, 2017
Possible nvcc bug? CUDA Programming and Performance	13	8830	January 9, 2011
Incorrect Result after large loop in kernel CUDA Programming and Performance	7	13759	November 2, 2010
Issue with Nested for-loops in kernel CUDA Programming and Performance	13	2407	April 4, 2019
Kernel Algorithm problem A kernel to compute pi on the GPU does not completely work. CUDA Programming and Performance	0	2057	August 3, 2009
Possible bug in arithmetic? Difference in emulated and "normal" mode CUDA Programming and Performance	6	4530	December 30, 2009

Simple kernel ocassionally gives very very unusual results

Related topics