For a `__global__` kernel like this:
__global__ void ComputeOutput(float * const C, int const num_in)
{
    // Grid-stride loop
    // learned from https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/
    for (int j_ = blockIdx.x * blockDim.x + threadIdx.x;
         j_ < num_in;
         j_ += blockDim.x * gridDim.x) {
        C[j_] = float(j_);
    }
}
is it possible that the output `C[j] != j`?
I am seeing exactly that: most `C[j]` equal `j`, but a few of them do not.
The bug is present even when I launch the kernel with a single thread:
ComputeOutput<<<1,1>>>( d_C, num_in);
You can reproduce the error with the kernel above.
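(My full script is not included here; a minimal standalone harness along these lines, where the array size and variable names are illustrative, should reproduce what I see:)

```cuda
#include <cstdio>

__global__ void ComputeOutput(float * const C, int const num_in)
{
    for (int j_ = blockIdx.x * blockDim.x + threadIdx.x;
         j_ < num_in;
         j_ += blockDim.x * gridDim.x) {
        C[j_] = float(j_);
    }
}

int main()
{
    const int num_in = 20000000;  // illustrative size, past the point where I see errors
    float *d_C = nullptr;
    cudaMalloc(&d_C, num_in * sizeof(float));

    ComputeOutput<<<1, 1>>>(d_C, num_in);  // single thread, as in the question
    cudaDeviceSynchronize();

    float *h_C = new float[num_in];
    cudaMemcpy(h_C, d_C, num_in * sizeof(float), cudaMemcpyDeviceToHost);

    // count entries where C[j], read back as an integer, differs from j
    long long bad = 0;
    for (int j = 0; j < num_in; ++j)
        if ((long long)h_C[j] != j) ++bad;
    printf("mismatches: %lld out of %d\n", bad, num_in);

    delete[] h_C;
    cudaFree(d_C);
    return 0;
}
```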
My environment is Matlab 2017a, Ubuntu 16.04 64-bit, CUDA-8.0, Tesla K80.
Update: I find that the error only occurs when j is relatively large (on the order of 16 million). It's common for me to deal with such large indices.