BUG: Possible shared state in CUDA Toolkit Internals - Makes size_t 64-bit or 32-bit

facorread · March 29, 2017, 1:43pm

Hello,

I’m reporting this bug here in the Forum because the bug web service does not work; I emailed cudaissues@nvidia.com but did not get a reply.

The CUDA Toolkit shows signs of a shared, mutable internal state that affects the interpretation of size_t values. When this bug is triggered, a size_t value is interpreted as 32-bit. Otherwise, the value is interpreted as 64-bit, as expected.

I suggest marking this bug as high priority, since size_t is used to address memory.

In the test case below, the triggerBug() function contains an innocuous sum that should have no effect on the behavior or results of the bugKernel() function. The bugIndicator variable is 64-bit. When the triggerBug() function is invoked, bugIndicator shows an overflow as if it were 32-bit.
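
For reference, the two outcomes described above can be reproduced with ordinary host-side C++. This is only a sketch of the arithmetic, not of the bug itself. The helper names sum64 and sum32 are hypothetical and do not appear in the test case.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of the original test case.

// Sum carried out in 64 bits: the carry out of bit 31 is kept.
constexpr std::uint64_t sum64(std::uint64_t a, std::uint64_t b) { return a + b; }

// Same sum carried out in 32 bits: unsigned arithmetic wraps modulo 2^32.
constexpr std::uint32_t sum32(std::uint32_t a, std::uint32_t b) { return a + b; }

// Expected value of bugIndicator when size_t is treated as 64-bit.
static_assert(sum64(0x1, 0xffffffff) == 0x100000000ULL, "64-bit sum keeps the carry");

// Value reported when the bug is triggered, matching a 32-bit wraparound.
static_assert(sum32(0x1, 0xffffffff) == 0, "32-bit sum wraps to zero");
```

Compiling this file is the check: the build fails if either static_assert does not hold.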

Quadro M6000 24GB
Windows 10 64-bit
CUDA Toolkit 8.0
Visual Studio Community 2015 Update 3

Create a new Visual Studio project with the code below, and follow the instructions therein.

// CUDA Toolkit bug test case by Fabio Correa <facorread@gmail.com>
// This test case was developed on 2017/03/22.
// Instructions to trigger the bug are documented below.
// Using CUDA Toolkit 8.0 with Visual Studio Community 2015 on Debug x64.
// No additional flags or settings have been changed from their defaults.
// Try different compute capabilities.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

typedef size_t erroneousType; // Try unsigned long long as well.

__device__ static erroneousType triggerBug() {
	const erroneousType a{0x1};
	const erroneousType b{0xffffffff};
	const erroneousType c{a + b};
	return c;	// It is necessary to return c to trigger the bug. Also, try something else such as 5 to inhibit the bug.
};

__global__ void bugKernel() {
	const erroneousType a{0x1};
	const erroneousType b{0xffffffff};
	const erroneousType bugIndicator{a + b}; // Value when bug is triggered: 0. Correct value: 0x100000000.
	triggerBug(); // Run this code with this line commented out, then run it with this line enabled.
	return; // Please set a breakpoint here and debug on Nsight to inspect the value of bugIndicator.
}

int main() {
	bugKernel<<<1, 1>>>();
	// cudaDeviceReset must be called before exiting in order for profiling and
	// tracing tools such as Nsight and Visual Profiler to show complete traces.
	cudaError_t cudaStatus = cudaDeviceReset();
	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "cudaDeviceReset failed!");
		return 1;
	}
	return 0;
}

Robert_Crovella · March 29, 2017, 2:01pm

What happens if you change:

0xffffffff

to:

0x0ffffffffULL

?
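
As background for this suggestion, here is a small host-side sketch of how the two literals are typed in standard C++. It assumes a platform where unsigned int is 32 bits wide, which the first static_assert checks. The names plain and suffixed are hypothetical. Once either literal is stored in a 64-bit variable, the values should be identical, so any difference on the device would not be explained by the language rules alone.

```cpp
#include <cstdint>
#include <type_traits>

// Assumption: unsigned int is 32 bits wide on this platform.
static_assert(sizeof(unsigned int) == 4, "sketch assumes a 32-bit unsigned int");

// Without a suffix, the hex literal takes the first type that can hold it.
static_assert(std::is_same<decltype(0xffffffff), unsigned int>::value,
              "unsuffixed literal is unsigned int");

// With the ULL suffix, the literal is unsigned long long from the start.
static_assert(std::is_same<decltype(0x0ffffffffULL), unsigned long long>::value,
              "suffixed literal is unsigned long long");

// Hypothetical names: both initializations yield the same 64-bit value.
constexpr std::uint64_t plain{0xffffffff};
constexpr std::uint64_t suffixed{0x0ffffffffULL};

static_assert(plain == suffixed, "widening preserves the value");
static_assert(plain + 1 == 0x100000000ULL, "the addition happens in 64 bits");
```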

facorread · March 29, 2017, 2:06pm

I’ll try later today.

Hey, would you mind helping me out by verifying this bug on your system?

NVIDIA just got in touch with me and they’ll look into it.

Thanks.

Robert_Crovella · March 29, 2017, 2:14pm

Is using the debugger the only way to see the bug?
When I print out the value of bugIndicator, it seems to be 0x100000000.

facorread · March 29, 2017, 5:59pm

Thanks for your ideas.

ULL had no effect on this behavior. And printf() invoked from the kernel triggered the bug, too:

printf("bugIndicator = 0x%.16X.\n", bugIndicator); // It triggers the bug. Also, try using a 5 instead of bugIndicator.
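
A side note on the format string: in standard C, %X expects an unsigned int, so a 64-bit argument needs a length modifier such as ll. Whether this affects the output seen in this thread is not established. The sketch below is host-side only, and the helper name hex64 is hypothetical. It shows a specifier that matches a 64-bit value.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format a 64-bit value as "0x" plus 16 hex digits.
// The ll length modifier matches the unsigned long long argument.
std::string hex64(unsigned long long value) {
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "0x%.16llX", value);
    return std::string(buffer);
}
```

With this helper, hex64(0x100000000ULL) yields 0x0000000100000000. In the kernel, the corresponding change would be to use "%.16llX" and cast bugIndicator to unsigned long long.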

A nice person at NVIDIA emailed me, “Seems the issue is related to dynamic parallelism.”
