BUG: Possible shared state in CUDA Toolkit Internals - Makes size_t 64-bit or 32-bit


I’m reporting this bug here in the Forum because the bug web service does not work; I emailed cudaissues@nvidia.com but did not get a reply.

The CUDA Toolkit shows signs of a shared, mutable internal state that affects the interpretation of size_t values. When this bug is triggered, a size_t value is interpreted as 32-bit. Otherwise, the value is interpreted as 64-bit, as expected.

I suggest marking this bug as high priority, because size_t is used to address memory.

In the test case below, the triggerBug() function contains an innocuous sum that should have no effect on the behavior or results of the bugKernel() function. The bugIndicator variable is 64-bit. When the triggerBug() function is invoked, bugIndicator shows an overflow as if it were 32-bit.
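For reference, the arithmetic in question can be checked on the host. This is a minimal host-only sketch and not part of the original test case: the helper names sumAs64Bit and sumAs32Bit are hypothetical, and fixed-width types stand in for size_t so that the two interpretations are explicit.

```cpp
#include <cstdint>

// Hypothetical helper: the sum as a 64-bit size_t is expected to compute it.
constexpr std::uint64_t sumAs64Bit(std::uint64_t a, std::uint64_t b) {
    return a + b; // No wraparound here: the result needs 33 bits.
}

// Hypothetical helper: the same sum with the operands treated as 32-bit.
constexpr std::uint32_t sumAs32Bit(std::uint32_t a, std::uint32_t b) {
    return a + b; // Unsigned 32-bit arithmetic wraps modulo 2^32.
}

// Expected value of bugIndicator.
static_assert(sumAs64Bit(0x1, 0xffffffff) == 0x100000000ULL, "64-bit sum");
// Value that matches what is observed when the bug is triggered.
static_assert(sumAs32Bit(0x1, 0xffffffffu) == 0u, "32-bit sum wraps to 0");
```

Both checks are evaluated at compile time, so a successful build is enough to confirm the two values.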

Quadro M6000 24GB
Windows 10 64-bit
CUDA Toolkit 8.0
Visual Studio Community 2015 Update 3

Create a new Visual Studio project with the code below, and follow the instructions therein.

// CUDA Toolkit bug test case by Fabio Correa <facorread@gmail.com>
// This test case was developed on 2017/03/22.
// Instructions to trigger the bug are documented below.
// Using CUDA Toolkit 8.0 with Visual Studio Community 2015 on Debug x64.
// No additional flags or settings have been changed from their defaults.
// Try different compute capabilities.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

typedef size_t erroneousType; // Try unsigned long long as well.

__device__ static erroneousType triggerBug() {
	const erroneousType a{0x1};
	const erroneousType b{0xffffffff};
	const erroneousType c{a + b};
	return c;	// Returning c is necessary to trigger the bug. Returning something else, such as 5, inhibits the bug.
}

__global__ void bugKernel() {
	const erroneousType a{0x1};
	const erroneousType b{0xffffffff};
	const erroneousType bugIndicator{a + b}; // Value when bug is triggered: 0. Correct value: 0x100000000.
	triggerBug(); // Run this code with this line commented out, then run it with this line enabled.
	return; // Please set a breakpoint here and debug on Nsight to inspect the value of bugIndicator.
}

int main() {
	bugKernel<<<1, 1>>>();
	// cudaDeviceReset must be called before exiting in order for profiling and
	// tracing tools such as Nsight and Visual Profiler to show complete traces.
	cudaError_t cudaStatus = cudaDeviceReset();
	if (cudaStatus != cudaSuccess) {
		fprintf(stderr, "cudaDeviceReset failed!");
		return 1;
	}
	return 0;
}

What happens if you change:





I’ll try later today.

Hey, would you mind helping me out by verifying this bug in your system?

NVIDIA just got in touch with me and they’ll look into it.


Is the only way to see the bug by using the debugger?
When I print out the value of bugIndicator, it seems to be 0x100000000.

Thanks for your ideas.

ULL had no effect on this behavior. And printf() invoked from the kernel triggered the bug, too:

printf("bugIndicator = 0x%.16X.\n", bugIndicator); // It triggers the bug. Also, try using a 5 instead of bugIndicator.
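A side note on the format string: %X expects an unsigned int, so a 64-bit argument may be printed incorrectly regardless of any toolkit issue. Below is a minimal host-side sketch of the same message with an ll length modifier and an explicit cast. The helper name formatIndicator is hypothetical, and it is an assumption that the same format string is what the in-kernel printf() call should use.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit value as 16 zero-padded hex digits.
std::string formatIndicator(std::uint64_t value) {
    char buffer[64];
    // %llX matches unsigned long long; the cast makes the argument type explicit.
    std::snprintf(buffer, sizeof(buffer), "bugIndicator = 0x%.16llX.",
                  static_cast<unsigned long long>(value));
    return std::string(buffer);
}
```

With this format, the expected value 0x100000000 would be printed as 0x0000000100000000, and a value truncated to 32 bits would be printed as 0x0000000000000000.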

A nice person at NVIDIA emailed me, “Seems the issue is related to dynamic parallelism.”