CUDA function pointers to class member functions (change in Pascal?)


On pre-Pascal architectures I used the following code to call class member functions via function pointers. It used to work fine and the CUDA memory checker did not find any problems.

On Pascal (GTX Titan X or 1080) with CUDA 8.0 (compute_61,sm_61) I get memory errors in debug mode (Windows 10, Visual Studio 2013/15). I get an access violation on the assignment of the function pointer (line 19), and another access violation on the retrieval of the function pointer (line 32). The actual code runs fine and the call through the function pointer executes correctly. Is this a bug in the memory checker under Pascal, or am I doing something illegal? Minimal working example below.

Thanks and Happy New Year!

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

template<typename T> using FunctionHandler = double(T::*)(double);

class TestClass
{
public:
	FunctionHandler<TestClass> fooHandler;

	__host__ __device__ double AddOne(double x)
	{
		return x + 1;
	}

	__host__ __device__ void BuildPointerTable()
	{
		this->fooHandler = &TestClass::AddOne; // assignment of the pointer to member
	}
};

__global__ void InitializeFooPointer(TestClass* obj)
{
	obj->BuildPointerTable(); // set the pointer on the device
}

__global__ void Run(TestClass* obj)
{
	double x = 1;

	FunctionHandler<TestClass> handler = obj->fooHandler; // retrieval of the pointer to member
	double result = (obj->*handler)(x);
	printf("x + 1: %f \n", result);
}

int main()
{
	TestClass* test = new TestClass();
	TestClass* dTest = nullptr;
	cudaError_t err = cudaMalloc(&dTest, sizeof(TestClass));
	if (err != cudaSuccess) { return 1; }

	err = cudaMemcpy(dTest, test, sizeof(TestClass), cudaMemcpyHostToDevice);
	if (err != cudaSuccess) { return 1; }

	InitializeFooPointer<<<1, 1>>>(dTest);

	Run<<<1, 1>>>(dTest);
	err = cudaDeviceSynchronize(); // wait for both kernels so the device printf is flushed

	delete test;
	err = cudaFree(dTest);
	return 0;
}

I tried compiling and running your code with CUDA 8 on Linux on a Pascal Titan X.

No problems were reported by cuda-memcheck either in debug or release mode.

It might be an issue specific to the memory checker built into Nsight VSE.

Thanks. If anyone using CUDA 8.0 in Visual Studio on a Pascal board could spare a moment to check whether this problem is reproducible, I’ll file a bug report on the Nsight memory checker.

Hi John_Smith_Lon,

I ran your code with VS 2013 (update 5), a Quadro P5000, CUDA 8.0 and Nsight. I am not experiencing any of the memory issues or crashes you noted. I also ran it under the memory checker and got no errors.

Thanks, that’s very helpful. I’ll log a bug. This is the second bug I’ve found on the Alienware 17 R4 under CUDA (the first was a managed memory bug, which has been confirmed by NVIDIA).

As it turns out, the development team has determined that this particular code is exercising a Windows-specific limitation of CUDA. That limitation is now documented here:

Specifically, your class has a field whose type satisfies this condition:

“Let T denote a pointer to member type”

Objects of such classes cannot be safely passed from host to device or vice versa:

“Passing an object of type C between host and device code has undefined behavior e.g., as an argument to a global function or through cudaMemcpy*() calls.”
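One rough way to look for a layout disagreement is to compare the size of the class as seen by host code and by device code. The sketch below is only an illustration: `DeviceSizeOf` and `deviceSizeOfTestClass` are hypothetical names, not part of the original example, and matching sizes would not prove that passing the object across is safe.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

template<typename T> using FunctionHandler = double(T::*)(double);

class TestClass
{
public:
	FunctionHandler<TestClass> fooHandler;
	__host__ __device__ double AddOne(double x) { return x + 1; }
};

// Writes the size of TestClass as computed for device code.
__global__ void DeviceSizeOf(size_t* out)
{
	*out = sizeof(TestClass);
}

// Hypothetical helper: returns the device-side sizeof(TestClass), or 0 on any CUDA error.
size_t deviceSizeOfTestClass()
{
	size_t* dSize = nullptr;
	size_t hSize = 0;
	if (cudaMalloc(&dSize, sizeof(size_t)) != cudaSuccess) { return 0; }

	DeviceSizeOf<<<1, 1>>>(dSize);
	bool ok = (cudaGetLastError() == cudaSuccess);

	// cudaMemcpy blocks until the kernel has finished.
	if (!ok || cudaMemcpy(&hSize, dSize, sizeof(size_t), cudaMemcpyDeviceToHost) != cudaSuccess)
	{
		hSize = 0;
	}
	cudaFree(dSize);
	return hSize;
}
```

Printing `sizeof(TestClass)` from host code next to `deviceSizeOfTestClass()` shows whether the two compilers agree on the size for a given toolchain.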

As a result, this is not a bug in CUDA but a (now) documented limitation.
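A possible way to stay within that limitation is to keep any class holding a pointer to member entirely on the device, and to pass only plain values across the boundary. The following is a minimal sketch under that assumption. `DeviceOnlyTest`, `CallFoo` and `addOneOnDevice` are hypothetical names introduced for illustration, and this is not a fix recommended by the development team.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Constructed and used only in device code, so the object is never
// passed between host and device.
class DeviceOnlyTest
{
public:
	__device__ DeviceOnlyTest() : fooHandler(&DeviceOnlyTest::AddOne) {}

	__device__ double AddOne(double x) { return x + 1; }

	// Calls through the stored pointer to member.
	__device__ double CallFoo(double x) { return (this->*fooHandler)(x); }

private:
	double (DeviceOnlyTest::*fooHandler)(double);
};

// Only plain doubles cross the host/device boundary.
__global__ void Run(double x, double* result)
{
	DeviceOnlyTest obj; // built on the device
	*result = obj.CallFoo(x);
}

// Hypothetical host helper: returns x + 1 computed on the device, or NaN on a CUDA error.
double addOneOnDevice(double x)
{
	double* dResult = nullptr;
	double hResult = NAN;
	if (cudaMalloc(&dResult, sizeof(double)) != cudaSuccess) { return NAN; }

	Run<<<1, 1>>>(x, dResult);
	bool ok = (cudaGetLastError() == cudaSuccess);

	// cudaMemcpy blocks until the kernel has finished.
	if (!ok || cudaMemcpy(&hResult, dResult, sizeof(double), cudaMemcpyDeviceToHost) != cudaSuccess)
	{
		hResult = NAN;
	}
	cudaFree(dResult);
	return hResult;
}
```

Calling `addOneOnDevice(1.0)` from `main` should give 2.0 when a CUDA device is available. The trade-off is that the host can no longer prepare the object and copy it over with `cudaMemcpy`. Any state the object needs has to be sent as plain data and the object rebuilt on the device.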