Can not obtain device function pointer

I ran into a weird situation when trying to obtain pointer of a certain device function either doesn’t compile at all(CUDA 9) or compiles, but overflows stack even before any kernel attempts to write the pointer to memory(CUDA 8).

I’ve worked with CUDA function pointers before and the whole thing used to function without any complication, so I’ll probably need to explain the whole picture, as posting several hundred lines of code here makes no sense.

So, Here’s the situation: I was restructuring my octree based ray tracer and decided to abstract the octree part away, by creating a small struct, that keeps several function pointers that will take void*(that points to that octree in actuality) for one parameter and also all the other things a generic raycaster would need to do a raycast and when called, just call the underlying real function from the octree, or whatever the void* it thinks it is. So, the whole thing looks somewhat like this (well… no exactly, but it does the same and having tried this as well, the problem stays the same):

struct raycaster {
	const void* raycaster;
	bool(*castFunction)(const void *, const Ray&, raycast_hit &);

	__device__ __host__ inline bool raycast(const Ray &r, raycast_hit &output)const {
		return castFunction(raycaster, r, output);

	template<typename Type>
	__device__ __host__ inline static bool raycastGeneric(const void *caster, const Ray &r, raycast_hit &hit) {
		return ((const Type*)caster)->raycast(r, output);

	template<typename Type>
	__device__ __host__ inline void use(const Type *object) {
		raycaster = ((void*)object);
		castFunction = raycastGeneric<Type>;

To make it clear, I was already using the same logic with my shader equivalents, lights and lenses, so I wrote this with some confidence, but for some bizarre reason, CUDA 9 refuses to compile the same for the octree and CUDA 8 causes stack overflow even if use() is never called, but is written somewhere in code.

The only real difference between what I had done before and what I attempted to do with this, is that the functions behind octree are significantly larger and more complex than what I used with lights and shaders. Other than that, the code is virtually identical.

This also might be significant: the code did compile, when I removed one line from octree’s cast function, but that line was responsible for starting some process with 0 value, so it was kind of needed there and it’s absence would cause who knows what…

I’ll be grateful, if anyone can figure out what’s happening here and somewhat enlighten me :).

I’m back with an update. I basically thought, nvcc was not going to give me the pointer to those large functions, because of size limitations for non-kernels or something like that and basically gave up on that, leaving the code in it’s former state. Then I started implementing a proper raytracer (whatever I had was a basic test for octree and nothing more serious than that) and encountered something that I can only describe as a bug. The thing is, every time a kernel tried to touch an instance of a lense (that is implemented in the same manner as above), it just failed. OK… I first thought it wasn’t being uploaded properly and reviewed my code for SceneHandler (a class responsible for that and nothing else), but that test functioned properly, calling all the right functions at all the right places and giving me consistent results. Then I tried tinkering and encountered this:

when lense’s “use” function is edited like this:

__dumb__ is my macro for __device__ __host__ inline 
(don't ask why; being a personal project I don't really care) 
template<typename LenseType>
__dumb__ void LenseFunctionPack::use() {
	getScreenPhotonFunction = getScreenPhotonGeneric<LenseType>;
	toScreenSpaceFunction = toScreenSpaceGeneric<LenseType>;
	//* "unnecessary" code I added starts here
	printf("LENSE FUNCTION PACK(%p): ", this);
	for (size_t i = 0; i < sizeof(LenseFunctionPack); i++)
		printf(" %02X", ((char*)((void*)this))[i] & 0xFF);
	//*/ "unnecessary" code I added ends here

if I run the following sanityCheck() function:

__global__ void testSceneHandle(Scene<BakedTriFace> *scene) {
	RaycastHit<Shaded<BakedTriFace> > hit;
	if (scene->geometry.cast(Ray(Vertex::zero(), Vector3::one()), hit))
		printf("Raycast hit something\n");
	printf("Raycast hit nothing\n");
	PhotonPack result;
	bool noShadows;
	printf("CALLING scene->lights[0].getPhoton()...\n");
	scene->lights[0].getPhoton(Vertex::zero(), &noShadows, result);
	printf("CALL FOR scene->lights[0].getPhoton() JUST ENDED\n");
	printf("len(illuminationPhotons): %d\n", result.size());
	scene->cameras[0].getPhoton(Vector2::zero(), result);
	printf("len(screenPhotons): %d\n", result.size());
	LenseFunctionPack functions;
bool sanityCheck() {
	if (cudaSetDevice(0) != cudaSuccess) {
		std::cout << "SETTING DEVICE FAILED QUITE A BIT MISERABLY" << std::endl;
		return false;
	Scene<BakedTriFace> scene;
	Vector3 direction = Vector3(0.2f, -0.4f, 0.7f).normalized();
		Photon(Ray(-direction * 10000.0f, direction),
			Color(1.0f, 1.0f, 1.0f)));
	scene.cameras[0].transform.setPosition(Vector3(0, 0, -128));
	SceneHandler<BakedTriFace> sceneHandler(scene);
	testSceneHandle<<<1, 1>>>(sceneHandler.getHandleGPU(0));
	bool rv;
	if (cudaDeviceSynchronize() != cudaSuccess) {
		std::cout << "DEVICE FAILED..." << std::endl;
		rv = false;
	else rv = true;
	std::string line;
	std::cout << "PRESS ENTER TO CONTINUE...";
	std::getline(std::cin, line);
	std::cout << std::endl << std::endl << std::endl;
	return rv;

I get this (It’s not code; just copy-pasted whatever the console gave me):

LENSE FUNCTION PACK(000000914939F010):  50 0B FB AD F6 7F 00 00 60 12 FB AD F6 7F 00 00
LENSE FUNCTION PACK(0000000704000350):  28 00 00 00 00 00 00 00 30 00 00 00 00 00 00 00
Raycast hit nothing
CALLING scene->lights[0].getPhoton()...
LenseTest::Private::Garbage::getScreenPhoton() called on DEVICE
CALL FOR scene->lights[0].getPhoton() JUST ENDED
len(illuminationPhotons): 0
len(screenPhotons): 0
LENSE FUNCTION PACK(00000200F5FFFBB8):  08 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00

Here are several strange things:

  1. LenseTest::Private::Garbage::getScreenPhoton() is a function from the lense test and there’s no way the compiler would assign it’s address to a light’s function pack;
  2. LENSE FUNCTION PACK lines from the device(those are 2 and 9) don’t share the same function pointers, but that’s sort of understandable, as the compiler could compile those twice, “thanks” to the templates;
  3. The kernel does not fail for some reason, but if I return the LenseFunctionPack::use() to it’s original “non-printing” state, the last two lines in the kernel get the compiler optimization treatment and the kernel fails, despite the fact, that the other function pack has the same pointers, if you check them manually.

When the last two lines from the kernel are removed, the output looks like this:

LENSE FUNCTION PACK(000000328775F150):  50 0B C9 8E F7 7F 00 00 60 12 C9 8E F7 7F 00 00
LENSE FUNCTION PACK(0000000704000350):  28 00 00 00 00 00 00 00 30 00 00 00 00 00 00 00
Raycast hit nothing
CALLING scene->lights[0].getPhoton()...
CALL FOR scene->lights[0].getPhoton() JUST ENDED
len(illuminationPhotons): 0

As you might see, the “calling another function bug thing” is gone, lense function pack contains the same pointers, but calling them crashes the kernel.

All of these happened with CUDA 9, I haven’t tried CUDA 8 yet and once I do, I’ll return with an update, if something changes.
Clearly, either there’s a bug, or just I do not understand, what’s happening here, so if anyone has any idea about what’s happening here, please give me any suggestions.

Hello back. I figured the thing out. The “impossible” function call from that sanity check thing gave me a reason to think, that the function pointer pointed to some place, where the function used to reside and than it was overwritten with something else. Indeed, I had separate compilation and linking turned off and all the tests were in separate .cu files, creating their own code segments here and there, never considering the presence of each other, corrupting the data and living not exactly happily. I just added -dc, recompiled and the project is up and running again.
The first questions still stands, but whatever, I guess…
At least by writing this here, I may slightly reduce the risk of somebody else making the same foolish mistake.