Different __device__ variables behind one name

I am working on my first big CUDA project and I am really frustrated by a problem I do not understand.

My project is structured as follows:

    [*]In "barnes_hut_cuda.h" there are several "global" pointers to device memory, e. g. ``` __device__ float3* positions = NULL;. ``` [*]In "barnes_hut_cuda.cu" they are initialized by »initCuda()«. »setRandomValues()« generates some useful data and copies it to the device memory.[*]Every kernel is located in "kernel1.cu" to "kernel6.cu" and every "kernel[i].h" provides a function to run the kernel. -> In "kernel1.cu" the function »initKernel1()« does some initialization work and »void kernel1(void)« runs the kernel.

The main-function simply calls some functions in the following order:

[list=1][]»initCuda()« → invokes »initKernel1()«[]»setRandomValues()«[*]»kernel1()«.

Now the following problem occurs: After initializing the pointers, they have a specific address and everything is fine. Then, »initKernel1()« is invoked, where the pointers have the value »NULL«. After that in »setRandomValues()« again the pointers are initialized. Invoking »kernel1()« shows NULL-pointers again.

I wrote a main function, that invokes the functions in the right order and in every function I print the value of one pointer (where it points to on device Memory) and the location of the variable on the host memory (in brackets). Here is the output:

positions = 310000 (&61f5b8) (initCuda())

positions = 0 (&61f648) (initKernel1())

positions = 310000 (&61f5b8) (setRandomValues())

positions = 0 (&61f648) (kernel1())

The result clearly shows that for the functions in “barnes_hut_cuda.cu” the variables defined in “barnes_hut_cuda.h” are different to those seen by the functions defined in “kernel1.h”, although they should be the same.

It seems that every time “barnes_hut_cuda.h” is included, all variables are defined again. Maybe this is an issue with the »__device__«-macro.

(I compile the source files one by one to object code and then link them together. Normally that should work. (The makefile of NVIDIA does the same.))

That is indeed the case, as CUDA versions before the upcoming 5.0 release don’t have a linker on the device side. So two variables defined in two different files will be two different entities. If you want kernels to share data, they must be in the same compilation unit.

(side note: Even in standard C it is not possible to have two definitions of the same variable. You can have multiple declarations, but only one definition.)

Thanks for that quick reply. Now I will wait for CUDA 5.0, because packing six complex kernels in one source file won’t be a good idea.

Why not?

If you just don’t want to have them in the same source file, you can still keep them in separate files and the [font=“Courier New”]#include[/font] them into one compilation unit.

This is an extract of my project. I use #include, but do I use it in the wrong way?

kernel1.h:

#ifndef KERNEL1_H

#define KERNEL1_H

void kernel1();

void initKernel1();

#endif	// KERNEL1_H

barnes_hut_cuda.h looks like this:

#ifndef BARNES_HUT_CUDA_H

#define BARNES_HUT_CUDA_H

#include <cutil_inline.h>

#include <stdlib.h>

// some macros

__device__ float3* positions = NULL;

// more variables like "positions"

#endif // BARNES_HUT_CUDA_H

barnes_hut_cuda.cu:

#include <cutil_inline.h>

#include <time.h>

#include "barnes_hut_cuda.h"

#include "kernel1.h"

/*#include "kernel2.h"

#include "kernel3.h"

#include "kernel4.h"

#include "kernel5.h"

#include "kernel6.h"*/

// ...

void initCuda(int _N){

	N = _N;

	cudaMalloc((void**) &positions, 2*N*sizeof(float3));

	printf("[ptr] positions = %10lx (&%10lx) (initCuda())\n", (unsigned long) positions, (unsigned long) &positions);

	// some more memory allocation

	initKernel1();

}

// ...

void setRandomValues(){

	printf("[ptr] positions = %10lx (&%10lx) (setRandomValues())\n", (unsigned long) positions, (unsigned long) &positions);

	// ...

}

main.cpp:

#include <stdlib.h>

#include "barnes_hut_cuda.h"

#include "kernel1.h"

//#include "kernel2.h"

//#include "kernel3.h"

//#include "kernel4.h"

//#include "kernel5.h"

using namespace std;

void testKernel1();

int main(int argv, char** argc){

	testKernel1();

	return EXIT_SUCCESS;

}

void testKernel1(){

	initCuda(8);

	setRandomValues();

	kernel1();

}

kernel1.cu:

#include <cutil_inline.h>

#include <stdio.h>

#include "barnes_hut_cuda.h"

#include "kernel1.h"

void initKernel1(){

	printf("[ptr] positions = %10lx (&%10lx) (initKernel1())\n", (unsigned long) positions, (unsigned long) &positions);

	

	// ...

}

void kernel1(){

	printf("[ptr] positions = %10lx (&%10lx) (kernel1())\n", (unsigned long) positions, (unsigned long) &positions);

	// invokes kernel

}

Are you using the driver API, or the runtime API? If using the runtime API, you need to include the actual kernel code, not just the header files. I.e., main.cpp should start with

#include <stdlib.h>

#include "barnes_hut_cuda.h"

#include "kernel1.cu"

#include "kernel2.cu"

#include "kernel3.cu"

#include "kernel4.cu"

#include "kernel5.cu"

To make the deviation from standard C conventions obvious, some people prefer to use the .cuh file extension in this case.

I tried to implement your advice. I experimented a little bit with the #includes and now it really works. I was not aware of the meaning of “compilation unit”, but now I am.

Thank you for your help solving that issue.

Now the output of my program looks like this:

[ptr] positions = 	310000 (&	6086c0) (initCuda())

[ptr] positions = 	310000 (&	6086c0) (initKernel1())

[ptr] positions = 	310000 (&	6086c0) (setRandomValues())

[ptr] positions = 	310000 (&	6086c0) (kernel1())

External Image

For all those with the same problem, I changed the following:

barnes_hut_cuda.cu:

#include "kernel1.cu"	// which was "#include "kernel1.h" before

Now I only compile main.cpp and the CUDA sourcefile barnes_hut_cuda.cu, because the second one includes kernel1.cu. (Later it will also include the other cuda source files)

That’s all. I didn’t change the #includes in main.cpp, because the try led to problems with gcc.

Sorry I had misread the structure of your program. Still you found the correct solution!