Visual Studio 2008 CUDA program structure: calling CUDA code from different functions


I am using Visual Studio 2008 with MFC, and in my program I have split my CUDA source code into three functions: Initialise, PutData and ProcessData. The Initialise function simply sets up the pointers and calls cudaMalloc to allocate memory for all my working arrays. The PutData function is called to load my input data array; it does a cudaMemcpy from the incoming data array to the CUDA-allocated memory array (making sure the correct cudaMemcpyHostToDevice or cudaMemcpyDeviceToDevice enum is used).
The final function, ProcessData, is the C code that launches the CUDA kernels. All these routines are compiled in the same .cu file.
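A minimal sketch of that three-phase layout, with illustrative names and sizes rather than the actual attachment (the kernel body, DATA_SIZE and the launch configuration are all placeholders I have made up):

```cpp
// Sketch only: illustrative three-phase layout in a single .cu file.
#include <cuda_runtime.h>

#define DATA_SIZE 1024   // placeholder array length

static unsigned int *d_data = NULL;   // device-side working array

__global__ void my_kernel(unsigned int *data)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < DATA_SIZE) data[i] *= 2;  // placeholder work
}

extern "C" void Initialise(void)
{
    // Set up the pointers and allocate all working arrays once.
    cudaMalloc((void **)&d_data, DATA_SIZE * sizeof(unsigned int));
}

extern "C" void PutData(const unsigned int *host_data)
{
    // Copy the caller's input array into the device array.
    cudaMemcpy(d_data, host_data, DATA_SIZE * sizeof(unsigned int),
               cudaMemcpyHostToDevice);
}

extern "C" void ProcessData(void)
{
    my_kernel<<<(DATA_SIZE + 255) / 256, 256>>>(d_data);
    cudaThreadSynchronize();  // cudaDeviceSynchronize() in newer toolkits
}
```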

The problem is that I always get a cudaError at memory location blah blah whenever I call the PutData routine. If I move the cudaMemcpy into the Initialise function, it works with no error. Why can't I split my CUDA program into separate calls?
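One thing worth doing before anything else is checking the return value of every runtime call, so you get the actual error string rather than a crash. A common checking macro (my own convention, not from your code) looks like:

```cpp
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Report the file, line and human-readable message for any failing
// CUDA runtime call, then abort.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err__ = (call);                                   \
        if (err__ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err__));   \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Example use inside PutData:
//   CUDA_CHECK(cudaMemcpy(d_data, host_data, nbytes,
//                         cudaMemcpyHostToDevice));
```

An "invalid device pointer" message from the cudaMemcpy would suggest the allocation is being lost between calls (for example, the calls coming from different threads), whereas an "invalid argument" would point at the size or the direction enum.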

I could make it a single once-through function call, but as the project develops that will become too large. I need to be able to split the CUDA functions into separate calls, and possibly into different .cu files as well (that didn't work either, but that's another story).

Any suggestions??



It sounds good!

I also use C++ to encapsulate my CUDA operations into three phases like you.

Could you simplify your code to show this error and post it?


Okay, attached is a simple program to give the general idea.
I call an "init_cuda" function to allocate memory for my working arrays. In there I have the cudaMemcpy call, which works and does not cause a crash.

I then have a "put_cuda" function, which I would like to call during normal runtime, passing in a pointer to my data, which is then copied to the CUDA memory. This cudaMemcpy causes the system to crash with a "cudaError". I'm sure it's because the memory allocated during initialisation has been de-allocated.

After that I would call a "run_cuda" function to perform some operation on the data.

From my experience, it is not possible to create some global variables, allocate memory to them at startup, and then use them later. The CUDA runtime only ever seems to allow you to create, use and then release data items within a single CUDA function call.

This doesn't allow for good program structure, and I can't then create pinned memory and pass it to my host calling functions so they can fill the array before running CUDA.
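For reference, the pinned-memory pattern I am after would be something like this (illustrative names and size, not my real code):

```cpp
#include <cuda_runtime.h>

#define DATA_SIZE_BYTES (1024 * sizeof(unsigned int))  // placeholder size

// Sketch: page-locked (pinned) host buffer allocated once at startup,
// filled by ordinary host code, then copied to the device.
unsigned int *alloc_pinned_buffer(void)
{
    unsigned int *pinned = NULL;
    cudaMallocHost((void **)&pinned, DATA_SIZE_BYTES);
    return pinned;  // host code can write to this like normal memory
}

void put_pinned(unsigned int *d_dest, const unsigned int *pinned)
{
    // A pinned source buffer allows faster host-to-device copies.
    cudaMemcpy(d_dest, pinned, DATA_SIZE_BYTES, cudaMemcpyHostToDevice);
}

void free_pinned(unsigned int *pinned)
{
    cudaFreeHost(pinned);
}
```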

Any ideas how I might be able to achieve this?


Guy (1.37 KB)


Have you found a solution for breaking up your kernel code into multiple files? I am just getting into CUDA programming for use in my research, and it is showing huge performance gains, but my code is quickly becoming unwieldy with multiple kernels and functions in one file. I have been looking for documentation on how to organise CUDA code across multiple .cu and .cuh files, but haven't really found a whole lot of information. If you have any success or find any documentation, could you please post it?
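In case it helps, the layout I have been experimenting with (just what seems to compile for me, not from any official documentation) is a shared header of extern "C" declarations:

```cpp
// kernels.cuh -- shared declarations (experimental layout, names made up)
#ifndef KERNELS_CUH
#define KERNELS_CUH

#ifdef __cplusplus
extern "C" {
#endif

void cuda_function1(unsigned int *DataArea);
void cuda_function2(unsigned int *DataArea);

#ifdef __cplusplus
}
#endif

#endif /* KERNELS_CUH */
```

Each .cu file then includes this header and defines one of the functions plus its kernels; the host .c file includes the same header, and all the compiled objects are linked together.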


  • Nik

I came up with a solution in a similar topic I raised, and I have copied it over here…

Basically, my scheduling loop, which is in a .c file, is unchanged and still has the sleep call that lets the OS/CPU schedule other programs. I have found that you can declare global CUDA data either in the .c file, provided the variables are declared at the head of the scheduling thread's function, or in the .cu file, where they can be declared as globals. The cudaMalloc calls that allocate memory to these pointers MUST be made from the same scheduling thread, but they don't need to be inside the scheduling loop. So for a .c file your structure would look like:

extern "C" void cuda_function1(unsigned int *DataArea);
extern "C" void cuda_function2(unsigned int *DataArea);

void my_func(void)
{
    unsigned int *my_cuda_memory_ptr;

    cudaMalloc((void **)&my_cuda_memory_ptr, DATA_SIZE_BYTES);

    while (loop_active)
    {
        cuda_function1(my_cuda_memory_ptr);
        cuda_function2(my_cuda_memory_ptr);
        Sleep(10);   /* yield the CPU to other programs */
    }

    cudaFree(my_cuda_memory_ptr);
}

Don't try to move "my_cuda_memory_ptr" out to a global variable. It won't like it.
"cuda_function1/2" is then another C function, but this time in the .cu file. In there it calls the CUDA kernels that work on "my_cuda_memory_ptr". They can do their own mallocs if need be, but any data created will be lost when the CUDA function exits.
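To make the split concrete, the .cu side of "cuda_function1" might look like this (the kernel body and launch dimensions are placeholders, not my actual code):

```cpp
// cuda_functions.cu -- sketch of the .cu half of the arrangement.
#include <cuda_runtime.h>

__global__ void process_kernel(unsigned int *data)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] += 1;   // placeholder operation
}

extern "C" void cuda_function1(unsigned int *DataArea)
{
    // DataArea was allocated with cudaMalloc in the scheduling thread;
    // it stays valid here because we are called from that same thread.
    process_kernel<<<64, 256>>>(DataArea);
    cudaThreadSynchronize();  // cudaDeviceSynchronize() in newer toolkits
}
```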

I'm not yet sure whether "cuda_function1" and "cuda_function2" could live in different .cu files, but I see no reason why not. So I think this would be one way of getting better program structure, splitting up the different CUDA processing functions, and keeping file sizes sensible.

Basically, all CUDA functions that work on the same data must be called from the same processing thread. Once that loop and thread exit, the data is gone and the whole initialisation process must be done again.