Sleep facility inside a CUDA C function: allowing the OS/CPU in during a scheduling loop

I have a CUDA C function inside a .cu file. From this function I call my kernel functions and do memcpys. However, if I write this function as a scheduling loop that never ends, I need to be able to force it to sleep on each cycle to allow the CPU/OS to schedule in and do its own work.
Can this be done in a way that gives the OS a good amount of time?
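To make the question concrete, here is a minimal sketch of a host-side scheduling loop that sleeps each cycle using POSIX nanosleep(). The function name run_cycles, its parameters, and the placeholder comment are my assumptions, not from the original post; the kernel launches and memcpys would go where the comment sits.

```c
#include <time.h>

/* Run n scheduling cycles, sleeping ms milliseconds per cycle so the
   OS can schedule other processes in.  Returns the number of cycles
   completed (useful for a bounded test run). */
int run_cycles(int n, long ms)
{
    int done = 0;
    for (int i = 0; i < n; i++) {
        /* kernel launches and cudaMemcpy calls would go here */

        /* Yield the CPU for at least the requested time. */
        struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        done++;
    }
    return done;
}
```

In a real scheduler the loop condition would be a flag rather than a fixed count, but the sleep call is the part that answers the question: it hands the CPU back to the OS on every iteration.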

If I change all this and put my scheduling loop in the calling C program, which might seem more normal, and then call the CUDA C function on each cycle, the CUDA C function has to set up all the data arrays on every call; it can't seem to retain data from one call to the next using global variables and memory stores within the .cu file.

Any suggestions?



As long as the context remains the same (generally the case in single-GPU programming), data in global memory is preserved from one kernel call to the next, unless you destroy it explicitly.
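A small sketch of that persistence, assuming a CUDA-capable machine: the buffer is allocated once, and the second kernel launch sees the value the first one wrote. The kernel names write_val and add_val are illustrative, not from the original thread.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void write_val(unsigned int *buf, unsigned int v) { buf[0] = v; }
__global__ void add_val(unsigned int *buf, unsigned int v)   { buf[0] += v; }

int main(void)
{
    unsigned int *d_buf, h = 0;
    cudaMalloc((void **)&d_buf, sizeof(unsigned int));

    write_val<<<1, 1>>>(d_buf, 40);  /* first kernel writes              */
    add_val<<<1, 1>>>(d_buf, 2);     /* second launch sees the same data */

    cudaMemcpy(&h, d_buf, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("%u\n", h);               /* the value survived both launches */

    cudaFree(d_buf);                 /* only now is the data destroyed   */
    return 0;
}
```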

I had a breakthrough on this issue along the lines you suggest.

Basically, my scheduling loop, which is in a .c file, stays the same and has the sleep function to allow the OS/CPU to let other programs in. I have found that you can declare global CUDA data either in the .c file, provided the variables are declared at the head of the scheduling thread function, or in the .cu file, where they can be declared as global. The cudaMalloc calls that allocate memory to these pointers MUST be made from the same scheduling thread, but don't need to be inside the scheduling loop. So for a .c file your structure would look like

extern "C" void cuda_function1(unsigned int *DataArea);
extern "C" void cuda_function2(unsigned int *DataArea);

void my_func(void)
{
    unsigned int *my_cuda_memory_ptr;

    /* Allocate once, from the scheduling thread, outside the loop. */
    cudaMalloc((void **)&my_cuda_memory_ptr, DATA_SIZE_BYTES);

    while (loop_active)
    {
        cuda_function1(my_cuda_memory_ptr);
        cuda_function2(my_cuda_memory_ptr);
        usleep(10000);  /* sleep each cycle so the OS can schedule in */
    }
}
Don't try to move "my_cuda_memory_ptr" out to a global variable. It won't like it.

"cuda_function1" and "cuda_function2" are then plain C functions, but this time in the .cu file. In there they call CUDA kernels that work on "my_cuda_memory_ptr". They can do their own mallocs if need be, but any data created that way is lost when the CUDA function exits.
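For illustration, here is a sketch of what such a .cu-side function might look like. The kernel name process_data, the launch configuration, and the buffer size are all assumptions; only the extern "C" wrapper shape and the DataArea parameter come from the post.

```cuda
#include <cuda_runtime.h>

/* Assumed example kernel: increments each element of the buffer.
   The caller must have allocated at least 64 * 256 elements. */
__global__ void process_data(unsigned int *data)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] += 1;
}

extern "C" void cuda_function1(unsigned int *DataArea)
{
    /* Any scratch buffer cudaMalloc'd here and freed before return
       is gone on exit; only DataArea persists between calls. */
    process_data<<<64, 256>>>(DataArea);
    cudaDeviceSynchronize();
}
```

The extern "C" wrapper is what lets the plain .c scheduling loop link against a function compiled by nvcc as C++.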

I'm not yet sure whether "cuda_function1" and "cuda_function2" could live in different .cu files, but I see no reason why not. So I think this would be one way of getting better program structure, splitting up the different CUDA processing functions and keeping file sizes sensible.

Basically, all CUDA functions that work on the same data must be called from the same processing thread. Once this loop and thread exit, the data is gone and the whole initialisation process must be done again.