I had a breakthrough on this issue, along the lines you suggested.
Basically my scheduling loop, which lives in a .c file, is unchanged and still has the sleep call that lets the OS/CPU schedule other programs. I have found that you can declare pointers to global CUDA data either in the .c file, provided the variables are declared at the head of the scheduling thread's function, or in the .cu file, where they can be declared as globals. The cudaMalloc calls that allocate memory to these pointers MUST be made from that same scheduling thread, but they don't need to be inside the scheduling loop. So for a .c file your structure would look like:
extern "C" void cuda_function1(unsigned int *DataArea);
extern "C" void cuda_function2(unsigned int *DataArea);
void *scheduler_thread(void *arg)
{
    unsigned int *my_cuda_memory_ptr;  /* declared at the head of the thread function */
    cudaMalloc((void **)&my_cuda_memory_ptr, DATA_SIZE_BYTES);
    /* ... scheduling loop calling cuda_function1/2 ... */
}
Don't try to move "my_cuda_memory_ptr" out to a file-scope global variable; CUDA won't like it.
"cuda_function1" and "cuda_function2" are then ordinary C-callable functions, but this time defined in the .cu file. In there they launch CUDA kernels that work on "my_cuda_memory_ptr". They can do their own cudaMallocs if need be, but the data created that way will be lost when the CUDA function exits.
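To illustrate, here is a minimal sketch of what the .cu side might look like. The kernel name, the launch configuration, and DATA_SIZE_WORDS are all made up for the example; substitute your own per-element work and sizes:

```cuda
/* my_kernels.cu -- illustrative names, not from the original post */
__global__ void process_kernel(unsigned int *data, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;  /* stand-in for the real per-element work */
}

extern "C" void cuda_function1(unsigned int *DataArea)
{
    const unsigned int n = DATA_SIZE_WORDS;  /* assumed compile-time size */
    process_kernel<<<(n + 255) / 256, 256>>>(DataArea, n);
    cudaDeviceSynchronize();  /* block until the kernel finishes */
}
```

Because it is declared extern "C", the .c scheduling loop can call cuda_function1 like any other C function while nvcc compiles the kernel launch.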
I'm not yet sure whether "cuda_function1" and "cuda_function2" could live in different .cu files, but I see no reason why not. So I think this would be one way of getting better program structure, splitting up the different CUDA processing functions and keeping file sizes sensible.
Basically, all CUDA functions that work on the same data must be called from the same processing thread. Once that loop and thread exit, the data is gone and the whole initialisation process must be done again.
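Putting it together, the lifetime of the device data tracks the lifetime of the scheduling thread, so the allocate/loop/free pattern might look like this (the pthread wrapper, the keep_running flag, and the sleep interval are illustrative; error checking is omitted):

```c
#include <cuda_runtime.h>
#include <unistd.h>

#define DATA_SIZE_BYTES (1024 * sizeof(unsigned int))

extern void cuda_function1(unsigned int *DataArea);  /* defined in the .cu file */
extern volatile int keep_running;                    /* assumed: set to 0 elsewhere to stop */

void *scheduler_thread(void *arg)
{
    unsigned int *my_cuda_memory_ptr;

    /* Allocate once, from this thread, before the loop starts */
    cudaMalloc((void **)&my_cuda_memory_ptr, DATA_SIZE_BYTES);

    while (keep_running) {
        cuda_function1(my_cuda_memory_ptr);  /* all work on this data from this thread */
        usleep(1000);                        /* let the OS schedule other programs */
    }

    /* Once this thread exits, the device data is gone anyway; free it explicitly */
    cudaFree(my_cuda_memory_ptr);
    return NULL;
}
```

If you need the data to survive, keep this thread alive; restarting it means repeating the whole cudaMalloc initialisation from scratch.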