I've got a C++ program that uses CUDA through an external function, just like the C++ integration example. I'm copying some large variables to the device (global memory) at the beginning of the program, and after that I do not use them in the CPU program at all. I was wondering if I could just leave them
on the device and not allocate, copy, or free them again.
The external function is called all the time; it's a real-time DSP program. So I'm copying these variables
to the GPU and back to the CPU at every iteration. The thing is, once I create them in the CPU main program and send them to the device,
I never process them in the CPU part of the program again.
All the processing is done in CUDA. Then I copy them
back to the CPU (processed) and back to the GPU in the next iteration to process them again.
The point is, as I said, that I do not use them in the CPU part of the program. Only in CUDA.
These variables are too large to fit in shared memory…
How would I allocate them once on the device, leave them there without freeing them, and skip the
pointless copying every iteration?
In case I'm not making any sense, here is what happens, in short:
1. Load the filter response on the CPU.
2. Copy the filter response to the GPU.
3. Send the real-time audio buffer to the GPU.
4. Copy the circular buffer to the GPU.
5. Process the three of them on the GPU.
6. Copy the processed audio buffer back to the CPU and from there to the output of the program.
7. Free the device arrays.
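To make the redundant traffic concrete, the steps above currently look roughly like this (the function, kernel, and variable names are made up just to illustrate; the launch configuration is a placeholder):

```cuda
#include <cuda_runtime.h>

__global__ void filterKernel(const float*, float*, float*, int, int, int);

// Sketch of the current per-block flow: everything is allocated, copied,
// and freed on EVERY call, even though only the audio buffer actually
// needs to travel each iteration.
extern "C" void processBlock(const float* hFilter, float* hCirc, float* hAudio,
                             int nFilter, int nCirc, int nAudio)
{
    float *dFilter, *dCirc, *dAudio;
    cudaMalloc(&dFilter, nFilter * sizeof(float));
    cudaMalloc(&dCirc,   nCirc   * sizeof(float));
    cudaMalloc(&dAudio,  nAudio  * sizeof(float));

    // Steps 2-4: host-to-device copies, repeated every iteration
    cudaMemcpy(dFilter, hFilter, nFilter * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dCirc,   hCirc,   nCirc   * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dAudio,  hAudio,  nAudio  * sizeof(float), cudaMemcpyHostToDevice);

    // Step 5: the actual DSP kernel
    filterKernel<<<(nAudio + 255) / 256, 256>>>(dFilter, dCirc, dAudio,
                                                nFilter, nCirc, nAudio);

    // Step 6: copy the processed audio (and updated circular buffer) back
    cudaMemcpy(hAudio, dAudio, nAudio * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hCirc,  dCirc,  nCirc  * sizeof(float), cudaMemcpyDeviceToHost);

    // Step 7: free everything, throwing away the device-resident state
    cudaFree(dFilter);
    cudaFree(dCirc);
    cudaFree(dAudio);
}
```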
So my problem is that I do not want to send the filter response (and the circular buffer) back and forth all the time,
because I don't have to: I never use them in the CPU part after initialization.
What do I do?
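Here's roughly what I imagine instead, if it's even valid: keep the device pointers at file scope inside the .cu translation unit, allocate and upload the filter response once at startup, and only move the audio buffer each block. All names are hypothetical; I just want to know whether the pointers stay valid between calls like this:

```cuda
#include <cuda_runtime.h>

__global__ void filterKernel(const float*, float*, float*, int, int, int);

// Device-resident state, allocated once and reused for every audio block.
static float *dFilter = nullptr, *dCirc = nullptr, *dAudio = nullptr;
static int gFilter, gCirc, gAudio;

// Called once at startup: allocate everything, upload the filter response.
extern "C" void dspInit(const float* hFilter, int nFilter, int nCirc, int nAudio)
{
    gFilter = nFilter; gCirc = nCirc; gAudio = nAudio;
    cudaMalloc(&dFilter, nFilter * sizeof(float));
    cudaMalloc(&dCirc,   nCirc   * sizeof(float));
    cudaMalloc(&dAudio,  nAudio  * sizeof(float));
    cudaMemcpy(dFilter, hFilter, nFilter * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dCirc, 0, nCirc * sizeof(float)); // circular buffer lives on the GPU
}

// Called every block: only the audio buffer crosses the bus.
extern "C" void dspProcess(float* hAudio)
{
    cudaMemcpy(dAudio, hAudio, gAudio * sizeof(float), cudaMemcpyHostToDevice);
    filterKernel<<<(gAudio + 255) / 256, 256>>>(dFilter, dCirc, dAudio,
                                                gFilter, gCirc, gAudio);
    cudaMemcpy(hAudio, dAudio, gAudio * sizeof(float), cudaMemcpyDeviceToHost);
}

// Called once at shutdown.
extern "C" void dspShutdown()
{
    cudaFree(dFilter);
    cudaFree(dCirc);
    cudaFree(dAudio);
}
```

Would this work, or does something invalidate the device pointers between external calls?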