My program calls the same device kernel around 9,000 times, and host-to-device memory copies occur automatically after each kernel finishes, even though I didn’t specify those copies. For reference, I’m not using unified memory.
This issue doesn’t happen if I declare the device variables in the module that contains the device kernel rather than in my host code. I’d like to keep the device variables declared in the host code because that version seems to use fewer registers.
I’d appreciate any insight regarding this problem. Thank you!