I have a strange problem, and I hope I have come to the right place to find some help.
For my project I am trying to use CUDA to do flat field correction on an incoming sequence of images. In my current setup I have created a CUDA DLL in Visual Studio and I call this DLL from LabVIEW. LabVIEW acquires the images from the frame grabber and sends each one to the GPU for processing via the CUDA DLL. Everything works, and the flat field correction itself gives correct results.

However, my current implementation is not efficient. The correction needs two constant images: the flat field and the dark field. At the moment, every time the DLL is called I copy both the flat field and the dark field to device memory before doing the calculation, even though these images never change during an acquisition.

Is there a way to copy these two images to device memory once, outside the image acquisition loop, and then free the device memory after the acquisition is done? In other words, I want to do a cudaMalloc and a cudaMemcpy for the two constant images before the acquisition loop, use the resulting device pointers inside the loop, and call cudaFree on them at the end.
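
To make the idea concrete, here is a rough sketch of the structure I have in mind, not my actual code. The names ffc_init, ffc_process and ffc_cleanup are hypothetical, and the sketch assumes float pixels, a correction of the form gain * (raw - dark) / (flat - dark), and that LabVIEW keeps the DLL loaded between calls so the static device pointers stay valid.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

#ifdef _WIN32
#define FFC_API extern "C" __declspec(dllexport)
#else
#define FFC_API extern "C"
#endif

// Device buffers kept alive between DLL calls (hypothetical names).
static float *d_flat = nullptr, *d_dark = nullptr;
static float *d_raw = nullptr, *d_out = nullptr;
static size_t g_count = 0;  // number of pixels per image

// Assumed correction: out = gain * (raw - dark) / (flat - dark).
__global__ void flatFieldKernel(const float *raw, const float *flat,
                                const float *dark, float *out,
                                size_t n, float gain)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        float denom = flat[i] - dark[i];
        out[i] = (denom != 0.0f) ? gain * (raw[i] - dark[i]) / denom : 0.0f;
    }
}

// Call once after the acquisition loop: frees everything.
FFC_API void ffc_cleanup(void)
{
    cudaFree(d_flat); cudaFree(d_dark); cudaFree(d_raw); cudaFree(d_out);
    d_flat = d_dark = d_raw = d_out = nullptr;
    g_count = 0;
}

// Call once before the acquisition loop: allocates the buffers and
// uploads the two constant images. Returns 0 (cudaSuccess) on success.
FFC_API int ffc_init(const float *flat, const float *dark, int width, int height)
{
    ffc_cleanup();
    if (width <= 0 || height <= 0) return (int)cudaErrorInvalidValue;
    g_count = (size_t)width * (size_t)height;
    size_t bytes = g_count * sizeof(float);

    cudaError_t err = cudaMalloc((void **)&d_flat, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_dark, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_raw, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_out, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_flat, flat, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_dark, dark, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) ffc_cleanup();
    return (int)err;
}

// Call once per frame inside the loop: only the raw frame is uploaded.
FFC_API int ffc_process(const float *raw, float *out, float gain)
{
    if (g_count == 0) return (int)cudaErrorInvalidValue;  // ffc_init not called
    size_t bytes = g_count * sizeof(float);

    cudaError_t err = cudaMemcpy(d_raw, raw, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return (int)err;

    const unsigned int threads = 256;
    const unsigned int blocks = (unsigned int)((g_count + threads - 1) / threads);
    flatFieldKernel<<<blocks, threads>>>(d_raw, d_flat, d_dark, d_out, g_count, gain);
    err = cudaGetLastError();
    if (err != cudaSuccess) return (int)err;

    // This blocking copy also waits for the kernel to finish.
    return (int)cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost);
}
```

From LabVIEW I would then call ffc_init once before the loop, ffc_process for every acquired frame, and ffc_cleanup once after the loop. What I am unsure about is whether device pointers stored in the DLL like this remain valid across separate calls from LabVIEW.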