CUDA memory allocation outside the processing loop

snsvsn · April 11, 2011, 3:35pm

Hello everybody,
I have a strange problem and I hope I have come to the right place to find some help.
For my project I am trying to use CUDA to do flat-field correction on an incoming sequence of images. In my current setup I have built a CUDA DLL in Visual Studio and call it from LabVIEW. LabVIEW acquires the images from the frame grabber and sends them to the GPU for processing through this DLL. Everything works, and the flat-field correction itself is fine. However, my current implementation is not efficient.

For the correction I need two constant images: the flat field and the dark field. Currently, every time the DLL is called I copy the flat field and the dark field to device memory for the calculations. Since these images are constant, is there a way to copy them to device memory outside the image acquisition loop, and then free the device memory once the acquisition is done? In other words, I want to do a cudaMalloc and a cudaMemcpy of the two constant images outside the acquisition loop, use the device pointers inside the loop, and free them at the end.
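A minimal sketch of the pattern described above, assuming the DLL stays loaded for the whole acquisition so that its static variables persist between calls. The exported names `ffc_init`, `ffc_process` and `ffc_cleanup` are hypothetical, and the correction formula `(raw - dark) / (flat - dark)` is an assumption, since the post does not state one. The two constant images are uploaded once in `ffc_init`; in this sketch the per-frame input and output buffers are also allocated there, so the loop itself never calls `cudaMalloc`. Everything is released in `ffc_cleanup`. With CUDA 4.0 and later, the runtime uses one context per device shared by all host threads of a process, so the stored device pointers remain valid across calls; older toolkits tied the context to the calling host thread.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

#ifdef _WIN32
#define FFC_API extern "C" __declspec(dllexport)
#else
#define FFC_API extern "C"
#endif

// Device buffers kept alive between DLL calls (hypothetical design).
static float* d_flat = nullptr;  // constant flat-field image
static float* d_dark = nullptr;  // constant dark-field image
static float* d_raw  = nullptr;  // per-frame input, reused every call
static float* d_out  = nullptr;  // per-frame output, reused every call
static int g_n = 0;              // pixels per image

// Assumed correction: (raw - dark) / (flat - dark); pixels where flat == dark become 0.
__global__ void flatFieldKernel(const float* raw, const float* flat,
                                const float* dark, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float denom = flat[i] - dark[i];
        out[i] = (denom != 0.0f) ? (raw[i] - dark[i]) / denom : 0.0f;
    }
}

// Call once after the acquisition loop. cudaFree(nullptr) is a no-op.
FFC_API void ffc_cleanup(void)
{
    cudaFree(d_flat); cudaFree(d_dark); cudaFree(d_raw); cudaFree(d_out);
    d_flat = d_dark = d_raw = d_out = nullptr;
    g_n = 0;
}

// Call once before the acquisition loop: allocate all buffers, upload the constant images.
FFC_API int ffc_init(const float* flat, const float* dark, int n)
{
    ffc_cleanup();  // release buffers left over from a previous init
    if (n <= 0) return -1;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&d_flat, bytes) != cudaSuccess ||
        cudaMalloc(&d_dark, bytes) != cudaSuccess ||
        cudaMalloc(&d_raw, bytes) != cudaSuccess ||
        cudaMalloc(&d_out, bytes) != cudaSuccess ||
        cudaMemcpy(d_flat, flat, bytes, cudaMemcpyHostToDevice) != cudaSuccess ||
        cudaMemcpy(d_dark, dark, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        ffc_cleanup();
        return -1;
    }
    g_n = n;
    return 0;
}

// Call once per frame inside the loop: only the raw frame and the result are copied.
FFC_API int ffc_process(const float* raw, float* out, int n)
{
    if (n != g_n || d_flat == nullptr) return -1;  // ffc_init must succeed first, same n
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMemcpy(d_raw, raw, bytes, cudaMemcpyHostToDevice) != cudaSuccess) return -1;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    flatFieldKernel<<<blocks, threads>>>(d_raw, d_flat, d_dark, d_out, n);
    if (cudaGetLastError() != cudaSuccess) return -1;
    // The device-to-host cudaMemcpy waits for the kernel to finish before copying.
    if (cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) return -1;
    return 0;
}
```

From LabVIEW, `ffc_init` would be called once before the acquisition loop, `ffc_process` once per frame inside it, and `ffc_cleanup` after the loop ends.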

Regards
SNS

menohack · April 12, 2011, 11:19am

Here’s an idea that might not be efficient or easy to code, but it’s the first that comes to mind:

When your DLL is loaded for the first time, have it spawn a new process (on your CPU) that allocates the memory and stores the images. You can then use some form of interprocess communication, such as a socket, to request the pointers to the images. When you are all done, be sure to kill the process.

This is a roundabout method and, depending on your experience with OS programming, may be too difficult.

I’ll think about it more. There’s probably an easier way.

snsvsn · April 13, 2011, 12:49am

Thank you for your response. I will read about it and see how it goes. I am an electrical engineer with little, if any, experience in OS programming, but I will give it a shot…

I also have one more question. In every loop iteration, each time the DLL is called, I allocate around 6 buffers on the CUDA device using cudaMalloc. With just these 6 cudaMalloc calls and the corresponding cudaFree calls in the DLL, and nothing else, a call takes around 14-15 ms. Is this how long allocating memory on the device is supposed to take? I read on this forum that the first cudaMalloc is slow because the device has to be initialized. So am I correct in assuming that the device gets initialized every time the DLL is called in the loop?
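One way to tell whether the 14-15 ms is initialization or allocation is to time the two separately. The CUDA runtime initializes lazily on the first runtime call, and `cudaFree(0)` is a common idiom for triggering that initialization explicitly. This is a sketch; the function names `timeContextCreation` and `timeAllocations` are hypothetical, and `timeContextCreation` only measures initialization if it is the first CUDA call in the process.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstddef>

// Milliseconds spent in cudaFree(0). This includes context creation only if it is
// the first CUDA runtime call in the process. Returns -1.0 on error.
double timeContextCreation()
{
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(0);
    auto stop = std::chrono::steady_clock::now();
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Milliseconds for `count` cudaMalloc calls of `bytes` each plus the matching
// cudaFree calls, mirroring the per-call pattern in the post. Returns -1.0 on error.
double timeAllocations(int count, size_t bytes)
{
    const int kMaxBuffers = 16;
    if (count <= 0 || count > kMaxBuffers) return -1.0;
    void* ptrs[kMaxBuffers] = {nullptr};
    bool ok = true;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count && ok; ++i)
        ok = (cudaMalloc(&ptrs[i], bytes) == cudaSuccess);
    for (int i = 0; i < count; ++i)
        cudaFree(ptrs[i]);  // freeing a null pointer is a no-op
    auto stop = std::chrono::steady_clock::now();
    if (!ok) return -1.0;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `timeContextCreation()` once and then `timeAllocations(6, bytes)` a few times in the same process shows how the two costs compare on a given machine. If each DLL call in the loop costs about as much as the first one, something is probably tearing the context down between calls (for example a call to `cudaDeviceReset`).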

Regards

SNSVSN

Paritosh · September 20, 2011, 8:34am

Hi
I am also new to CUDA and LabVIEW. Could you please help me by sending the .cu file you use in Visual Studio to build the DLL for LabVIEW? I am really having trouble creating a DLL that can be imported into LabVIEW. Please help.