I am currently developing for CUDA. My device is a GeForce GTX 480 and I am working
under Linux.
The kernel I wrote is launched repeatedly, in an infinite while loop.
Since I need a graphical interface, I thought of using MATLAB to launch the kernel.
The line I use to launch it is:
[gpu_dvec_ping,gpu_lr_a,gpu_lr_b] = feval(control_handle,gpu_dvec_ping,gpu_lr_a,gpu_lr_b,gpu_dvec_pong,gpu_q,Ns);
All the variables are of type gpuArray (meaning they are transferred to device memory before the launch).
When I measured the time, I noticed a huge difference in the running time of the kernel.
Launched from MATLAB, it takes about 10 times longer than when launched from a regular C program.
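For reference, here is a minimal sketch of how the MATLAB-side time could be measured so that it is comparable with the C timing. It reuses control_handle and the gpu_* variables from the call above. The repeat count nIter and the wait(gpuDevice) synchronization are assumptions for illustration, not necessarily how the numbers quoted here were obtained.

```matlab
% Hypothetical timing harness. control_handle, Ns and the gpu_* variables are
% those from the feval call above; nIter is an assumed repeat count.
dev   = gpuDevice;   % handle to the currently selected GPU
nIter = 100;         % assumption: number of launches to average over

wait(dev);           % make sure any earlier GPU work has finished
t0 = tic;
for k = 1:nIter
    [gpu_dvec_ping, gpu_lr_a, gpu_lr_b] = feval(control_handle, ...
        gpu_dvec_ping, gpu_lr_a, gpu_lr_b, gpu_dvec_pong, gpu_q, Ns);
end
wait(dev);           % launches are asynchronous, so block until the GPU is done
elapsed = toc(t0);

fprintf('mean time per launch: %.3f ms\n', 1e3 * elapsed / nIter);
```

Averaging over many launches and synchronizing before stopping the timer should separate the kernel's own run time from the per-call overhead of feval.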
This makes no sense! I am sure I am doing something wrong.
What could be delaying the kernel? Are there any data transfers between the device and the host
for some reason?
Thanks in advance