Asynchronous / synchronous calls to a DLL: calling a CUDA DLL many times


I am a PhD student from Austria, currently working on a CUDA implementation, and I need some help with it.

I have implemented a CUDA DLL with a C++ interface, and this DLL is called from another C++ program. Everything works fine if I call the DLL only once, but if I call it many times in a loop with different input values, I get wrong results.

Option 1 [Synchronous]) Wait until the GPU calculation is done, then start the next iteration.
I have already tried cudaDeviceSynchronize() and cudaThreadSynchronize(), but neither fixes the problem.
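
To make Option 1 concrete, here is a minimal sketch of what a fully synchronous DLL entry point could look like. The names `runOnGpu`, `scaleKernel` and `DLL_EXPORT` are hypothetical placeholders, not taken from the actual DLL, and the kernel only doubles its input as a stand-in for the real computation. The sketch assumes each call allocates its own device buffers, checks every CUDA return code, and finishes all GPU work before returning.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

#ifdef _WIN32
#define DLL_EXPORT extern "C" __declspec(dllexport)
#else
#define DLL_EXPORT extern "C"
#endif

// Hypothetical kernel: doubles every element (stand-in for the real computation).
__global__ void scaleKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Hypothetical entry point. Returns 0 on success, 1 on any CUDA error.
// All GPU work for this call has finished by the time it returns.
DLL_EXPORT int runOnGpu(const float* hostIn, float* hostOut, int n)
{
    if (n <= 0) return 0;  // nothing to do; also avoids a zero-block launch

    float* dIn = 0;
    float* dOut = 0;
    size_t bytes = (size_t)n * sizeof(float);

    cudaError_t err = cudaMalloc((void**)&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dOut, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(dIn, dOut, n);
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();  // reports errors raised during execution
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    // Free per-call buffers so no device state carries over to the next call.
    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess ? 0 : 1;
}
```

The calling program would then invoke `runOnGpu(in, out, n)` once per loop iteration and stop at the first nonzero return value. Whether this shape matches the actual DLL is an assumption; the point is only that each call is self-contained and every error code is inspected.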

Option 2 [Asynchronous]) Concurrent execution of the DLL calls/kernels
I have no idea what I would have to do for this.
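
As a rough illustration of what Option 2 might involve, the sketch below issues several independent batches on CUDA streams using `cudaMemcpyAsync` and pinned host memory. The function `runBatchesAsync`, the kernel `scaleKernel`, and the choice of two streams are all hypothetical assumptions for illustration; whether any overlap actually happens depends on the device and driver.

```cuda
#include <cuda_runtime.h>
#include <cstring>

// Hypothetical kernel: doubles every element (stand-in for the real computation).
__global__ void scaleKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Hypothetical helper: processes `batches` inputs of n floats each, stored
// back to back in hostIn, and writes the results to hostOut.
// Returns 0 on success, 1 on any CUDA error.
int runBatchesAsync(const float* hostIn, float* hostOut, int n, int batches)
{
    if (n <= 0 || batches <= 0) return 0;

    const int kStreams = 2;  // assumed stream count
    cudaStream_t streams[kStreams];
    int created = 0;
    size_t bytes = (size_t)n * sizeof(float);
    size_t total = bytes * batches;
    float *pinIn = 0, *pinOut = 0, *dIn = 0, *dOut = 0;

    // Pinned host buffers are needed for copies to be truly asynchronous.
    cudaError_t err = cudaMallocHost((void**)&pinIn, total);
    if (err == cudaSuccess) err = cudaMallocHost((void**)&pinOut, total);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dIn, total);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dOut, total);
    while (err == cudaSuccess && created < kStreams) {
        err = cudaStreamCreate(&streams[created]);
        if (err == cudaSuccess) ++created;
    }

    if (err == cudaSuccess) {
        memcpy(pinIn, hostIn, total);
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        for (int b = 0; b < batches; ++b) {
            cudaStream_t s = streams[b % kStreams];
            size_t off = (size_t)b * n;  // each batch uses its own slice
            err = cudaMemcpyAsync(dIn + off, pinIn + off, bytes,
                                  cudaMemcpyHostToDevice, s);
            if (err != cudaSuccess) break;
            scaleKernel<<<blocks, threads, 0, s>>>(dIn + off, dOut + off, n);
            err = cudaGetLastError();
            if (err != cudaSuccess) break;
            err = cudaMemcpyAsync(pinOut + off, dOut + off, bytes,
                                  cudaMemcpyDeviceToHost, s);
            if (err != cudaSuccess) break;
        }
        // Wait for all streams before touching the results on the host.
        cudaError_t syncErr = cudaDeviceSynchronize();
        if (err == cudaSuccess) err = syncErr;
        if (err == cudaSuccess) memcpy(hostOut, pinOut, total);
    }

    for (int i = 0; i < created; ++i) cudaStreamDestroy(streams[i]);
    if (dIn) cudaFree(dIn);
    if (dOut) cudaFree(dOut);
    if (pinIn) cudaFreeHost(pinIn);
    if (pinOut) cudaFreeHost(pinOut);
    return err == cudaSuccess ? 0 : 1;
}
```

Work queued on the same stream runs in order, while work on different streams may overlap. Because every batch reads and writes its own slice of the buffers, no batch can overwrite another batch's data.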

I’m using CUDA 4.0 on a GeForce GT 425M with Windows 7.

What can I do?

Thanks a lot for any comments and suggestions.