I am a PhD student from Austria, currently working on a CUDA implementation, and I need some help with it.
I have implemented a CUDA DLL with a C++ interface, and this DLL is called from another C++ program. Everything works fine if I call the DLL only once, but if I call it many times in a loop with different input values, I get wrong results.
Option 1 [Synchronous]) Wait until the GPU calculation is done and only then start the next iteration
I have already tried cudaDeviceSynchronize() and cudaThreadSynchronize(), but neither fixes the problem.
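To make Option 1 concrete, here is a minimal sketch of the synchronous pattern I mean. It is not my actual code: `scaleKernel` and `runOnGpu` are hypothetical stand-ins for my kernel and the DLL entry point, and the sketch assumes each call allocates its own device buffers so that nothing is carried over between iterations. Every CUDA call is checked, since an unnoticed error in one call could perhaps explain wrong results in later ones.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel standing in for the real computation:
// multiplies every input element by a factor.
__global__ void scaleKernel(const float* in, float* out, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * factor;
}

// Hypothetical DLL entry point. Returns true only if every CUDA call succeeded.
bool runOnGpu(const float* hostIn, float* hostOut, int n, float factor)
{
    if (n <= 0) return false;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float* dIn  = NULL;
    float* dOut = NULL;

    // Fresh device buffers on every call (assumption of this sketch).
    cudaError_t err = cudaMalloc((void**)&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks  = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(dIn, dOut, n, factor);
        err = cudaGetLastError();                 // reports launch failures
    }
    // Block the host until the kernel has finished; also reports execution errors.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(dIn);    // cudaFree(NULL) is a harmless no-op
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

The calling program would then invoke `runOnGpu` once per loop iteration with new input values and stop as soon as it returns false.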
Option 2 [Asynchronous]) Concurrent execution of the DLL/kernels
I have no idea what I would have to do for this.
I’m using CUDA 4.0 on a GeForce GT 425M with Windows 7.
What can I do to get correct results on repeated calls?
Thanks a lot for any comments and suggestions.