Let’s say my program runs two parallel threads that both use CUDA and NPP functions. It worked fine in CUDA 3.2, as each thread got its own device context and they didn’t interfere with each other.
Now I have switched to CUDA 4.0 and the threads have started disturbing each other. I don’t know exactly what is going on, but erroneous data is sometimes copied back to the host, and I’m pretty sure it’s a thread-safety problem, as everything works fine with only one of the threads running.
Now my question is what the preferred way to handle this is. Should I use streams for the memory copies and kernels? But what about the NPP calls then? To clarify, it is not important for me to run different CUDA calls concurrently. I just want to prevent calls from different threads from interfering with each other.
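
To make the goal concrete, here is a rough sketch of the kind of serialization I have in mind, not a confirmed fix. It assumes that putting one process-wide lock around each complete upload, kernel/NPP, download sequence would be enough to stop the interference. The names `g_gpu_mutex`, `process_on_gpu` and `run_two_workers` are made up for illustration, and the function body is a plain host-side stand-in so that the sketch compiles without the CUDA toolkit.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical process-wide lock guarding every CUDA/NPP call sequence.
static std::mutex g_gpu_mutex;

// Hypothetical stand-in for one upload -> kernel/NPP -> download sequence.
// In the real program the body would hold the memory copies, kernel launches
// and NPP calls; here it only modifies a host buffer.
void process_on_gpu(std::vector<int>& buf, int add)
{
    // Only one thread at a time may run a complete sequence.
    std::lock_guard<std::mutex> lock(g_gpu_mutex);
    for (int& v : buf) {
        v += add;
    }
}

// Runs two worker threads, each with its own buffer, and reports whether
// both buffers hold the expected values afterwards.
bool run_two_workers()
{
    std::vector<int> a(1024, 1);
    std::vector<int> b(1024, 2);

    std::thread t1([&a] { for (int i = 0; i < 100; ++i) process_on_gpu(a, 1); });
    std::thread t2([&b] { for (int i = 0; i < 100; ++i) process_on_gpu(b, 2); });
    t1.join();
    t2.join();

    // Expected: a = 1 + 100 * 1 = 101, b = 2 + 100 * 2 = 202.
    for (int v : a) if (v != 101) return false;
    for (int v : b) if (v != 202) return false;
    return true;
}
```

This gives up any overlap between the two threads, which is fine for my case, but I don’t know whether it is the recommended approach or whether streams would be the cleaner solution.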