I am having problems using mapped (zero-copy) memory in a multithreaded application with CUDA 4.0.
A kernel that previously worked fine in a single-threaded application now fails when it writes to mapped memory. The launch reports cudaErrorInvalidValue, and further debugging with Parallel Nsight and cuda-memcheck shows “error = access violation on store” when the kernel writes to an array of booleans and to an array of pointers.
My mapped memory arrays are allocated in the main host thread, but they are passed to the kernel by different worker threads. I suspect this may be the source of the problem. I also call cudaSetDeviceFlags(cudaDeviceMapHost); on the main thread.
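To make the setup concrete, here is a minimal sketch of the pattern I am describing. It is not my actual code: `set_flags`, `worker`, and `run_worker_demo` are made-up names, `std::thread` stands in for whatever threading library is in use, and device 0 is assumed. The `cudaHostAllocPortable` flag and the per-thread `cudaHostGetDevicePointer` call are my assumptions about what a correct version might need, not something I have confirmed.

```cuda
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical kernel: each GPU thread writes one flag into mapped host memory.
__global__ void set_flags(bool* flags, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flags[i] = true;
}

// Worker-thread body: look up the device-side alias of the host buffer
// in this thread, then launch the kernel and wait for it to finish.
static void worker(bool* h_flags, int n, cudaError_t* status) {
    bool* d_flags = nullptr;
    *status = cudaSetDevice(0);  // assumption: device 0
    if (*status != cudaSuccess) return;
    *status = cudaHostGetDevicePointer((void**)&d_flags, h_flags, 0);
    if (*status != cudaSuccess) return;
    set_flags<<<(n + 127) / 128, 128>>>(d_flags, n);
    *status = cudaGetLastError();  // catches launch errors
    if (*status != cudaSuccess) return;
    *status = cudaDeviceSynchronize();  // catches errors raised during execution
}

// Main-thread side: set the device flag, allocate mapped memory, hand the
// host pointer to a worker thread. Returns the number of flags the kernel
// set, or -1 if any CUDA call failed.
int run_worker_demo(int n) {
    // As far as I understand, this has to happen before the device is
    // first initialized, at least on older runtimes.
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess) return -1;

    bool* h_flags = nullptr;
    // Assumption: cudaHostAllocPortable is added so the allocation is not
    // tied to the allocating thread's context.
    if (cudaHostAlloc((void**)&h_flags, n * sizeof(bool),
                      cudaHostAllocMapped | cudaHostAllocPortable) != cudaSuccess)
        return -1;
    for (int i = 0; i < n; ++i) h_flags[i] = false;

    cudaError_t status = cudaSuccess;
    std::thread t(worker, h_flags, n, &status);
    t.join();

    int count = -1;
    if (status == cudaSuccess) {
        count = 0;
        for (int i = 0; i < n; ++i) count += h_flags[i] ? 1 : 0;
    } else {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(status));
    }
    cudaFreeHost(h_flags);
    return count;
}
```

Calling `run_worker_demo(256)` once from `main` would be expected to return 256 if every store lands in the mapped buffer. In my real application, the equivalent of the `set_flags` launch is where the access violation shows up.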
Can someone explain whether it is OK to allocate mapped memory in one thread and use it in another? On which thread should cudaSetDeviceFlags be called?
Again, I’m 99% sure this isn’t a problem with the kernel itself, as it has been extensively tested and works in a single-threaded app.