[SOLVED] GPUWorker: a race condition when using it

Hi all,
as you know, memory allocated with cudaMalloc in one host thread cannot be used from another host
thread. For example, you cannot call cuFFT on memory allocated by another thread, and so on.
This is a big limitation (I would like to discuss it in another thread, if NVIDIA cares). It is why
GPUWorker was created inside the HOOMD project, and why it also appears in projects like LISSOM.

GPUWorker runs a thread that is in charge of executing all GPU operations requested by other
threads. The requests are functors built with boost::bind and queued inside GPUWorker's internal
state. GPUWorker is implemented using boost::thread as its underlying threading mechanism.

GPUWorker::call(…) is synchronous: it returns only once all jobs in the GPUWorker queue have been
delivered to the GPU. So far, so good.
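
To make the mechanism concrete, here is a minimal sketch of the same worker pattern in standard
C++11. It makes some substitutions: std::thread and std::condition_variable replace boost::thread
and boost::condition, and plain lambdas replace boost::bind functors and CUDA calls. MiniWorker and
its members are hypothetical names, not HOOMD's actual GPUWorker API. The sketch only illustrates
the contract described above. call() enqueues a job and then blocks until the worker thread has
drained the queue, while callAsync() returns immediately.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical worker in the spirit of GPUWorker: a single thread (which in the
// real class owns the CUDA context) runs every job queued by other threads.
class MiniWorker {
public:
    MiniWorker() : thread_([this] { run(); }) {}

    ~MiniWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        thread_.join();  // remaining queued jobs are executed before the thread exits
    }

    // Asynchronous: enqueue the job and return immediately.
    void callAsync(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_all();
    }

    // Synchronous: enqueue the job, then block until the queue has drained.
    void call(std::function<void()> job) {
        callAsync(std::move(job));
        sync();
    }

    // Block until no job is queued or running.
    void sync() {
        std::unique_lock<std::mutex> lock(mutex_);
        done_cv_.wait(lock, [this] { return jobs_.empty() && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
            if (jobs_.empty()) return;  // stop_ requested and nothing left to do
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop();
            busy_ = true;
            lock.unlock();
            job();  // in GPUWorker this would be a CUDA call
            lock.lock();
            busy_ = false;
            done_cv_.notify_all();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_, done_cv_;
    std::queue<std::function<void()>> jobs_;
    bool busy_ = false;
    bool stop_ = false;
    std::thread thread_;  // declared last so every member above exists before run() starts
};
```

In this sketch, an exception escaping a job would terminate the program. A real implementation has
to decide how to report errors from jobs back to the calling thread.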

Inside GPUWorker::call(…), the calling thread is suspended on boost::condition::wait, and since
Boost 1.35 that wait is a thread interruption point. This is where the problem can arise.

Imagine a thread doing the following:

a) Allocate a C++ automatic object
b) Perform a GPUWorker::call(…) that does a device-to-host cudaMemcpy into the memory allocated
in a)

Suppose the calling thread is interrupted while it is suspended on boost::condition::wait() inside
b). Then GPUWorker::call(…) exits by throwing boost::thread_interrupted, and the object from a) is
destroyed during stack unwinding. The real GPU copy, however, is still in progress, so the CUDA
driver completes it into freed memory and corrupts someone else's memory. The solution is to
disable thread interruption and re-enable it once it is safe:

a) Allocate a C++ automatic object
a1) Disable interruption
b) Perform a GPUWorker::call(…) that does a device-to-host cudaMemcpy into the memory allocated
in a)
b1) Re-enable interruption, throwing boost::thread_interrupted if an interruption was requested
during the call
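
The sequence above can be sketched as follows. Standard C++ has no thread interruption, so the
sketch simulates it. It uses hypothetical thread-local flags, a hypothetical interruption_point(),
and a hypothetical disable_interruption guard, all modeled loosely on Boost.Thread's
boost::this_thread::interruption_point() and boost::this_thread::disable_interruption.
worker_call_copy is a hypothetical stand-in for GPUWorker::call(…) performing the device-to-host
copy. unsafe_copy follows steps a and b. safe_copy adds a1 and b1, so a pending interruption is
delivered only after the copy has completed.

```cpp
#include <cassert>

// Hypothetical stand-ins for Boost.Thread interruption, simulated on one thread.
struct thread_interrupted {};

thread_local bool interruption_requested = false;
thread_local int interruption_disabled = 0;  // nesting depth of disable guards
int copies_completed = 0;                    // counts copies that actually landed

// Throws if an interruption is pending and interruption is currently enabled.
void interruption_point() {
    if (interruption_requested && interruption_disabled == 0) {
        interruption_requested = false;
        throw thread_interrupted{};
    }
}

// RAII guard in the spirit of boost::this_thread::disable_interruption.
struct disable_interruption {
    disable_interruption() { ++interruption_disabled; }
    ~disable_interruption() { --interruption_disabled; }
    disable_interruption(const disable_interruption&) = delete;
    disable_interruption& operator=(const disable_interruption&) = delete;
};

// Hypothetical stand-in for GPUWorker::call(…): the wait is an interruption point.
// In the real scenario the GPU copy would still be in flight if this throws;
// in this simulation the copy is simply skipped.
void worker_call_copy(int* dst, int src) {
    interruption_point();
    *dst = src;  // stands for the device-to-host cudaMemcpy completing
    ++copies_completed;
}

// Steps a and b only: an interruption unwinds before the copy has landed.
int unsafe_copy(int src) {
    int host_buffer = 0;                  // a) automatic object
    worker_call_copy(&host_buffer, src);  // b) may throw thread_interrupted
    return host_buffer;
}

// Steps a, a1, b, b1.
int safe_copy(int src) {
    int host_buffer = 0;                      // a) automatic object
    {
        disable_interruption guard;           // a1) disable interruption
        worker_call_copy(&host_buffer, src);  // b) cannot throw thread_interrupted here
    }                                         // b1) interruption re-enabled...
    interruption_point();                     // ...and a pending request is delivered now
    return host_buffer;
}
```

With Boost itself, the same shape would be a boost::this_thread::disable_interruption object scoped
around the call, followed by a call to boost::this_thread::interruption_point() after that scope
ends.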

Steps a1 and b1 can be placed inside GPUWorker::call(…) as its first and last actions. "Fixing"
GPUWorker::callAsync, however, is not easy (I think it is not doable).

Fixing callAsync (if it is doable) isn't worth it anyway. A user who calls callAsync knows that an
asynchronous memcpy is not that safe. It is like doing the following without GPUWorker at all:

a) Allocate host memory
b) Perform a cudaMemcpyAsync into that memory
c) Execute code that can throw an exception, unwinding (freeing) the memory from a) while the copy
is still in flight
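
One user-side discipline for this async case is to make sure the asynchronous operation is waited
for before the buffer it writes into is released, even during unwinding. This approach is not
proposed in the post. The sketch below uses std::thread as a stand-in for cudaMemcpyAsync and a
hypothetical JoinOnExit guard. With CUDA, the analogous wait would be a stream synchronization such
as cudaStreamSynchronize, which is not shown here. The guard is declared after the buffer, so C++
destroys it first. An exception thrown in step c) therefore waits for the copy before the buffer is
freed.

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical guard: waits for the asynchronous work in its destructor.
class JoinOnExit {
public:
    explicit JoinOnExit(std::thread t) : t_(std::move(t)) {}
    ~JoinOnExit() {
        if (t_.joinable()) t_.join();
    }
    JoinOnExit(const JoinOnExit&) = delete;
    JoinOnExit& operator=(const JoinOnExit&) = delete;

private:
    std::thread t_;
};

// Steps a), b), c) with std::thread standing in for cudaMemcpyAsync.
// `copy_done` is set by the background "copy" once it has written the buffer.
void copy_then_maybe_throw(bool fail, bool& copy_done) {
    std::vector<int> host(1024, 0);  // a) host memory
    JoinOnExit guard(std::thread([&host, &copy_done] {
        for (int& x : host) x = 1;   // b) asynchronous "copy" into host
        copy_done = true;
    }));
    if (fail) {
        // c) Unwinding destroys `guard` first (it was declared last), which
        //    joins the copy before `host` is freed.
        throw std::runtime_error("failure while the copy is in flight");
    }
}
```

Without the guard, the exception in c) would unwind `host` while the copy is still writing to it.
That is the same hazard the post describes for callAsync.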

Gaetano Mendola