Problem with thread migration and CUFFT

On my project I’m hoping to use the new thread migration API included with CUDA 2.0 to share a context among threads. The basic functionality seems to work, but I run into problems when I create a CUFFT plan on one thread and execute it on another. I’ve modified the attached CUDA 2.0 SDK threadMigration example to include CUFFT planning/execution to reproduce this behaviour.
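Roughly, the CUFFT part of my modification boils down to the following. This is heavily stripped down (one worker thread instead of several, error checking omitted, names are just for illustration) and is not the exact code in the attachment:

```cpp
#include <stdio.h>
#include <stdint.h>
#include <pthread.h>
#include <cuda.h>
#include <cufft.h>

#define N 256

static CUcontext   g_ctx;   /* context created by main, migrated to the worker */
static cufftHandle g_plan;  /* plan created on the main thread */
static CUdeviceptr g_data;  /* device buffer for an in-place C2C transform */

static void *worker(void *arg)
{
    (void)arg;
    /* Attach the floating context to this thread before touching CUFFT. */
    cuCtxPushCurrent(g_ctx);

    /* This exec call is where the CUFFT_INVALID_PLAN results show up. */
    cufftResult r = cufftExecC2C(g_plan,
                                 (cufftComplex *)(uintptr_t)g_data,
                                 (cufftComplex *)(uintptr_t)g_data,
                                 CUFFT_FORWARD);
    printf("cufftExecC2C returned %d\n", (int)r);

    cuCtxPopCurrent(&g_ctx);  /* detach so main can clean up */
    return NULL;
}

int main(void)
{
    CUdevice dev;
    pthread_t tid;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&g_ctx, 0, dev);

    cuMemAlloc(&g_data, N * sizeof(cufftComplex));
    cufftPlan1d(&g_plan, N, CUFFT_C2C, 1);   /* plan created on this thread */

    cuCtxPopCurrent(&g_ctx);                 /* float the context... */
    pthread_create(&tid, NULL, worker, NULL);
    pthread_join(tid, NULL);                 /* ...and let the worker use it */

    cuCtxPushCurrent(g_ctx);
    cufftDestroy(g_plan);
    cuMemFree(g_data);
    cuCtxDestroy(g_ctx);
    return 0;
}
```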

When I run this example, I get a bunch of CUFFT_INVALID_PLAN messages in the output. Am I using this incorrectly, is CUFFT simply not designed to support this behaviour, or is this a known issue?

I’m on an x86-64 machine running Ubuntu Linux 7.04, with the CUDA 2.0b2 Toolkit/SDK and the latest 177.13 driver.

Thanks,
Mike
threadMigration.cpp (12.6 KB)

Anyone from NVIDIA care to comment on this one?

I think that creating plans allocates device memory for scratch space, an operation which requires the proper context. You may need to have the same thread both create the plan and launch the calculation.
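Something along these lines, i.e. the worker thread doing the planning and the execution itself. An untested sketch (names are illustrative; it assumes the main thread created g_ctx with cuCtxCreate and popped it, the same way the threadMigration sample does):

```cpp
#include <stdint.h>
#include <pthread.h>
#include <cuda.h>
#include <cufft.h>

extern CUcontext g_ctx;   /* created by main() with cuCtxCreate, then popped */

static void *fft_worker(void *arg)
{
    (void)arg;
    cufftHandle plan;
    CUdeviceptr buf;
    const int n = 256;

    cuCtxPushCurrent(g_ctx);                 /* attach the floating context */
    cuMemAlloc(&buf, n * sizeof(cufftComplex));
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);     /* plan created on this thread... */
    cufftExecC2C(plan,
                 (cufftComplex *)(uintptr_t)buf,
                 (cufftComplex *)(uintptr_t)buf,
                 CUFFT_FORWARD);             /* ...and executed on the same thread */
    cufftDestroy(plan);
    cuMemFree(buf);
    cuCtxPopCurrent(&g_ctx);                 /* hand the context back */
    return NULL;
}
```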

I have noticed that if I call CUBLAS functions from two different threads, the second call hangs. This makes me think that you have to manage contexts explicitly when using CUBLAS/CUFFT, just as the driver and runtime APIs require.

I believe you’re right: the planning uses the CUDA API to allocate device memory, which requires a context. However, if I understand correctly, the new thread migration API provided in CUDA 2.0 is supposed to allow exactly this. I’m able to allocate memory in one thread and execute in another with my own kernels, but I can’t do the same with CUFFT.

I haven’t tried the Toolkit 2.0 context push/pop, but I’ve had success using Josh Anderson’s GPUWorker to dispatch all GPU commands from a separate dedicated thread.
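The basic idea is just to funnel every GPU call through one thread that owns the context, so nothing ever has to migrate. This isn’t GPUWorker’s actual interface, just a bare-bones sketch of the pattern, written with C++11 threads for brevity:

```cpp
#include <cstdio>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <cuda_runtime.h>
#include <cufft.h>

class GpuDispatcher {
public:
    GpuDispatcher() : done_(false), worker_(&GpuDispatcher::run, this) {}
    ~GpuDispatcher() {
        enqueue([this] { done_ = true; });   // runs on the worker, ends the loop
        worker_.join();
    }
    // Queue a task to run on the dedicated GPU thread.
    void enqueue(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(std::move(task));
        cv_.notify_one();
    }
private:
    void run() {
        cudaSetDevice(0);                    // the worker thread's own context
        while (!done_) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !q_.empty(); });
                task = std::move(q_.front());
                q_.pop();
            }
            task();                          // every GPU call happens here
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    bool done_;
    std::thread worker_;
};

int main() {
    cufftHandle plan;
    GpuDispatcher gpu;
    // Planning and execution both end up on the same (worker) thread.
    gpu.enqueue([&] { cufftPlan1d(&plan, 256, CUFFT_C2C, 1); });
    gpu.enqueue([&] { std::printf("plan created on the GPU thread\n"); });
    gpu.enqueue([&] { cufftDestroy(plan); });
}   // ~GpuDispatcher joins the worker after the queued tasks have run
```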