CUDA Context and host multi-threading

Hi,
I thought it should be possible to reuse a floating CUDA context several times in a multithreaded host application?
I had a look at the “threadMigration” example in CUDA 2.0 beta 2. It made me think that something like the following should be possible (very simplified):

  1. host thread: create a context ‘ctx’ with cuCtxCreate, then cuCtxPopCurrent(NULL);
  2. start thread 1; cuCtxPushCurrent( ctx); do something; cuCtxPopCurrent(NULL);
  3. host thread: wait for thread 1 to finish
  4. start thread 2; cuCtxPushCurrent( ctx); do something; cuCtxPopCurrent(NULL);
  5. host thread: wait for thread 2 to finish

But when I try this, the second call to cuCtxPushCurrent(ctx) returns CUDA_ERROR_INVALID_VALUE.
I did not call cuCtxDestroy or cuCtxDetach. (I don’t know whether this matters, but in “do something” I use cuFFT plus my own code.) Any ideas?

Reading the documentation, I think you must pass the context pointer returned by cuCtxPopCurrent to cuCtxPushCurrent. Or did you verify that it is the same as the one returned by cuCtxCreate?

Hm,

It’s always the same context. Nevertheless, I checked what happens in the threadMigration example by replacing

cuCtxPopCurrent( NULL );

by

CUcontext pctx = 0;
cuCtxPopCurrent( &pctx );
printf( " cuCtxPopCurrent, context is %p\n", pctx);

at the end of the ThreadProc…

The output is (nil)?

The same situation here. I checked the function pointers and they are correct. Do you have any idea? The algorithm was taken from the documentation.

I have the same problem, please help