Floating context problem

Hi gentlemen,

I have a problem with a floating CUDA context.

Code Sample:
Thread A:
cuCtxCreate(&cuDeviceContext, CU_CTX_SCHED_AUTO, cuDeviceHandle);
cuCtxPopCurrent(NULL);
// This succeeds

Thread B:
cuCtxPushCurrent(cuDeviceContext);
… allocate memory or something
cuCtxPopCurrent(NULL);
// This succeeds

Thread C:
cuCtxPushCurrent(cuDeviceContext);
// at this point the call returns CUDA_ERROR_INVALID_VALUE


It also fails when I try:

Thread B:
cuCtxPushCurrent(cuDeviceContext);
… allocate memory or something
cuCtxPopCurrent(NULL);
cuCtxPushCurrent(cuDeviceContext);
// at this point the call returns CUDA_ERROR_INVALID_VALUE
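For reference, the full scenario looks roughly like this as a compilable sketch. The cuInit()/cuDeviceGet() setup, the error-checking macro, and the pthread scaffolding are additions here (my real code does more between push and pop); only driver-API calls are used between push() and pop():

```cuda
#include <cuda.h>
#include <pthread.h>
#include <stdio.h>

#define CHECK(call)                                          \
    do {                                                     \
        CUresult err = (call);                               \
        if (err != CUDA_SUCCESS) {                           \
            fprintf(stderr, "%s failed: %d\n", #call, err);  \
            return NULL;                                     \
        }                                                    \
    } while (0)

static CUcontext cuDeviceContext;

/* Thread A: create the context, then pop it so it floats. */
static void *threadA(void *arg)
{
    CUdevice dev;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&cuDeviceContext, CU_CTX_SCHED_AUTO, dev));
    CHECK(cuCtxPopCurrent(NULL));   /* context is now floating */
    return NULL;
}

/* Threads B and C: attach the floating context, work, detach. */
static void *threadBC(void *arg)
{
    CUdeviceptr p;
    CHECK(cuCtxPushCurrent(cuDeviceContext));
    CHECK(cuMemAlloc(&p, 1024));    /* driver-API call only */
    CHECK(cuMemFree(p));
    CHECK(cuCtxPopCurrent(NULL));   /* context floats again */
    return NULL;
}

int main(void)
{
    pthread_t a, b, c;
    pthread_create(&a, NULL, threadA, NULL);  pthread_join(a, NULL);
    pthread_create(&b, NULL, threadBC, NULL); pthread_join(b, NULL);
    pthread_create(&c, NULL, threadBC, NULL); pthread_join(c, NULL);
    return 0;
}
```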

Does anybody have an idea what is wrong?

Thanks and Regards

Mirek

I have the same problem.

I think I’ve tried almost every possibility. I introduced a semaphore-like variable… I tried to “attach” the context (the usage count should then be increased, so the pop() operation should fail, but that is not the case)… and so on, but the result is always the same.

It looks like you can call cuCtxPushCurrent() only once; after the next cuCtxPopCurrent() the context is destroyed or becomes invalid. In my opinion the CUDA_ERROR_INVALID_VALUE error says nothing about the actual failure; it can mean almost anything.

The threadMigration example in the SDK did not help me understand this any better.

If you have found a solution to this problem in the meantime, I would be very thankful if you could share it with me or give me a hint.

If anybody from NVIDIA reads this, it would be very nice to get a statement. Is it possible to use the cuCtxPushCurrent()/cuCtxPopCurrent() functions to build a structure like this?

  1. Thread (main thread)
    ->create context
    ->save CtxId in object
    ->pop()
    ->send object to next thread
  2. Thread
    ->receive object
    ->push(CtxId from object)
    ->do work
    ->pop()
    ->send object to next thread
  3. Thread
    ->receive object
    ->push(CtxId from object)
    ->do work
    ->pop()
    ->send object to next thread

  Last thread
    ->receive object
    ->push(CtxId from object)
    ->do work
    ->pop()
    ->send object to first thread
  4. Thread
    ->receive object
    ->destroy(CtxId from object)
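The structure above, sketched in driver-API calls. The thread creation and the handoff queue between threads are elided; assume each function runs in its own thread and the object is delivered from one thread to the next in order:

```cuda
#include <cuda.h>

/* The "object" passed around the ring; holds the context handle. */
typedef struct { CUcontext ctx; } CtxObject;

/* 1. Main thread: create the context and detach it. */
void main_thread(CtxObject *obj, CUdevice dev)
{
    cuCtxCreate(&obj->ctx, CU_CTX_SCHED_AUTO, dev);
    cuCtxPopCurrent(NULL);          /* context now floats */
    /* ... send obj to the next thread ... */
}

/* 2..n. Worker threads: attach, work, detach, pass on. */
void worker_thread(CtxObject *obj)
{
    CUdeviceptr p;
    cuCtxPushCurrent(obj->ctx);     /* attach the floating context */
    cuMemAlloc(&p, 4096);           /* "do work" placeholder */
    cuMemFree(p);
    cuCtxPopCurrent(NULL);          /* detach; context floats again */
    /* ... send obj to the next thread ... */
}

/* Final step: the thread that receives the object last destroys it. */
void final_thread(CtxObject *obj)
{
    cuCtxDestroy(obj->ctx);
}
```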

Best regards
Patrick

Hello,

I think I figured out the problem. The usage of Pop() and Push() is limited to the driver API. This means a use case like the one I described before is possible, BUT all CUDA calls between push() and pop() must be made through the driver API.

Furthermore, this means you have to use driver-API memory calls such as cuMemAlloc() instead of the runtime API’s cudaMalloc().
I’m not sure whether you can use the <<<>>> kernel launch syntax. I still have to figure that out, but I first have to finish some code changes (in the memory calls) before I can try it. I’ll let you know.
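For what it’s worth, the <<<>>> launch syntax belongs to the runtime API. In the current driver API the equivalent is loading a module and launching through cuLaunchKernel() (older driver versions used cuFuncSetBlockShape()/cuParamSet*/cuLaunchGrid() instead). A sketch, assuming a hypothetical kernel named `kernel` compiled to kernel.ptx:

```cuda
#include <cuda.h>

/* Driver-API equivalent of kernel<<<256, 128>>>(devPtr, n).
 * Assumes the kernel was compiled to PTX beforehand, e.g.:
 *   nvcc -ptx kernel.cu -o kernel.ptx
 */
void launch_with_driver_api(CUdeviceptr devPtr, int n)
{
    CUmodule   mod;
    CUfunction fn;
    void *args[] = { &devPtr, &n };   /* kernel parameters */

    cuModuleLoad(&mod, "kernel.ptx");          /* hypothetical file name */
    cuModuleGetFunction(&fn, mod, "kernel");   /* hypothetical kernel name */
    cuLaunchKernel(fn,
                   256, 1, 1,    /* grid dimensions   */
                   128, 1, 1,    /* block dimensions  */
                   0, NULL,      /* shared mem, stream */
                   args, NULL);
    cuModuleUnload(mod);
}
```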

Best regards
Patrick

Don’t mix driver API and runtime API. Badness can and will result.

I realized that and rewrote my program to use the driver API. But I’ve run into another problem, which is in a new thread and can be found here:

[url=“http://forums.nvidia.com/index.php?showtopic=80099”]http://forums.nvidia.com/index.php?showtopic=80099[/url]

It would be nice if you could help me with this problem.

In addition: I think context control should be integrated into the runtime API in a future release.

Thanks in advance
Best regards
Patrick