Problem with push after cuCtxAttach

I have a workflow that makes the following calls:

cuCtxCreate(&ctx, CU_CTX_BLOCKING_SYNC, device);
cuCtxPopCurrent(NULL);

cuCtxPushCurrent(ctx);
cuCtxAttach(&ctx2, 0);
cuCtxPopCurrent(NULL);

cuCtxPushCurrent(ctx2); // <== fails

The last push, on the context returned by cuCtxAttach, fails with an invalid context error. If I remove the cuCtxAttach call, everything works, so that call is the culprit. This is with CUDA 3.2, driver version 275.33, and a GeForce GTS 450 card under Windows 7. On a different machine (a laptop) running CUDA 4.0 with a Quadro NVS 140M and the same driver version 275.33, everything works fine. My guess is that the difference in CUDA version is responsible.

Any idea what I’m doing wrong? I’m guessing that I’m missing something in the specification of cuCtxAttach.

Thanks

The short version is that you do not need the context attach. The context-create call implicitly gave the context a usage count of 1, and the reference you hold keeps the context alive across different CPU threads.

If you wrote a library or plugin that dealt with existing CUDA contexts, it would make sense for the library to call context-attach to increment the context’s usage count and prevent it from being destroyed out from under the library. It does not sound like that use case applies; but in any case, the change in CUDA 4.0 semantics suggests that context attach/detach is deservedly on the way to retirement.

It is a library that runs multiple instances in multiple threads, so I need to either let NVIDIA do the reference counting or do it myself. It seems that using the CUDA 3.2 compiler with the newer 4.0 drivers brings out this bug, due to the apparent change in the status of cuCtxAttach (deprecating it …). Currently I’m doing my own reference counting, but some other NVIDIA libraries I’m using seem to call cuCtxAttach behind the scenes, and push/pop stops working once they do.
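For readers wondering what "doing it myself" could look like, here is a minimal sketch of manual reference counting in plain C11. It is not the code from my library, and it deliberately avoids the CUDA headers so it compiles anywhere: `fake_ctx` is a hypothetical stand-in for a `CUcontext`, and `shared_ctx_create`, `shared_ctx_retain`, and `shared_ctx_release` are names made up for illustration. The count starts at 1 on creation, mirroring the usage count described in the reply above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CUcontext handle. A real library would
   store the handle returned by cuCtxCreate here instead. */
typedef struct {
    int device;
} fake_ctx;

/* A context plus its usage count, shared between library instances. */
typedef struct {
    fake_ctx  *ctx;   /* owned; destroyed when refs drops to 0 */
    atomic_int refs;  /* starts at 1, like the count after creation */
} shared_ctx;

/* Create the context with an initial usage count of 1.
   Returns NULL on allocation failure. */
shared_ctx *shared_ctx_create(int device)
{
    shared_ctx *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->ctx = malloc(sizeof *s->ctx);
    if (s->ctx == NULL) {
        free(s);
        return NULL;
    }
    s->ctx->device = device;
    atomic_init(&s->refs, 1);
    return s;
}

/* Take an additional reference. The caller must already hold one,
   otherwise the object could be freed concurrently. */
void shared_ctx_retain(shared_ctx *s)
{
    int old = atomic_fetch_add(&s->refs, 1);
    assert(old > 0);
    (void)old;
}

/* Drop a reference. Returns 1 if this call destroyed the context,
   0 if other references remain. */
int shared_ctx_release(shared_ctx *s)
{
    int old = atomic_fetch_sub(&s->refs, 1);
    assert(old > 0);
    if (old == 1) {
        free(s->ctx);  /* a real library would destroy the CUDA context here */
        free(s);
        return 1;
    }
    return 0;
}
```

Each library instance would call `shared_ctx_retain` when it starts using the context and `shared_ctx_release` when it is done. Only the last release tears the context down, so no instance depends on cuCtxAttach/cuCtxDetach for lifetime management.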