Unexpected cuCtxAttach/cuCtxDetach calls between kernel executions in NSight profiling

Hi everybody,

While profiling with NSight, I saw repeated calls to cuCtxAttach and cuCtxDetach on the timeline.
There are about 20 sequential attach/detach pairs between the executions of different kernels (see the attached screenshot of the NSight timeline).

Can anybody explain what these calls do?
Is it possible to avoid them to improve computation performance?

I’m using the runtime API in my application.

Thanks in advance!

2 threads… 2 contexts… I can think of a few possibilities:

  1. You are using some library. This doesn’t look like the behaviour of your own code, because if you call cuCtxCreate() and then cuCtxDetach(), the context is destroyed immediately and the next call to cuCtxAttach() will fail.
  2. It could be the driver’s own behaviour. You could verify this with a single context, and compare it with the behaviour you get when you use the driver API directly.
  3. Your card allows only a single context at a time? That is probably not the case after all, because it would fail for the reason given in 1.
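
The reference counting behind point 1 can be checked with a small driver API program. This is only a sketch under some assumptions: it assumes a toolkit where cuCtxCreate() still has the three-argument form (pctx, flags, dev) and where the deprecated cuCtxAttach()/cuCtxDetach() are still exported (expect deprecation warnings when compiling). The helper report() is a made-up name for illustration. It prints whatever result each call returns rather than assuming a particular error code.

```cuda
// Host-only sketch: observe the context usage count through attach/detach.
// Assumption: three-argument cuCtxCreate() and the deprecated
// cuCtxAttach()/cuCtxDetach() are available in the installed toolkit.
#include <cuda.h>
#include <cstdio>

// Hypothetical helper: print the symbolic name of a driver API result.
static void report(const char *what, CUresult r) {
    const char *name = nullptr;
    cuGetErrorName(r, &name);
    std::printf("%-28s -> %s\n", what, name ? name : "unknown");
}

int main() {
    CUdevice dev;
    CUcontext ctx = nullptr, attached = nullptr;

    report("cuInit", cuInit(0));
    report("cuDeviceGet", cuDeviceGet(&dev, 0));

    // The new context is made current to this thread.
    report("cuCtxCreate", cuCtxCreate(&ctx, 0, dev));

    // Attach increments the usage count of the current context;
    // the matching detach decrements it again.
    report("cuCtxAttach", cuCtxAttach(&attached, 0));
    report("cuCtxDetach (attached)", cuCtxDetach(attached));

    // Detaching the handle returned by cuCtxCreate drops the last reference.
    report("cuCtxDetach (created)", cuCtxDetach(ctx));

    // With no current context left, this attach is expected to fail.
    report("cuCtxAttach after destroy", cuCtxAttach(&attached, 0));
    return 0;
}
```

It can be built with something like `nvcc ctx_refcount.cu -lcuda` (the file name is arbitrary). If the last call reports an error while the earlier attach/detach pair succeeds, that matches the reasoning in point 1: a balanced attach/detach pair seen in the profiler leaves the context alive, so the pairs are more likely to come from a library or the runtime than from your own code.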