I have always wondered why there is an implicit binding to the current CUDA context.
Since we also do a lot of parallel computing in our native CUDA code, this makes handling several CUDA contexts very difficult for us. What would help in my opinion (and I don't know if this is doable) is for every method to take a handle to the context. That way I would not need to set the current context, and all my memory copies and kernel launches could be done through this handle. I could even copy memory from one context to another.
Ah yes, and maybe I could even get rid of the peer context, which would make this more transparent as well.
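To make the idea concrete, here is a rough sketch of what I have in mind. It is a mock in plain C++ with no real CUDA calls: `Context`, `ScopedContext`, `upload`, `copy_between` and `demo` are hypothetical names I made up for illustration, and the "device memory" is just a host vector. In a real wrapper the guard would presumably call the driver's `cuCtxPushCurrent` / `cuCtxPopCurrent`, and the cross-context copy could map to `cuMemcpyPeer`, which already takes both contexts as arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a CUDA context (a real one would wrap a CUcontext).
struct Context {
    int id;
    std::vector<unsigned char> memory;  // pretend device memory owned by this context
};

// Mock of the driver's per-thread "current context" stack.
thread_local std::vector<Context*> g_context_stack;

Context* current_context() {
    return g_context_stack.empty() ? nullptr : g_context_stack.back();
}

// RAII guard: makes ctx current for the enclosing scope and restores the
// previous state on exit, so the caller never sets the current context itself.
class ScopedContext {
public:
    explicit ScopedContext(Context& ctx) { g_context_stack.push_back(&ctx); }
    ~ScopedContext() { g_context_stack.pop_back(); }
    ScopedContext(const ScopedContext&) = delete;
    ScopedContext& operator=(const ScopedContext&) = delete;
};

// Explicit-handle API: the caller names the context, binding happens internally.
void upload(Context& ctx, const void* src, std::size_t bytes) {
    ScopedContext bind(ctx);
    Context* cur = current_context();  // equals &ctx while the guard is alive
    cur->memory.resize(bytes);
    if (bytes > 0) {
        std::memcpy(cur->memory.data(), src, bytes);
    }
}

// Copy between two contexts; both handles are passed explicitly,
// so no separate peer-context object is needed in this mock.
void copy_between(Context& dst, const Context& src) {
    dst.memory = src.memory;
}

// Small usage example: upload into context a, then copy a -> b.
bool demo() {
    Context a{0, {}};
    Context b{1, {}};
    const unsigned char data[4] = {1, 2, 3, 4};
    upload(a, data, sizeof data);
    copy_between(b, a);
    // After the calls no context is left bound to the thread.
    return current_context() == nullptr && b.memory.size() == 4 && b.memory[3] == 4;
}
```

The point of the guard is that the implicit binding still exists underneath, but it is scoped to a single call, so code running in parallel never depends on whichever context happened to be set last.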
What do you think?