In CUDA 2.2, cudaThreadExit() works fine only if you cudaFree() your earlier cudaMalloc() allocations after a launch failure (in my case, a launch timeout caused by a while(1) kernel).
Otherwise, subsequent cudaMalloc() calls fail and the context is unusable.
Can someone explain what this function actually does, and when we should use it?
Is it a cool way of releasing all your cudaMalloc() allocations in one shot?
Does it really help with error recovery (bad address, launch timeout)?
Nothing more for me to say about the killer kernel thing at the moment, except that cudaThreadExit destroys a context and all its associated state. If that doesn’t fix it, file a bug (I don’t have time to track all of these things down if it requires me to hunt for additional hardware).
(It might help if I knew how things were failing after cudaThreadExit(); that sounds suspiciously like a pretty minimal bug in CUDART.)
Uh, it's certainly in the reference manual. All it does is kill the context. Further cuda* calls will reinitialize the context if they require a context to exist in the first place; you can call cudaSetDevice(n) after cudaThreadExit() without issue, etc.
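For anyone who wants to try the behavior described above, here is a rough sketch, not an official recipe. The helper name resetAndReallocate and its parameters are made up for illustration. It assumes cudaThreadExit() is still available in your toolkit (later releases deprecate it in favor of cudaDeviceReset()), and it does not reproduce the launch timeout itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from this thread): destroy the current context,
// then check that a fresh allocation works in the re-created one.
cudaError_t resetAndReallocate(int device, size_t bytes)
{
    // Destroys the calling host thread's context and all state it owns,
    // including outstanding cudaMalloc() allocations. The return value may
    // report an earlier asynchronous launch failure, so it is only logged.
    cudaError_t err = cudaThreadExit();
    if (err != cudaSuccess)
        fprintf(stderr, "cudaThreadExit: %s\n", cudaGetErrorString(err));

    // Read and clear any error code left over from before the teardown.
    (void)cudaGetLastError();

    // Selecting a device right after cudaThreadExit() is allowed.
    err = cudaSetDevice(device);
    if (err != cudaSuccess)
        return err;

    // The first call that needs a context implicitly creates a new one.
    void *buf = NULL;
    err = cudaMalloc(&buf, bytes);
    if (err != cudaSuccess)
        return err;

    return cudaFree(buf);
}
```

Calling resetAndReallocate(0, 1 << 20) after a failed launch should return cudaSuccess if the context really was torn down and rebuilt. If it does not, that is the kind of detail worth putting in the bug report.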