Is cudaThreadExit necessary?

I’ve implemented some convolution functions that I call from Matlab through a mex-file. The filtering takes about 3 ms, but the cudaThreadExit call takes 20 ms, which removes much of the GPU performance gain. Is it really necessary to call cudaThreadExit? I’ve removed it and everything seems to work well. Can I get memory leaks if I don’t call it? (Yes, I free the memory that I allocate.) I’ve never really understood what it does.

It destroys the GPU context. There are pros and cons to this (for example, you have to call it before you call cudaSetDevice if a context already exists), but it’s not strictly necessary.
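
If you still want the context torn down at some point, one option is to do it once when the mex-file is unloaded instead of on every call. A minimal sketch, assuming the mex-file is compiled as CUDA C++ and linked against the CUDA runtime; the function name cleanup and the elided filtering body are placeholders, not code from the original post. cudaDeviceReset is the newer replacement for the deprecated cudaThreadExit; on a toolkit that predates it, call cudaThreadExit there instead.

```cuda
#include "mex.h"
#include <cuda_runtime.h>

// Placeholder cleanup hook: runs once, when MATLAB clears the mex-file
// or exits, rather than after every filtering call.
static void cleanup(void)
{
    // Destroys the context; replaces the deprecated cudaThreadExit().
    cudaDeviceReset();
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    static bool registered = false;

    // Register the hook on the first call only.
    if (!registered) {
        mexAtExit(cleanup);
        registered = true;
    }

    // ... cudaMalloc, copy input, launch the convolution kernel,
    //     copy the result back, cudaFree ...
}
```

Because the context stays alive between calls, later calls also skip the cost of creating it again; only the first call after loading the mex-file pays for context creation.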