Using cudaMalloc, cudaMemcpy, and cudaFree in a multi-threaded program


I am a beginner at CUDA. I am trying to use a static library (nvcompress) to compress some textures in my multi-threaded program. This library uses CUDA acceleration for compression. However, because my program calls the library from several threads, it crashes inside the library with an access violation during cudaMemcpy or cudaFree. If I put a mutex around the calls to the library functions, it works fine, but serializing the calls costs a lot of time, which I want to avoid.
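
To make the mutex workaround concrete, here is a rough sketch of what I mean. It is only an illustration: `compress_texture_stub`, `worker`, and `run_serialized` are names I made up, and the stub stands in for the real library call (it does not touch CUDA at all), so this shows the locking pattern rather than the library's actual API.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the library's compression entry point, which
// internally calls cudaMalloc, cudaMemcpy, and cudaFree. Not the real API.
static int g_calls = 0;  // plain int on purpose; only touched while g_lock is held
void compress_texture_stub() { ++g_calls; }

std::mutex g_lock;  // one lock shared by every thread that enters the library

// Each worker holds the lock for the whole library call, so only one
// thread at a time runs the CUDA code inside the library.
void worker(int textures) {
    for (int i = 0; i < textures; ++i) {
        std::lock_guard<std::mutex> guard(g_lock);
        compress_texture_stub();
    }
}

// Starts `threads` workers, waits for them, and returns the total number
// of completed calls.
int run_serialized(int threads, int textures_per_thread) {
    g_calls = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back(worker, textures_per_thread);
    for (auto& th : pool) th.join();
    return g_calls;
}
```

With this pattern the crash goes away, but every compression call runs one at a time, which is exactly the slowdown I would like to get rid of.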

Is there a way to make cudaMalloc, cudaMemcpy, and cudaFree thread-safe? Are there any alternatives?

Thanks for your help :)