I have a function that launches kernels from a multithreaded application (with a single GPU). This is the function:
void Launch_Test(unsigned char *img_u, int width, int height, float threshold) {
    // Note: width and height are currently unused; the image size is hard-coded to 1600x1200.
    unsigned char *devimg;      // device buffer; holds the result that is copied back
    unsigned char *devimgCopy;  // device-side copy of the input image

    checkCudaErrors(cudaSetDevice(0));
    size_t size = (1600 * 1200) * sizeof(unsigned char);

    // Allocate both device buffers, upload the image and duplicate it on the device.
    checkCudaErrors(cudaMalloc((void **)&devimg, size));
    checkCudaErrors(cudaMalloc((void **)&devimgCopy, size));
    checkCudaErrors(cudaMemcpy(devimg, img_u, size, cudaMemcpyHostToDevice));
    checkCudaErrors(cudaMemcpy(devimgCopy, devimg, size, cudaMemcpyDeviceToDevice));

    // 100x75 blocks of 16x16 threads cover the 1600x1200 image.
    Test_kernel<<< dim3(100,75,1), dim3(16,16,1) >>>(1600, 1200, devimg, devimgCopy, threshold);
    Test_Dilate<<< dim3(100,75,1), dim3(16,16,1) >>>(1600, 1200, devimg, devimgCopy);

    // Wait for both kernels to finish, then download the result.
    checkCudaErrors(cudaDeviceSynchronize());
    checkCudaErrors(cudaMemcpy(img_u, devimg, size, cudaMemcpyDeviceToHost));

    checkCudaErrors(cudaFree(devimg));
    checkCudaErrors(cudaFree(devimgCopy));
}
It processes one 1600*1200 image and returns the result after the two kernel operations.
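For reference, the launch configuration and buffer size in Launch_Test are hard-coded. A minimal host-side sketch of the arithmetic behind them, assuming the kernels use one thread per pixel (the names Dim2, blocks_for, grid_for and image_bytes are hypothetical helpers, not part of my code; Dim2 stands in for dim3 so the sketch builds without the CUDA toolkit):

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3 (x and y only).
struct Dim2 { unsigned x, y; };

// Number of blocks needed to cover n elements with `block` threads per block (rounds up).
constexpr unsigned blocks_for(unsigned n, unsigned block) {
    return (n + block - 1) / block;
}

// Grid that covers a width x height image with block.x x block.y threads per block.
constexpr Dim2 grid_for(unsigned width, unsigned height, Dim2 block) {
    return Dim2{ blocks_for(width, block.x), blocks_for(height, block.y) };
}

// Size in bytes of a single-channel 8-bit image.
constexpr std::size_t image_bytes(unsigned width, unsigned height) {
    return static_cast<std::size_t>(width) * height * sizeof(unsigned char);
}
```

With width = 1600, height = 1200 and a 16x16 block this gives a 100x75 grid and 1,920,000 bytes, matching the values hard-coded in Launch_Test.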
When this function is called using one single CPU thread, it works fine.
When the function is called from two or more threads, the behaviour is unpredictable. It may fail with 2, 6, 7 or some other number of threads. The error is code=4(cudaErrorLaunchFailure) "cudaMalloc((void**)&devimg,size))". In VS2012 I can see that devimg is a null pointer when the program crashes. Explicitly initializing devimg inside the function (= new unsigned char[1600*1200];) gives the same result.
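The threading code is not shown above. As a minimal sketch of the calling pattern, assuming std::thread and one private host buffer per thread (run_concurrently and LaunchFn are hypothetical names used only for illustration), the multithreaded case looks roughly like this:

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Callable with the same signature as Launch_Test.
using LaunchFn = std::function<void(unsigned char*, int, int, float)>;

// Calls `launch` once on each of `num_threads` host threads and waits for all of them.
// Every thread gets its own width*height byte buffer, so no host image is shared.
// Returns the buffers so the caller can inspect the results.
std::vector<std::vector<unsigned char>> run_concurrently(
        const LaunchFn& launch, int num_threads, int width, int height, float threshold) {
    const std::size_t bytes = static_cast<std::size_t>(width) * height;
    std::vector<std::vector<unsigned char>> images(
        num_threads, std::vector<unsigned char>(bytes, 0));

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        unsigned char* img = images[t].data();  // buffer owned by thread t only
        workers.emplace_back([&launch, img, width, height, threshold] {
            launch(img, width, height, threshold);
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return images;
}
```

A call such as run_concurrently(Launch_Test, 7, 1600, 1200, threshold) corresponds to the seven-thread case. Keeping one host buffer per thread takes a shared img_u out of the picture when reproducing the failure.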
Is this because, while one thread is trying to access &devimg, another thread has already called cudaFree on the pointer?
If that is the case, how should I manage calls to this function from multiple threads? Although it seems that calling cudaSetDevice(0) should be enough for the multithreaded calls to be managed, I have also tested with streams and cudaMemcpyAsync, and with contexts (API), but without success. In some cases the error appears in the cudaMemcpy calls instead. I have also tested with the "--default-stream per-thread" command-line option, with the same result.
Visual Profiler gives the error code=11(cudaErrorInvalidValue) "cudaMemcpy(devimg, img_u, size, cudaMemcpyHostToDevice)". This error disappears when I define and use a host image pointer inside the function instead of the img_u parameter.
I am using a GTX 970 and CUDA 7.0. I don't know what else to try.