GPU Pro Tip: CUDA 7 Streams Simplify Concurrency

Hello,

I tried to run the "Multi-threading Example" from https://devblogs.nvidia.com/parallelforall/gpu-pro-tip-cuda-7-streams-simplify-concurrency/ on a Windows 7 system with Visual Studio 2012 and an NVIDIA Titan X GPU. To get pthreads working, I followed the instructions from http://web.cs.du.edu/~sturtevant/w13-sys/InstallingpthreadsforVisualStudio.pdf.

But setting the nvcc flag "--default-stream per-thread", as described in https://devblogs.nvidia.com/parallelforall/gpu-pro-tip-cuda-7-streams-simplify-concurrency/, does not change the profile obtained with Nsight. Moreover, no errors or warnings are raised.

So I added the lines

cudaError_t cudaStatus;
if (cudaStatus != cudaSuccess) {
printf("ERROR, KERNEL FAILED: %s\n", cudaGetErrorString(cudaStatus));
}

before the actual kernel launch, and I get the following error: "unrecognized error code".

When I launch the kernel without pthreads, no error is reported.

Does anyone know what is going on here?

Probably your pthreads setup on Windows is not working correctly. I just re-ran the example on Linux, and it works as-is, with no errors reported, even under cuda-memcheck.
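To find out whether pthreads itself is the problem, it can help to test it with no CUDA involved at all. Below is a minimal sketch in plain host C++ (it should also build as a .cu file with nvcc): start a few threads, have each one set its own flag, join them, and count how many actually ran. count_threads_that_ran is a hypothetical helper name, not part of any library. If it does not return the number of threads you asked for with the Windows pthreads build, the installation is at fault rather than CUDA.

```cpp
#include <cstddef>
#include <pthread.h>

// Each worker writes only its own flag, so no locking is needed;
// pthread_join makes the write visible to the joining thread.
static void *worker(void *arg)
{
    *static_cast<int *>(arg) = 1;
    return NULL;
}

// Returns how many of n threads were created, ran, and joined successfully.
// n outside [1, 64] is rejected and yields 0.
int count_threads_that_ran(int n)
{
    enum { kMaxThreads = 64 };
    if (n < 1 || n > kMaxThreads) return 0;

    pthread_t threads[kMaxThreads];
    int ran[kMaxThreads] = {0};
    bool created[kMaxThreads] = {false};

    for (int i = 0; i < n; i++)
        created[i] = pthread_create(&threads[i], NULL, worker, &ran[i]) == 0;

    int ok = 0;
    for (int i = 0; i < n; i++) {
        // Only join threads that were actually started.
        if (created[i] && pthread_join(threads[i], NULL) == 0 && ran[i])
            ok++;
    }
    return ok;
}
```

For example, count_threads_that_ran(8) should return 8 on a working setup.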

Also, you may be confused about error checking. This code:

cudaError_t cudaStatus;
if (cudaStatus != cudaSuccess) {
printf("ERROR, KERNEL FAILED: %s\n", cudaGetErrorString(cudaStatus));
}

makes no sense.

This line:

cudaError_t cudaStatus;

creates a variable called cudaStatus but does not set or initialize it to anything.

So when you test it here:

if (cudaStatus != cudaSuccess) {

You are testing an uninitialized variable that contains junk. When you take that junk value and try to convert it to an error string:

cudaGetErrorString(cudaStatus)

You get "unrecognized error code". That makes sense to me.
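For comparison, here is a minimal sketch, loosely modeled on the blog's multi-threading example, of checks that test a real status. The variable is assigned from the return value of a runtime call, and the test comes after the kernel launch rather than before it: a launch returns nothing, so launch errors are read with cudaGetLastError(), and errors raised while the kernel runs are reported by a later synchronizing call. Launching on the cudaStreamPerThread handle (available since CUDA 7) requests the per-thread default stream explicitly, which also sidesteps the question of whether the --default-stream per-thread option actually reached nvcc in the Visual Studio build. launch_kernel and run_threads are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <pthread.h>
#include <cuda_runtime.h>

const int N = 1 << 20;

// Kernel in the style of the blog's example: a grid-stride loop over x.
__global__ void kernel(float *x, int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    for (int i = tid; i < n; i += blockDim.x * gridDim.x) {
        x[i] = sqrt(pow(3.14159, i));
    }
}

// Thread body. cudaStatus is always assigned from a runtime call before it
// is tested, and the kernel is checked after the launch, not before it.
void *launch_kernel(void *)
{
    float *data = NULL;
    cudaError_t cudaStatus = cudaMalloc(&data, N * sizeof(float));
    if (cudaStatus != cudaSuccess) {
        printf("ERROR, cudaMalloc FAILED: %s\n", cudaGetErrorString(cudaStatus));
        return (void *)1;
    }

    // cudaStreamPerThread selects this host thread's default stream
    // explicitly, independent of the --default-stream compiler setting.
    kernel<<<1, 64, 0, cudaStreamPerThread>>>(data, N);

    cudaStatus = cudaGetLastError();          // errors from the launch itself
    if (cudaStatus == cudaSuccess)            // errors while the kernel runs
        cudaStatus = cudaStreamSynchronize(cudaStreamPerThread);
    if (cudaStatus != cudaSuccess)
        printf("ERROR, KERNEL FAILED: %s\n", cudaGetErrorString(cudaStatus));

    cudaFree(data);
    if (cudaStatus != cudaSuccess) return (void *)1;
    return NULL;
}

// Starts num_threads host threads and returns how many failed (0 = success).
int run_threads(int num_threads)
{
    enum { kMaxThreads = 64 };
    pthread_t threads[kMaxThreads];
    if (num_threads < 1 || num_threads > kMaxThreads) return -1;

    int created = 0;
    while (created < num_threads &&
           pthread_create(&threads[created], NULL, launch_kernel, NULL) == 0)
        created++;

    int failures = num_threads - created;     // threads that never started
    for (int i = 0; i < created; i++) {
        void *result = NULL;
        if (pthread_join(threads[i], &result) != 0 || result != NULL)
            failures++;
    }
    return failures;
}
```

Calling run_threads(8) from main, followed by cudaDeviceReset() before exit as the blog's example does, should show the kernels from different threads overlapping in the profiler. In a threaded program, synchronizing only the thread's own stream is the narrower choice: cudaDeviceSynchronize() would wait for work submitted by every thread.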