I am seeing strange behaviour when two processes that use CUDA run in parallel on the same GPU (a K40, with CUDA 7). I have written a small CUDA application and an equivalent CPU application, and I have validated that the two versions match one-to-one: they both give the same results. However, when someone else runs a MATLAB-based application (from a different account) that invokes some CUDA routines, my CUDA application behaves completely differently. Furthermore, every time I run my application I seem to get different results, even though my application is deterministic and should give the same results each time. Why could this be happening? Could the other application (or my application) be making an illegal memory access? I am using 5 streams in my application, together with pinned memory, the cuBLAS library, and some kernels I have written myself.
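To make the illegal-memory-access question concrete, here is a minimal sketch of the kind of error checking that should surface such a fault in my own process. It is not my actual code: the `CUDA_CHECK` macro, the `scale` kernel, and `run_checked_kernel` are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error '%s' at %s:%d\n",          \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical kernel standing in for one of my own kernels.
__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;  // bounds check avoids out-of-range writes
}

// Scales an array of ones by 2 on the GPU and returns the first element.
float run_checked_kernel() {
    const int n = 1 << 20;
    std::vector<float> h(n, 1.0f);
    float *d_x = NULL;
    CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_x, h.data(), n * sizeof(float),
                          cudaMemcpyHostToDevice));

    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_x, 2.0f, n);
    CUDA_CHECK(cudaGetLastError());             // invalid launch configuration
    CUDA_CHECK(cudaStreamSynchronize(stream));  // faults raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h.data(), d_x, n * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_x));
    return h[0];
}

int main() {
    std::printf("first element: %f\n", run_checked_kernel());
    return 0;
}
```

Running the real application under `cuda-memcheck` would be a complementary check, since it reports out-of-bounds accesses even when the runtime does not return an error.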
I also tried the following while the other application was not running: after each cuBLAS call or kernel launch, I invoke cudaDeviceSynchronize() and cudaStreamSynchronize() on all 5 streams. With this extra synchronization I get different (i.e. incorrect) results, although they are still deterministic and do not change from one run to the next. Why could this be happening?
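Since adding synchronization changes the output, one possibility I cannot rule out is a missing dependency between streams. The sketch below shows the ordering I understand to be required when work in one stream consumes data produced in another. It is an illustration under assumptions, not my application: the `produce` and `consume` kernels, the buffer names, the two-stream layout, and `run_two_stream_pipeline` are all hypothetical. For cuBLAS, the equivalent concern is that calls run in whichever stream was set with `cublasSetStream`, and in the default stream otherwise.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical producer: fills the buffer with ones.
__global__ void produce(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 1.0f;
}

// Hypothetical consumer: reads what the producer wrote.
__global__ void consume(const float *buf, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = buf[i] + 1.0f;
}

// Runs producer and consumer in different streams, ordered by an event.
// Error checking is omitted here to keep the sketch short.
float run_two_stream_pipeline() {
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_buf = NULL, *d_out = NULL, *h_out = NULL;
    cudaMalloc(&d_buf, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMallocHost(&h_out, n * sizeof(float));  // pinned host memory

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);
    cudaEvent_t ready;
    cudaEventCreateWithFlags(&ready, cudaEventDisableTiming);

    produce<<<blocks, threads, 0, s0>>>(d_buf, n);
    cudaEventRecord(ready, s0);         // marks the point where d_buf is written
    cudaStreamWaitEvent(s1, ready, 0);  // s1 waits; without this, consume may race
    consume<<<blocks, threads, 0, s1>>>(d_buf, d_out, n);

    cudaMemcpyAsync(h_out, d_out, n * sizeof(float),
                    cudaMemcpyDeviceToHost, s1);
    cudaStreamSynchronize(s1);          // h_out is only valid after this returns
    float first = h_out[0];

    cudaEventDestroy(ready);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFreeHost(h_out);
    cudaFree(d_buf);
    cudaFree(d_out);
    return first;
}

int main() {
    std::printf("first element: %f\n", run_two_stream_pipeline());
    return 0;
}
```

If the event wait is removed, the two kernels are free to overlap, so the result would depend on timing, which is the kind of behaviour that other GPU activity or extra synchronization calls could change.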