Is CUDA 7.5 supposed to work with the Quadro M2000M?

I ran some tests and they failed on the GPU timer: the elapsed time it reports is unreasonable. The driver bundled with CUDA 7.5 is 353.90, which does not support the Quadro M2000M (support seems to have started with 354.56), so I manually updated my GPU driver to the latest, 362.13. I then chose the custom installation for CUDA 7.5 and skipped the Graphics Driver and GPU Deployment Kit components. Is that the correct way to do it? And are there any known issues with the GPU timer?
My tests run on 64-bit Windows 7.

Yes, your method for install sounds correct. Either keep the driver that was originally installed, or update to the latest. Then install CUDA while deselecting the driver install.

I have no idea what you are referring to by “GPU timer”. CUDA event based timing? clock() or clock64() based timing?

Windows WDDM can definitely interfere with getting sensible results from CUDA event based timing. Furthermore, I have seen CUDA event based timing give strange results on Windows WDDM when timing host-only code (no CUDA calls between the events).

Thanks for the reply txbob.
My timer was CUDA event based timing, through cudaEventElapsedTime(), and the problem did indeed occur when measuring host code. The expected elapsed time is around 300 ms, but CUDA event based timing returned something like 0.00112 ms. The exact same code looks OK on my colleague’s Linux machine (although it has a different graphics card). So that sounds consistent with your observation? Thanks.

It’s consistent with what I’ve seen, and I don’t have a ready explanation for it. On Linux, cudaEvent based timing seems to work fine even for host code. I think the same is true for device code on Windows. But for host-only code on Windows, I’ve seen similar odd results. I suspect WDDM command queue batching is involved in the explanation, but I can’t go further than that.

My suggestion would be to use an ordinary Windows host-based timing function (e.g. QueryPerformanceCounter) for timing host code on Windows. If you can tolerate appropriate synchronization, it can also safely be used to time device code or mixed host/device sequences.
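
For illustration, here is a minimal sketch of such a host-based timer. It is not code from this thread: the helper names now_ms and time_ms are hypothetical, and the std::chrono::steady_clock fallback for non-Windows builds is an assumption added only so that the same file compiles on Linux too.

```cpp
#include <chrono>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns a monotonic timestamp in milliseconds.
double now_ms() {
#ifdef _WIN32
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);  // counter ticks per second
    QueryPerformanceCounter(&count);   // current tick count
    return 1000.0 * static_cast<double>(count.QuadPart) /
           static_cast<double>(freq.QuadPart);
#else
    // Assumed fallback so the sketch also builds outside Windows.
    using steady = std::chrono::steady_clock;
    return std::chrono::duration<double, std::milli>(
               steady::now().time_since_epoch()).count();
#endif
}

// Hypothetical helper: runs any callable and returns the elapsed
// wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const double start = now_ms();
    work();  // host code under test
    return now_ms() - start;
}
```

Wrapping the host work in a lambda and passing it to time_ms returns the elapsed milliseconds. To time device code the same way, the timed region would have to end with a synchronizing call such as cudaDeviceSynchronize(), because kernel launches return to the host before the kernel has finished.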

cudaEvent timing can give reasonable results for timing e.g. a kernel call on Windows, but even then WDDM command queue batching can inject difficult-to-understand behavior into any but the most trivial timing sequences.

Thank you, txbob.