Bug in gpuGetMaxGflopsDeviceId for CUDA Toolkit 10.0

The function gpuGetMaxGflopsDeviceId in the CUDA Toolkit 10.0 version of helper_cuda.h is broken: it does NOT pick the fastest device the way the CUDA Toolkit 9.2 version does.

Tracing through this in Toolkit 10.0 shows the following:

inline int gpuGetMaxGflopsDeviceId()

DEVICE 0
deviceProp = {name="Quadro P2000" uuid={…} luid=…}
compute_perf = 1516032000
sm_per_multiproc = 128
deviceProp.multiProcessorCount = 8
deviceProp.clockRate = 1480500

DEVICE 1
deviceProp = {name="GeForce GTX 690" uuid={…} …}
compute_perf = 1565952000
sm_per_multiproc = 192
deviceProp.multiProcessorCount = 8
deviceProp.clockRate = 1019500

DEVICE 2
deviceProp = {name="GeForce GTX 690" uuid={…} …}
compute_perf = 1565952000
sm_per_multiproc = 192
deviceProp.multiProcessorCount = 8
deviceProp.clockRate = 1019500

max_perf_device = 1

In CUDA Toolkit 9.2 it picked Device 0, which is correct.
In CUDA Toolkit 10.0 it picks Device 1, which is incorrect, for two reasons. First, Device 1 is NOT in TCC mode, while Device 0 is. Second, the real compute performance of a GeForce GTX 690 (Kepler, SM 3.0) is nowhere near that of a Quadro P2000 (Pascal, SM 6.1); a raw cores-times-clock product is not a valid comparison across architectures.

As a result, by default the samples will run on the wrong GPU.

The functions in helper_cuda.h are provided for convenience and to keep the sample code short; they are not intended for serious use in production software. Since they are not part of the CUDA API, they can be modified without notice, so consistent behavior across toolkit versions is not guaranteed.
See Njuffa's reply here; exactly the same issue was raised 4 years ago:
https://devtalk.nvidia.com/default/topic/790725/function-library/