Can CUDA use the full capability of the GPU?

I am a university student, and my project uses CUDA to research the GPU.
Last week I finished my project, but today my tutor sent me an email in which he doubts that CUDA can use the full capability of the GPU.

The email reads as follows:
We worked with a machine whose Nvidia card is always in optimisation mode, even if the Intel card is turned off. Depending on how much it is stressed it keeps to a min energy strategy and changes its operating modes dynamically at runtime - there are multiple power states between sleep and full speed for that card and the user has very little influence on which of these modes the card chooses. Even during the playing of games this can be noticed.

He doubts that CUDA can use the full speed of the GPU, but I disagree.
Could you give me some ideas on how to prove that? Thank you very much.

If you use nvidia-smi to monitor the GPU power state continuously, I think you will find that while a CUDA application is actively using the GPU, the GPU is always in “Performance State” P0 (full performance). If the GPU is not fully utilized, you will most likely observe states P2 or P8 (power-saving states); I am not sure how many states there are altogether.
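For example, one way to watch the performance state change in real time is to poll a few query fields once per second (assuming `nvidia-smi` is on your PATH; the available field names are listed by `nvidia-smi --help-query-gpu`):

```shell
# Print the performance state, GPU utilization, and power draw every second.
# Start this in one terminal, then launch your CUDA application in another
# and watch the pstate column jump to P0 while the kernel runs.
nvidia-smi --query-gpu=timestamp,pstate,utilization.gpu,power.draw --format=csv -l 1
```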

Now, if you have a modern GPU, then within power state P0 the card may run at different clock rates, depending on whether “Auto Boost” is set to “On” and on the current thermal and electrical load of the GPU. nvidia-smi lists the current clocks under “Clocks” and the maximum clocks under “Max Clocks”. If you don’t see the GPU boosting its clocks to the maximum allowed while running your CUDA application, check whether “Enforced Power Limit” is equal to “Max Power Limit” and, if necessary, increase the enforced power limit. As for thermal limitations, check whether “GPU Current Temp” is at, or close to, “GPU Slowdown Temp”, and if so, improve the cooling.
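Concretely, you can inspect all of those fields, and raise the enforced power limit if needed, with something like the following (the 250 W value is just a placeholder, not a recommendation; substitute your card's reported “Max Power Limit”):

```shell
# Show current vs. maximum clocks, the power limits, and the temperature
# thresholds ("GPU Slowdown Temp" etc.) in one report.
nvidia-smi -q -d CLOCK,POWER,TEMPERATURE

# If "Enforced Power Limit" is below "Max Power Limit", raise it.
# Requires root, and the value must be within the card's supported range.
sudo nvidia-smi -pl 250
```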

For my programs, the most I was able to get was about half of the maximum power, according to nvidia-smi. One of the programs was doing fast Fourier transforms; with the other I tried overdamped molecular dynamics. Gromacs seems to use less than half on the GTX 1080.

Typically, neither compute-bound nor memory-bound tasks (like FFT) are going to maximize GPU power draw. You need the “optimal” mix of compute and memory activity, which rarely happens in real-life applications. In general, that’s a good thing, because burning power is not the main objective of high-performance computing; getting as much performance out of as little power as possible is.
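If you want to see this for yourself, a convenient way is nvidia-smi's device-monitoring mode, which streams one sample per second so you can compare the power draw of a compute-bound kernel against a memory-bound one while each runs:

```shell
# Stream power/temperature (p) and utilization (u) samples, one row per
# second, while your CUDA application runs in another terminal.
# Press Ctrl-C to stop.
nvidia-smi dmon -s pu
```

Even with both SM and memory utilization reported near 100%, the power column will usually stay well below the card's limit unless the workload keeps the right mix of units busy.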

That said, at least for Maxwell-class GPUs, some kernels used by Folding@Home can get close to the maximum power consumption specified for the card (like 37W out of 40W, while power limit may be configured as low as 30W).

Of course it was always worth it. The FFT on the GPU was about 50-70 times faster than a single core, while for the MD we ran 20 different replicas of the same system, effectively collecting about 80 times more data after equilibration.
I think games generally draw more power than CUDA programs.