How can I reliably tell how well my CUDA code is optimized for GPU computing? I frequently have to come back to the CPU to do a short, small task, or to dump some data, before looping back into the kernels. EVGA PrecisionX shows my GTX Titans at 90-100% GPU usage for the most part (with the very occasional drop to 0% when data is being dumped to the hard drive).
(one GPU is idle, the other two are running)
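To make the question concrete, here's roughly how I've been timing individual kernel sections with CUDA events instead of relying on the PrecisionX readout (a minimal sketch, not my actual code; myKernel, the size, and the launch configuration are placeholders):

```cuda
// Minimal timing sketch: myKernel, n, and the launch configuration are
// placeholders. CUDA events report GPU-side elapsed time per section,
// which should be more reliable than an external utilization meter.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void myKernel(float *d_data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= 2.0f;  // stand-in for the real work
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel section: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

That tells me how long each section takes, but not whether the utilization number by itself means anything about how close to the hardware limit I am.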
As a basic optimization, I don’t do any unnecessary memory copies, and I try to keep as much as possible running in GPU kernels without ever touching the host.
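Concretely, the only host traffic left is the periodic dump, and I've been experimenting with hiding even that behind compute using a second stream. A rough sketch of the idea, with made-up names (stepKernel, the dump interval, and the buffer sizes are all just for illustration):

```cuda
// Sketch of overlapping the periodic data dump with ongoing compute:
// snapshot device-to-device on the compute stream, then drain the snapshot
// to pinned host memory on a copy stream while the kernels keep running.
#include <cuda_runtime.h>

__global__ void stepKernel(float *d_state, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_state[i] += 1.0f;  // stand-in for the real iteration
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *d_state, *d_snap, *h_dump;
    cudaMalloc(&d_state, bytes);
    cudaMalloc(&d_snap, bytes);
    cudaMallocHost(&h_dump, bytes);  // pinned memory, needed for true async copies

    cudaStream_t compute, copy;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&copy);
    cudaEvent_t snapReady, dumpDone;
    cudaEventCreate(&snapReady);
    cudaEventCreate(&dumpDone);

    for (int step = 0; step < 100; ++step) {
        stepKernel<<<(n + 255) / 256, 256, 0, compute>>>(d_state, n);
        if (step % 10 == 9) {  // dump interval is arbitrary here
            // Don't overwrite the snapshot while the previous dump is still draining.
            cudaStreamWaitEvent(compute, dumpDone, 0);
            cudaMemcpyAsync(d_snap, d_state, bytes, cudaMemcpyDeviceToDevice, compute);
            cudaEventRecord(snapReady, compute);
            // The copy stream waits only for the snapshot, not for later kernels.
            cudaStreamWaitEvent(copy, snapReady, 0);
            cudaMemcpyAsync(h_dump, d_snap, bytes, cudaMemcpyDeviceToHost, copy);
            cudaEventRecord(dumpDone, copy);
            // ...write h_dump to disk once cudaStreamSynchronize(copy) returns...
        }
    }
    cudaStreamSynchronize(compute);
    cudaStreamSynchronize(copy);

    cudaEventDestroy(snapReady);
    cudaEventDestroy(dumpDone);
    cudaStreamDestroy(compute);
    cudaStreamDestroy(copy);
    cudaFreeHost(h_dump);
    cudaFree(d_snap);
    cudaFree(d_state);
    return 0;
}
```

The device-to-device snapshot is there so the compute stream never has to stall on the PCIe transfer itself.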
Does 90-100% GPU usage mean I’m pretty close to maxed out in terms of how much more I can get out of these Titans? Some of my jobs are now taking over a day to return a full set of results, so any time I can get back would be great for the production runs, which will probably be fine-tuned for accuracy and take who knows how long.
Thanks.