How to tell if CUDA code is optimized?

How can I reliably tell how well optimized my CUDA code for GPU computing is? I frequently have to come back to the CPU to do a (short/small) task, or to dump some data before looping back into the kernels. EVGA PrecisionX shows my GTX Titans at 90-100% GPU usage for the most part (with the very occasional dip to 0% when data is being dumped to the hard drive).

(one GPU is idle, the other two are running)

As a basic optimization, I don’t do any unnecessary memory copies, and I try to keep as much work as I possibly can running in GPU kernels without going back to the host.

Does 90-100% GPU usage mean I’m pretty close to the limit of what I can get out of these Titans? Some of my runs now take over a day to return a full set of results, so any time I can get back would be great for the production runs, which will probably be fine-tuned for accuracy and take who knows how long.

Thanks.

Not necessarily. Just like on the CPU, you can have very inefficient (i.e. suboptimal) code that keeps the processor fully busy. Optimization and utilization are related concepts, but they are not the same.
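
One way to make the distinction concrete is to compare what a kernel actually achieves against what the hardware can deliver, rather than looking at the usage percentage. The sketch below is plain Python arithmetic, not CUDA, and it only illustrates the idea for a memory-bound kernel: effective bandwidth is bytes read plus bytes written, divided by elapsed time. The function names, the byte counts, and the two timings are hypothetical placeholders, and the 288.4 GB/s peak is an assumed spec-sheet figure for a GTX Titan that you should check against your own card. As far as I know, a usage monitor mostly reports how much of the time a kernel was running, so both hypothetical versions here could plausibly show near 100% usage.

```python
# Sketch: judge a kernel by achieved throughput, not by the "GPU usage" percentage.
# All names and numbers here are hypothetical placeholders, not measurements.

def effective_bandwidth_gbs(bytes_read, bytes_written, seconds):
    """Effective bandwidth in GB/s: bytes actually moved divided by elapsed time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_read + bytes_written) / 1e9 / seconds


def fraction_of_peak(achieved_gbs, peak_gbs):
    """Ratio of achieved bandwidth to the theoretical peak of the device."""
    return achieved_gbs / peak_gbs


# ASSUMPTION: spec-sheet peak memory bandwidth for a GTX Titan; verify for your card.
ASSUMED_PEAK_GBS = 288.4

# Hypothetical kernel that reads and writes 2**28 floats (4 bytes each).
n_bytes = 4 * 2**28

# Two hypothetical timings of the same work (e.g. measured with CUDA events).
for label, seconds in [("version A", 0.050), ("version B", 0.010)]:
    bw = effective_bandwidth_gbs(n_bytes, n_bytes, seconds)
    share = 100 * fraction_of_peak(bw, ASSUMED_PEAK_GBS)
    print(f"{label}: {bw:.1f} GB/s, {share:.0f}% of assumed peak")
```

If the achieved figure is a small fraction of the peak, there is probably room left even at 90-100% usage. For compute-bound kernels the same comparison can be made with achieved versus peak FLOP/s, and a profiler gives a more detailed breakdown than this back-of-the-envelope check.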