Monitoring bottlenecks with Windows 10 FCU Task Manager

First off, I’m training a deep neural network with Microsoft’s CNTK (which runs on CUDA underneath) on a GTX 1060 with 3GB of memory.

Has anyone attempted to monitor performance with the new Task Manager in the Fall Creators Update?

It breaks out GPU usage by 3D, Copy, Encode/Decode, and memory usage. I’m fairly confident the memory figure is accurate, but what about the figures for the processing units?

Here’s a grab of the window:
[url]https://imgur.com/gallery/LJZJo[/url]

From the window, it’s obvious that single-core performance is a limiting factor, even though it’s a Kaby Lake processor. What really surprised me is that increasing the size of the training dataset didn’t really affect anything in this window except the memory usage. I would have expected greater usage in the 3D portion, or at the very least the Copy portion.
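One cross-check I could run is to log what nvidia-smi reports while training and compare it with the Task Manager graphs. Below is a rough sketch of that, not a proper profiler. It assumes nvidia-smi is on the PATH and that the driver actually reports these fields for this card (some fields come back as "[Not Supported]" or "[N/A]" on some setups, so the parser maps anything non-numeric to None). The function names are my own.

```python
import subprocess
import time

# Fields requested from nvidia-smi via --query-gpu.
QUERY_FIELDS = ["utilization.gpu", "utilization.memory",
                "memory.used", "memory.total"]


def _to_float(text):
    """Return text as a float, or None if the driver reported a non-number."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_sample(line):
    """Parse one 'csv,noheader,nounits' line from nvidia-smi into a dict."""
    values = [v.strip() for v in line.split(",")]
    return {name: _to_float(v) for name, v in zip(QUERY_FIELDS, values)}


def read_gpu_sample(gpu_index=0):
    """Run nvidia-smi once and return the parsed sample for one GPU."""
    cmd = [
        "nvidia-smi",
        f"--id={gpu_index}",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.check_output(cmd, universal_newlines=True)
    return parse_sample(out.strip().splitlines()[0])


def summarize(samples, field):
    """Average one field over the samples, skipping unreported values."""
    values = [s[field] for s in samples if s[field] is not None]
    return sum(values) / len(values) if values else None


def log_gpu(n_samples, interval_s=1.0, gpu_index=0):
    """Poll the GPU n_samples times, print each sample, return them all."""
    samples = []
    for _ in range(n_samples):
        sample = read_gpu_sample(gpu_index)
        print(sample)
        samples.append(sample)
        time.sleep(interval_s)
    return samples
```

Running something like `log_gpu(30)` in a second console during training, then `summarize(samples, "utilization.gpu")` on the result, would give an average to set against the Task Manager view. If the two disagree, that would at least suggest the 3D graph is not the place to look for CUDA work.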

Does anyone here profile usage during training to identify bottlenecks, or do you just throw the fastest hardware you have at it and watch the wall clock?