I am trying to profile a TensorFlow application running on the Tegra X1 GPU. I use NVIDIA Visual Profiler on my Windows 10 host machine, but the data is collected locally on an Ubuntu system, following the instructions on this page: https://docs.nvidia.com/cuda/profiler-users-guide/index.html#collecting-remote-data
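Here is a rough sketch of the collect-then-import workflow from that page. It assumes `nvprof` is on the PATH of the Ubuntu (Jetson) system and that the TensorFlow entry point is a hypothetical `train.py` started with `python3`. The helper only builds and prints two nvprof command lines: a timeline pass and a pass that adds `--analysis-metrics` for Visual Profiler's guided analysis. You would run them on the device, copy the resulting `.nvvp` files to the Windows host, and import them in Visual Profiler.

```python
import shlex
from typing import List


def build_nvprof_command(script: str, output: str, analysis_metrics: bool = False) -> List[str]:
    """Build an nvprof command that profiles `script` and exports a file
    Visual Profiler can import. Flags used: -f (overwrite existing output),
    --export-profile (write the profile to `output`), --analysis-metrics."""
    cmd = ["nvprof", "-f", "--export-profile", output]
    if analysis_metrics:
        # Collect the extra metrics Visual Profiler's guided analysis needs
        # (this pass replays kernels, so it is much slower than the timeline pass).
        cmd.append("--analysis-metrics")
    cmd += ["python3", script]
    return cmd


def as_shell(cmd: List[str]) -> str:
    # shlex.join is 3.8+; quoting manually keeps this working on older Pythons.
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    # "train.py" is a hypothetical entry point; substitute your own script.
    for out, metrics in [("timeline.nvvp", False), ("analysis.nvvp", True)]:
        print(as_shell(build_nvprof_command("train.py", out, metrics)))
```

Running the script prints the two commands to execute on the device. Keeping the timeline and metrics passes separate avoids the heavy kernel-replay overhead distorting the timeline.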
- I’d like to connect to the device directly over SSH from Visual Profiler, but my SSH connection requires public/private key authentication. I can’t find any instructions online on how to set this up. Any advice?
- In the attached picture, I am having trouble determining which parts of my code the kernels (blue box) correspond to. Is there any documentation on how Keras or TF functions map to kernels? Also, what is redzone_checker?
- Ultimately, I need to understand how many SMs are being used and at what efficiency. The green box in the attached picture seems to report occupancy (warp utilization), but can I get more detail than that? I am unsure how to translate these figures into streaming multiprocessor (SM) usage.
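One way to relate occupancy to SMs is to work from a kernel's launch properties: block size, grid size, registers per thread, and shared memory per block. Visual Profiler lists these for a selected kernel. The following is a simplified theoretical-occupancy sketch. Its per-SM defaults are assumptions based on the compute capability 5.3 limits in the CUDA Programming Guide (2 SMs on the Tegra X1, 64 resident warps, 32 resident blocks, 64K registers, and 64 KB shared memory per SM), so confirm them with `deviceQuery`. It ignores register and shared-memory allocation granularity, so the profiler's own theoretical occupancy figure may come out slightly lower. It also says nothing about achieved occupancy, which the profiler measures at run time.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SmLimits:
    """Per-SM resource limits. Defaults are ASSUMED values for compute
    capability 5.3 (Tegra X1); verify them with deviceQuery."""
    num_sms: int = 2
    max_warps: int = 64
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 65536
    warp_size: int = 32


def theoretical_occupancy(threads_per_block: int, regs_per_thread: int,
                          smem_per_block: int, limits: SmLimits = SmLimits()) -> Tuple[int, float]:
    """Return (resident blocks per SM, occupancy as a fraction of max warps)."""
    warps_per_block = math.ceil(threads_per_block / limits.warp_size)
    candidates = [limits.max_warps // warps_per_block, limits.max_blocks]
    if regs_per_thread > 0:
        # Simplification: no register allocation granularity is applied.
        candidates.append(limits.registers // (regs_per_thread * warps_per_block * limits.warp_size))
    if smem_per_block > 0:
        candidates.append(limits.shared_mem_bytes // smem_per_block)
    blocks_per_sm = min(candidates)
    return blocks_per_sm, blocks_per_sm * warps_per_block / limits.max_warps


def sm_usage(grid_blocks: int, blocks_per_sm: int, limits: SmLimits = SmLimits()) -> Tuple[int, int]:
    """Return (SMs that receive at least one block, number of block 'waves')."""
    sms_used = min(limits.num_sms, grid_blocks)
    waves = math.ceil(grid_blocks / (blocks_per_sm * limits.num_sms)) if blocks_per_sm else 0
    return sms_used, waves


if __name__ == "__main__":
    blocks, occ = theoretical_occupancy(threads_per_block=256, regs_per_thread=64, smem_per_block=0)
    print(blocks, occ)            # register-limited: 4 blocks per SM, 0.5 occupancy
    print(sm_usage(1, blocks))    # a 1-block grid leaves one of the two SMs idle
    print(sm_usage(100, blocks))  # 100 blocks run in 13 waves across both SMs
```

A small grid (for example, a single-block helper kernel) can show reasonable per-SM occupancy while still using only one of the GPU's SMs. Grid size and occupancy therefore need to be read together.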