Profiling a TensorFlow application running on Tegra

Hello,

I am trying to profile a TensorFlow application running on the Tegra X1 GPU. I am using the NVIDIA Visual Profiler on my Windows 10 host machine, with the data collected locally on an Ubuntu system, following the remote profiling instructions on this page: https://docs.nvidia.com/cuda/profiler-users-guide/index.html#collecting-remote-data

  1. I’d like to connect directly to the device over SSH from within Visual Profiler, but my SSH connection requires public/private key authentication. I can’t find instructions anywhere online on how to set this up. Any advice?
  2. In the attached picture, I am having trouble determining which parts of my code the kernels (blue box) correspond to. Is there information anywhere online on how Keras or TF functions get mapped to kernels? Also, what is redzone_checker?
  3. Ultimately I need to understand how many SMs are being used and at what efficiency. The green box in the attached picture seems to report occupancy (warp utilization), but can I get more detail than that? I am unsure how to translate these numbers to streaming multiprocessor usage.

Thanks!
Jake

Hi Jake,

  1. Unfortunately, Visual Profiler does not support public/private key based SSH authentication.
  2. You can use NVTX-based code annotation. This will help you identify the kernels that get executed within a particular code range (see the sketch after this list). Refer to https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvtx
  3. Visual Profiler shows SM utilization under guided/unguided analysis. The multiprocessor utilization analysis under the “Kernel Latency” section of the unguided analysis provides details for each SM.
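
Since your application is Python/TensorFlow, one way to apply the NVTX annotation from item 2 is to call the NVTX C API (the one documented at the link above) through ctypes. The following is a minimal sketch, not a definitive recipe: the library name (libnvToolsExt.so.1), the toy Keras model, and the range name are assumptions for illustration, and there is also an nvtx Python wrapper package if you prefer not to use ctypes.

```python
import ctypes
import numpy as np
import tensorflow as tf

# Assumption: libnvToolsExt ships with the CUDA toolkit on the device;
# the exact soname/path may differ on a JetPack/L4T install.
_nvtx = ctypes.CDLL("libnvToolsExt.so.1")

def nvtx_push(name: str) -> None:
    # Opens a named NVTX range; it appears as a labeled bar on the
    # Visual Profiler timeline above the kernels launched inside it.
    _nvtx.nvtxRangePushA(name.encode("ascii"))

def nvtx_pop() -> None:
    # Closes the most recently opened NVTX range.
    _nvtx.nvtxRangePop()

# Toy Keras model, just so something launches GPU kernels in this sketch.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10),
])
x = np.random.rand(256, 32).astype(np.float32)

# Wrap the call you want to attribute kernels to; "inference" is just
# an illustrative range name.
nvtx_push("inference")
_ = model.predict(x)
nvtx_pop()
```

When you re-run the profile, the kernels TensorFlow launches between the push and pop calls show up nested under the labeled range on the timeline, which makes it easier to map them back to the Keras call that triggered them.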