Profiling using command line (nvprof) in Linux on Jetson TX1 and import into Visual Profiler (CUDA 7.5) in Windows

  1. Gathered the profile information on the Jetson Tegra X1 from a Linux machine, using the commands below

Captured the timeline info:

nvprof --output-profile

Captured the info needed for the analysis system:

nvprof --analysis-metrics -o

We were able to generate both the timeline file and the analysis file.

  2. When I import these files into Visual Profiler on a Windows machine for analysis, we see only the counter values (global load throughput, global store throughput, and so on); we cannot analyze each kernel visually. It throws the error below:

“Insufficient Kernel Bounds Data”
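For reference, the two capture passes from step 1 can be sketched as below. This is only a sketch: the application name `./my_app` and the output file names `timeline.prof` and `analysis.prof` are hypothetical placeholders, since the post does not give the real ones. The script runs the commands when `nvprof` is on the PATH and otherwise just prints them.

```shell
#!/bin/sh
# Hypothetical names: substitute your own application and output files.
APP="./my_app"
TIMELINE_FILE="timeline.prof"
ANALYSIS_FILE="analysis.prof"

# Pass 1: timeline only.
TIMELINE_CMD="nvprof --output-profile $TIMELINE_FILE $APP"
# Pass 2: events and metrics used by the per-kernel analysis (-o is the short form of --output-profile).
ANALYSIS_CMD="nvprof --analysis-metrics -o $ANALYSIS_FILE $APP"

# Run each pass on the target if nvprof is installed; otherwise print it.
for cmd in "$TIMELINE_CMD" "$ANALYSIS_CMD"; do
  if command -v nvprof >/dev/null 2>&1; then
    $cmd
  else
    echo "would run: $cmd"
  fi
done
```

Both files then need to be copied to the Windows machine: the first is imported as the timeline data file and the second as the event/metric data file.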

Please guide me

Is there a reason you can’t run the profiler on the TX1 itself?

Also, is the Windows machine 64-bit?

The Jetson TX1 is connected to a Linux machine.
There is no Visual Profiler (nvvp) on that Linux machine.
We access the Linux machine remotely for profiling (through the command line).

We gather the profile information (using nvprof) and generate the profile files (.prof) remotely on the Linux machine.

We then try to analyze these generated profile files (.prof) visually using Visual Profiler (nvvp) on our local Windows machine, via the Import option of nvvp.

This is where we hit the issue.

We use 64-bit Windows.

Our main focus is to analyze each kernel visually,
that is, to see whether the bottleneck is memory, instruction, or latency.

But when we import those profile files and try to analyze each kernel visually to
find the bottleneck, we face this issue.

Is it not possible to import profile files generated on a remote Linux
machine and analyze them in nvvp on a local Windows machine?


I tried to reproduce your error, but everything works on my side.

  1. Generate the TX1 profile file on a Linux machine
  2. Open it on a Windows machine

The only difference is that I use the CUDA Toolkit 8.0 profiler (the latest version).
Could you give it a try to see if toolkit 8.0 will solve this issue?

Hi AastaLLL,

Thanks for your information.

I installed the CUDA 8.0 profiler and tried importing the profile files.

It still gives the same error, “Insufficient Kernel Bounds Data”, when I try to analyze a specific kernel visually.

One additional piece of information: when I checked the Details view, some metrics (such as global load/store throughput and local load/store transactions) show very large values and look as if they are saturated.


May I know how long your kernel ran?
The profiler depends on hardware counters to capture event and metric data, so problems can occur if the counters exceed their limits. Could you profile a shorter kernel to check whether this error is related to the hardware counters?
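One possible way to try this, sketched under assumptions: limit metric collection to a single kernel with nvprof's `--kernels` option and run the application on a reduced input so the kernel finishes quickly. The kernel name `myKernel`, the application `./my_app`, its `--size 1024` argument, and the output file name are all hypothetical placeholders; whether your application has a problem-size option at all is an assumption. Note that `--kernels` has to appear before `--analysis-metrics` on the command line, because it scopes the options that follow it.

```shell
#!/bin/sh
# All names below are hypothetical placeholders.
APP="./my_app"
KERNEL="myKernel"          # kernel to collect metrics for
SMALL_ARGS="--size 1024"   # assumed application option for a reduced input
OUT_FILE="analysis_small.prof"

# --kernels limits the scope of the --analysis-metrics option that follows it.
SHORT_CMD="nvprof --kernels $KERNEL --analysis-metrics -o $OUT_FILE $APP $SMALL_ARGS"

# Run on the target if nvprof is installed; otherwise print the command.
if command -v nvprof >/dev/null 2>&1; then
  $SHORT_CMD
else
  echo "would run: $SHORT_CMD"
fi
```

If the shorter run imports cleanly and the metric values look reasonable, that would point toward the counter limits rather than the import path.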