Remote profiling Jetson Nano not working from NVIDIA Visual Profiler


I am trying to profile CUDA code I wrote on the Nano.

Running locally, everything works:

sudo /usr/local/cuda/bin/nvprof ./vectorAdd

Running nvprof over SSH from a remote machine also works:

$ ssh root@ /usr/local/cuda/bin/nvprof --version
nvprof: NVIDIA (R) Cuda command line profiler
Copyright (c) 2012 - 2018 NVIDIA Corporation
Release version 10.0.326 (21)

$ ssh root@ /usr/local/cuda/bin/nvprof /home/david/NVIDIA_CUDA-10.0_Samples/0_Simple/vectorAdd/vectorAdd
==19722== NVPROF is profiling process 19722, command: /home/david/NVIDIA_CUDA-10.0_Samples/0_Simple/vectorAdd/vectorAdd
==19722== Warning: Unified Memory Profiling is not supported on the underlying platform. System requirements for unified memory can be found at:
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
==19722== Profiling application: /home/david/NVIDIA_CUDA-10.0_Samples/0_Simple/vectorAdd/vectorAdd
==19722== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   41.45%  43.919us         2  21.959us  21.673us  22.246us  [CUDA memcpy HtoD]
                   36.48%  38.657us         1  38.657us  38.657us  38.657us  vectorAdd(float const *, float const *, float*, int)

Profiling from NVIDIA Visual Profiler on macOS does not work: version 10.1 fails to connect via SSH, and version 10.0 reports a problem while trying to profile.

I add my remote connection:

  • Hostname:
  • UserName: root

I enter the path to the CUDA toolkit and the path to vectorAdd. When I launch the profiling, I get a popup saying “getting list of devices”, then an error popup saying:

Data collection for 1 analysis stages failed
Category for event 19595265/108865 was not found

Is this a known bug in NVVP on macOS? Is there a way to debug or solve it?

Thanks in advance



The Nano uses CUDA 10.0.
You will need the same CUDA version, installed from the JetPack installer, or the libraries will not be compatible.

Please also note that we don’t officially support cross-compiling on macOS.
A Linux-based host environment is recommended for cross-compiling.
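
A quick way to check the version match mentioned above is to compare the release line that `nvprof --version` prints on both machines. This is only a sketch: the helper name `nvprof_release` and the `NANO_HOST` placeholder are made up for illustration, and it assumes nvprof prints a "Release version X.Y.Z" line as in the output earlier in this thread.

```shell
#!/bin/sh
# Extract "major.minor" from the text printed by `nvprof --version`.
# Reads the version text on stdin and prints, for example, 10.0.
nvprof_release() {
    sed -n 's/^Release version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical usage (NANO_HOST is a placeholder for the target address,
# and nvprof is assumed to be on the host PATH):
#   host_ver=$(nvprof --version | nvprof_release)
#   nano_ver=$(ssh "root@$NANO_HOST" /usr/local/cuda/bin/nvprof --version | nvprof_release)
#   [ "$host_ver" = "$nano_ver" ] || echo "CUDA mismatch: host $host_ver, Nano $nano_ver"
```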


Hi AastaLLL,

Thanks for your feedback.

I am using NVVP 10.0 on macOS, so it is the same version as the Nano’s CUDA SDK/tools.

I am compiling directly on the Nano (not cross-compiling). My question is about profiling, not cross-compiling.

From my understanding, when profiling remotely, NVVP just executes some commands (nvprof) on the Nano over SSH and gets back the generated files and text to present them graphically.

So there is a problem somewhere in the communication between NVVP and nvprof.
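
To narrow down where the communication breaks, the same round trip can be done by hand: run nvprof on the Nano with an exported profile file, copy the file back, and import it in NVVP (File > Import). A minimal sketch, assuming root SSH access as above; the function names, the `DRY_RUN` switch, and the host and file names in the usage lines are hypothetical.

```shell
#!/bin/sh
# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# profile_on_nano HOST APP OUT
#   HOST: address of the Nano (placeholder)
#   APP:  path of the application on the Nano
#   OUT:  profile file name (hypothetical, e.g. vectorAdd.nvvp)
profile_on_nano() {
    host=$1
    app=$2
    out=$3
    # nvprof: -o exports the profile to a file, -f overwrites an existing file.
    # scp then fetches the file so it can be imported in NVVP on the host.
    run ssh "root@$host" /usr/local/cuda/bin/nvprof -f -o "/tmp/$out" "$app" &&
        run scp "root@$host:/tmp/$out" .
}

# Hypothetical usage:
#   DRY_RUN=1; profile_on_nano nano.local /path/to/vectorAdd vectorAdd.nvvp   # show commands
#   unset DRY_RUN; profile_on_nano nano.local /path/to/vectorAdd vectorAdd.nvvp
```

If the imported file opens fine in NVVP, the profiler on the Nano is working and the fault is more likely in how NVVP drives the SSH session.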



First, you will need to use both remote compiling and remote profiling.

Data collection for 1 analysis stages failed

This error may be related to a permission issue.
Would you mind checking the following comment first?



No, the problem is not related to root access. As I showed in my original post, running:

ssh root@ /usr/local/cuda/bin/nvprof /home/david/NVIDIA_CUDA-10.0_Samples/0_Simple/vectorAdd/vectorAdd

is working fine.

So root access is not the problem here.




It looks like your issue is similar to this topic:

Would you mind checking whether the fix also works for you first?


No, that answer is about profiling as root and logging in over SSH, which is what I am already doing.



But it looks like some environment variable settings are missing.
Would you mind giving this a try?

$ ssh root@your-target-ip /bin/sh -c "export LD_LIBRARY_PATH=\"/usr/local/cuda-10.0/lib64\":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR=\"/tmp\";\"/usr/local/cuda-10.0/bin/nvprof\" --query-cuda-info; export LD_LIBRARY_PATH=\"/usr/local/cuda-10.0/lib64\":\${LD_LIBRARY_PATH};export NVPROF_TMPDIR=\"/tmp\";\"/usr/local/cuda-10.0/bin/nvprof\" --version"
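
The nested quoting in a command like this is easy to get wrong, so here is a sketch that builds the remote command line as one string and hands it to ssh as a single argument. The helper name `nvprof_with_env` and the `NANO_HOST` placeholder are hypothetical; the paths and variables are the ones from the command above. Passing one string avoids depending on how a `/bin/sh -c` wrapper is re-parsed by the login shell on the target.

```shell
#!/bin/sh
# Build a remote command line that sets the environment, then runs nvprof.
# $1 = CUDA install prefix on the target, remaining args = nvprof arguments.
nvprof_with_env() {
    cuda=$1
    shift
    # Single quotes keep ${LD_LIBRARY_PATH} unexpanded until the target shell runs it.
    printf 'export LD_LIBRARY_PATH="%s/lib64:${LD_LIBRARY_PATH}"; ' "$cuda"
    printf 'export NVPROF_TMPDIR=/tmp; '
    printf '"%s/bin/nvprof" %s\n' "$cuda" "$*"
}

# Hypothetical usage (NANO_HOST is a placeholder for the target address):
#   ssh "root@$NANO_HOST" "$(nvprof_with_env /usr/local/cuda-10.0 --query-cuda-info)"
#   ssh "root@$NANO_HOST" "$(nvprof_with_env /usr/local/cuda-10.0 --version)"
```

Printing the string first (without ssh) makes it easy to see exactly what the target shell will receive.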


Maybe I can hijack this topic, since I have a question in exactly this direction?
My PC running Linux uses CUDA 9.1, which my production code needs, so I can’t update it. The Nano has CUDA 10, and I managed to get output files both with “sudo nvprof -o OUT ./the_program” and with “sudo nv-nsight-cu-cli -o OUT ./the_program”.

Profiling with Nsight takes much longer and generates a much larger file (about 24MB against 900K from nvprof). I then tried opening these files on my PC. NVVP will not open the output from nvprof, probably because of the version difference (the Nano is on 10, the PC on 9.1). So I installed Nsight Compute 1.0 and tried opening the corresponding output with nv-nsight-cu. It opens and I can see a few things such as kernels, but there is no option to see a timeline or anything else that allows a proper visualization of the saved data. Saving as .csv doesn’t make any difference.

This is what I understood from the documentation: save the profiling output on the host machine, then visualize it on either Windows or Linux. If I am making a mistake, let me know.