Getting LaunchFailed error using 2023.2

Hi,

I’m using Nsight Compute to profile kernels on remote Linux targets from a Windows host. It’s a really great tool and has helped me a lot in my work.

However, after upgrading from version 2023.1 to 2023.2, it stopped working, and I get a “LaunchFailed” error for any executable with active kernels.

I tried emptying the deployment directory before launching, but it didn’t help. With the limited error message, I can’t tell what’s wrong.

Started SSH reverse proxy on port: 23350
Launching: /tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu
Process launched
==PROF== Attempting to connect to ncu-ui at 127.0.0.1:23350...
==PROF== Connected to ncu-ui at 127.0.0.1:23350.
==PROF== Connected to process 132042 (/path/to/executable)
==PROF== Profiling "permuteUint4": 0%....50%....100% - 43 passes
==ERROR== LaunchFailed
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==PROF== Report: /tmp/var/gemm.ncu-rep
Launched application returned 9 (0x9).
Retrieving /tmp/var/gemm.ncu-rep to C:/Users/username/Documents/NVIDIA Nsight Compute/gemm.ncu-rep
Loading report file C:/Users/username/Documents/NVIDIA Nsight Compute/gemm.ncu-rep...

Note: I switched back to 2023.1 and it’s working without any problems, but I still want to try the new features of 2023.2.

Strange. I wouldn’t expect changing only the Nsight Compute version to cause this. Did you also update the CUDA toolkit version? Are you able to log in to the Linux machine remotely and run the application directly? It would be useful to see the output of just the application from the CLI, followed by “ncu <application + arguments>”, to try to filter out what could be causing the issue (remote tunnels etc.).
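For example, something like this on the target (a rough sketch; “./my_app <arguments>” is a placeholder for your actual executable and its parameters):

# Run the application on its own first to confirm it exits cleanly
./my_app <arguments>
echo $?   # should print 0 if the application itself succeeds

# Then run the exact same command line under ncu
ncu ./my_app <arguments>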

Thanks for your reply.

Running ncu 2023.2 via the CLI in the remote container works fine.

Any idea how to debug the problem when profiling remotely from the Windows host?

Is it possible that some environment setup happens when you log in directly, via your .bashrc for example?
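One way to check (a sketch, run on the target machine; the file names are arbitrary):

# Environment of a plain non-login shell (closer to what the remote launch sees)
env | sort > plain.env

# Environment after sourcing the login files (.bash_profile/.bashrc etc.)
bash -lc 'env | sort' > login.env

# Differences here (PATH, LD_LIBRARY_PATH, CUDA variables) are the suspects
diff plain.env login.env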

/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu should still be on the remote machine, so can you log in there and try launching the profile using that specific ncu binary?

All the remote connection does is copy those files over and launch them locally, so we need to determine what’s different between that and logging in directly. When logged in directly, can you run “ncu --version” and then “/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu --version” to see if they match?
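Something like this (the deployed path is taken from your log above):

# ncu that your login shell resolves
which ncu
ncu --version

# ncu binary deployed by the remote connection
/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu --version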

Sorry, I overlooked the error message when running ncu 2023.2 from the CLI, since it still prints some of the metrics correctly.

/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu --version

shows

Version 2023.2.0.0 (build 32895467) (public-release)

I get the same error when running /tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu on the remote machine. However, the default metrics (speed of light / launch statistics / occupancy) are printed correctly.

==PROF== Profiling "gemm": 0%....50%....100% - 10 passes

==ERROR== LaunchFailed
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
[257823] test_gemm@127.0.0.1
  void gemm<...>(...) (96, 1, 1)x(256, 1, 1), Context 1, Stream 20, Device 0, CC 8.0
    Section: GPU Speed Of Light Throughput
    ----------------------- ------------- ------------
    Metric Name               Metric Unit Metric Value
    ----------------------- ------------- ------------
    DRAM Frequency          cycle/nsecond         1.56
    SM Frequency            cycle/nsecond         1.12
    Elapsed Cycles                  cycle       80,356
    Memory Throughput                   %        49.75
    DRAM Throughput                     %        18.79
    Duration                      usecond        71.42
    L1/TEX Cache Throughput             %        58.15
    L2 Cache Throughput                 %        17.27
    SM Active Cycles                cycle    68,634.57
    Compute (SM) Throughput             %        72.62
    ----------------------- ------------- ------------

......

This is interesting. It looks like it may not be related to the remote connection. What command line are you running on the remote machine? If you need to hide some details about the application and its parameters, that’s fine, but can you share the ncu flags? One thing to try on the remote machine is to collect a single metric with the CLI: “/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu --metrics smsp__inst_executed.sum <app and parameters>”.
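For instance (a sketch; “./my_app <arguments>” again stands in for your application and its parameters):

# Collect a single metric to see whether a minimal profile still fails
/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu --metrics smsp__inst_executed.sum ./my_app <arguments>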

If possible, can you share the output of your application to make sure it’s not throwing any errors without Nsight Compute?

Also, can you share your driver version? You can find it with the “nvidia-smi” CLI command.
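For example:

# Driver version appears in the header of the summary table
nvidia-smi

# Or query it directly
nvidia-smi --query-gpu=driver_version --format=csv,noheader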

Thanks.