Nsight Profiling Crashes with error code (9)

I am trying to profile an application that asynchronously launches CUDA kernels on the GPU, but the profiling fails with the following error:

==PROF== Profiling "potrf_alg2_set_info" - 1: 0%
==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using "--replay-mode application" to avoid memory save-and-restore.

==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using "--replay-mode application" to avoid memory save-and-restore.
…50%…100% - 73 passes
==PROF== Profiling "potrf_alg2_cta_upper" - 2: 0%…50%…100% - 71 passes

==ERROR== LaunchFailed

==ERROR== LaunchFailed
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==PROF== Report: /home/mannaparambil/dplasma/build/profile.ncu-rep

Any suggestions as to what is happening?

Hello Joseph,
Thank you for your question about Nsight, and I'm sorry you ran into this problem. I just want to clarify which Nsight product you are using: Nsight Graphics, or a different Nsight product such as Nsight Systems or Nsight Compute?
Regards,

Hi, I am running Nsight Compute and I got the same error. What does the error code (9) mean?

The launch command:
ncu --set full --replay-mode range <binary>

$ ncu --version                                      
NVIDIA (R) Nsight Compute Command Line Profiler 
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.1.1.0 (build 32678585) (public-release)  

Hello,
Since this question is about Nsight Compute I will move your topic to the Nsight Compute Forum for our engineering team to follow up.
Thanks,

Nsight Compute saves and restores kernel state in memory in order to replay the kernel multiple times. That can double the memory footprint. To avoid this, you can switch to application replay with "--replay-mode application". In that mode the whole application is rerun for each pass, so the memory backup needed for kernel replay is not required. Let me know if that solves your issue.
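As a rough sketch of the suggestion above, an invocation with application replay might look like the following. The binary name `./my_app` and the report name `profile` are placeholders that do not come from this thread, and the `command -v` guard is only there so the snippet prints the command instead of failing when `ncu` is not on the PATH.

```shell
# Placeholder names (not from the thread): adjust to your application and report.
APP="${APP:-./my_app}"
REPORT="${REPORT:-profile}"

# --replay-mode application reruns the whole program once per pass,
# so ncu does not need to save and restore device memory between passes.
NCU_ARGS="--set full --replay-mode application -f -o ${REPORT}"

if command -v ncu >/dev/null 2>&1; then
    ncu $NCU_ARGS "$APP"
else
    echo "ncu not found on PATH; would run: ncu $NCU_ARGS $APP"
fi
```

Since every pass is a separate run of the program, this is likely to be slower than kernel replay and works best when the application launches the same kernels in the same order on every run.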

I hit a similar error.

My app seems to run fine on its own, at least without showing any obvious error.
If I run my app with ncu --set full, I get:

# ncu -f -o ktranspose --import-source on --set full  test_kernels --gtest_filter=design/test_transpose.time/0
Note: Google Test filter = design/test_transpose.time/0
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from design/test_transpose
[ RUN      ] design/test_transpose.time/0
==PROF== Connected to process 193819 (/home/dongwei/Workspace/lightnet/build/tests/test_kernels/test_kernels)
==PROF== Profiling "ktranspose" - 0: 0%....50%....100% - 34 passes
ktranspose: 430889 us
==PROF== Profiling "ktranspose_smem" - 1: 0%....50%....100% - 2 passes

==ERROR== LaunchFailed
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==PROF== Report: /home/dongwei/Workspace/lightnet/ktranspose.ncu-rep

If I remove --set full, it goes well:

ncu -f -o ktranspose --import-source on  test_kernels --gtest_filter=design/test_transpose.time/0
Note: Google Test filter = design/test_transpose.time/0
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from design/test_transpose
[ RUN      ] design/test_transpose.time/0
==PROF== Connected to process 193540 (/home/dongwei/Workspace/lightnet/build/tests/test_kernels/test_kernels)
==PROF== Profiling "ktranspose" - 0: 0%....50%....100% - 9 passes
ktranspose: 147706 us
==PROF== Profiling "ktranspose_smem" - 1: 0%....50%....100% - 9 passes
ktranspose_smem: 49922 us
==PROF== Profiling "ktranspose_smem_nbkcft" - 2: 0%....50%....100% - 9 passes
ktranspose_smem_nbkcft: 57446 us
==PROF== Profiling "transpose_readWrite_alignment..." - 3: 0%....50%....100% - 9 passes
cublasSgeam: 55954 us
[       OK ] design/test_transpose.time/0 (3051 ms)
[----------] 1 test from design/test_transpose (3051 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (3051 ms total)
[  PASSED  ] 1 test.
==PROF== Disconnected from process 193540
==PROF== Report: /home/dongwei/Workspace/lightnet/ktranspose.ncu-rep

Because there are 4 kernels to profile, I ran the crashing kernels with --set full one at a time, and each of them finished profiling successfully.
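The post does not say how the kernels were profiled one at a time. One possible way, sketched here as an assumption, is to restrict each run with ncu's `-k` (`--kernel-name`) filter. The kernel names are copied from the log above; the fourth name is truncated in that log, so it is left out rather than guessed. The path `./test_kernels` is also an assumption.

```shell
# Kernel names copied from the log; the truncated fourth name is omitted.
KERNELS="ktranspose ktranspose_smem ktranspose_smem_nbkcft"
TEST_BIN="${TEST_BIN:-./test_kernels}"   # assumed location of the test binary
GTEST_FILTER="design/test_transpose.time/0"

for k in $KERNELS; do
    # -k limits profiling to kernels with this name; one report file per kernel.
    cmd="ncu -f -o ${k} --import-source on --set full -k ${k} ${TEST_BIN} --gtest_filter=${GTEST_FILTER}"
    echo "$cmd"
    # Only run the command when ncu is actually installed.
    if command -v ncu >/dev/null 2>&1; then
        $cmd
    fi
done
```

Each iteration writes its own report, so a failure in one kernel does not discard the results collected for the others.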

I removed my kernel code line by line to find out whether the failure was caused by my own misuse. Even after I removed all the code, so that the kernel is completely empty:

template <typename T>
__global__ void mykernel(){}

the crash still happens, until I change the launch configuration
from <<<GRID, BLOCK, shared_mem, cudaStreamDefault>>>
to <<<GRID, BLOCK, 0, cudaStreamDefault>>>.
After that, all 4 kernels finish profiling in a single ncu run.
So, in my view, ncu fails when profiling multiple kernels that use shared_mem in a single app …

# ncu --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.2.2.0 (build 33188574) (public-release)

I encountered a similar issue and figured out that the RAM limit of my system caused this error code (9). I think enabling --set full increases the required peak RAM.
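To check whether system RAM is the bottleneck, something like the following could be run on a Linux host before and after a failed profiling run. One possible reading, which this thread does not confirm, is that code 9 corresponds to SIGKILL, the signal sent by the Linux out-of-memory killer; the `dmesg` check below rests on that assumption and may need root privileges.

```shell
# Available system RAM according to /proc/meminfo (Linux only).
avail_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
avail_mib=$(( ${avail_kib:-0} / 1024 ))
echo "MemAvailable: ${avail_mib} MiB"

# Device memory in use; kernel replay may back this up into system RAM.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv
fi

# Look for out-of-memory kills in the kernel log (assumption: code 9 = SIGKILL).
dmesg 2>/dev/null | grep -i -E "out of memory|killed process" \
    || echo "no OOM entries visible"
```

If the available RAM is close to or below the device memory in use, the save-and-restore done by kernel replay is a plausible cause of the failure.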

Actually, I can use --set full; I noticed that the replay mode matters more. Below is my command:

ncu --set full --replay-mode application --app-replay-match grid --app-replay-buffer file --app-replay-mode relaxed -f --export output-file-full.nsight-cuprof-report ./a.out

I am using an A100 server, and "--app-replay-mode relaxed" can also help. "--app-replay-buffer file" uses a file instead of RAM as the buffer.

I have also encountered the issue of kernel replay being unable to profile in scenarios with large device memory usage. I had to resort to using application replay, but it is too slow and there is a possibility of mismatches with each replay. I hope that NCU can address this problem in the future and ensure that kernel replay can be properly profiled in scenarios with large memory usage.

Thanks.

Thanks for providing these inputs. We’re always trying to improve the stability and user experience of our tools and this type of input is very helpful. We recently released version 2023.3 with several bug fixes. Please try it out and let us know if the issue still occurs.

Closing this topic, as it is originally from 2022. Please create a new topic if you have issues with Nsight Compute. We'll try our best to help.