NSight Profiling Crashes with error code (9)

joseph.john · October 7, 2022, 2:22am

I am trying to profile an application that asynchronously launches CUDA kernels on the GPU, but profiling fails with the following error:

==PROF== Profiling “potrf_alg2_set_info” - 1: 0%
==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using “--replay-mode application” to avoid memory save-and-restore.

==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using “--replay-mode application” to avoid memory save-and-restore.
…50%…100% - 73 passes
==PROF== Profiling “potrf_alg2_cta_upper” - 2: 0%…50%…100% - 71 passes

==ERROR== LaunchFailed

==ERROR== LaunchFailed
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==PROF== Report: /home/mannaparambil/dplasma/build/profile.ncu-rep

Any suggestions as to what is happening?

dwoods · October 17, 2022, 8:36pm

Hello Joseph,
Thank you for your question about Nsight, and I’m sorry you ran into this problem. I just want to clarify which Nsight product you are using. Are you using Nsight Graphics or a different Nsight product, such as Nsight Systems or Nsight Compute?
Regards,

agnieszka.lupinska · May 18, 2023, 2:00am

Hi, I am running Nsight Compute and I got the same error. What does error code (9) mean?

The launch command:
ncu --set full --replay-mode range <binary>

$ ncu --version                                      
NVIDIA (R) Nsight Compute Command Line Profiler 
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.1.1.0 (build 32678585) (public-release)
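For context on the range replay mode in the launch command above: Nsight Compute profiles a range of CUDA work as one unit, and one way to delimit such a range is a cudaProfilerStart()/cudaProfilerStop() pair (NVTX ranges are another option). The sketch below is a minimal illustration under that assumption; the scale kernel and the run_profiled_range function are hypothetical stand-ins for the application's own code, not taken from the post.

```cuda
#include <cuda_profiler_api.h>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the application's real work.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// The two launches between cudaProfilerStart() and cudaProfilerStop()
// form one range that `ncu --replay-mode range` can profile and replay.
cudaError_t run_profiled_range(float* d_data, int n) {
    const int block = 256;
    const int grid = (n + block - 1) / block;
    cudaProfilerStart();
    scale<<<grid, block>>>(d_data, 2.0f, n);
    scale<<<grid, block>>>(d_data, 0.5f, n);
    cudaProfilerStop();
    // Report launch-configuration errors first, then errors raised while running.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}
```

Profiled with the command above, the two launches would be collected together as one range rather than as two separately replayed kernels.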

dwoods · May 18, 2023, 2:54pm

Hello,
Since this question is about Nsight Compute I will move your topic to the Nsight Compute Forum for our engineering team to follow up.
Thanks,

jmarusarz · May 18, 2023, 8:42pm

Nsight Compute saves and restores kernel state in memory so that it can replay the kernel multiple times. That can double the memory footprint. To avoid this, you can switch to application replay with “--replay-mode application”. This removes the need to store memory for replay, because the whole application is rerun for each pass instead. Let me know if that solves your issue.
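To see how close an application is to the point where a second copy of its device memory no longer fits on the GPU, you can compare used and free device memory with cudaMemGetInfo. This is a rough sketch under a simplifying assumption, not Nsight Compute's actual algorithm: kernel replay needs room for one extra copy of the device memory in use, kept on the device when it fits and otherwise backed up in system memory, as the ==WARNING== lines in the first post describe. The function names are hypothetical.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Simplified model (an assumption, not Nsight Compute's real logic): a backup
// of `used_bytes` stays on the device only if it fits in the free memory.
bool backup_fits_on_device(size_t used_bytes, size_t free_bytes) {
    return used_bytes <= free_bytes;
}

// Queries the current device and prints a rough verdict.
// "Used" includes context overhead and other processes' allocations, not only
// the kernel's buffers, so it overestimates what a backup would need.
cudaError_t report_device_headroom() {
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) return err;
    const size_t used_bytes = total_bytes - free_bytes;
    std::printf("used %zu MiB, free %zu MiB: backup would likely go to %s\n",
                used_bytes >> 20, free_bytes >> 20,
                backup_fits_on_device(used_bytes, free_bytes) ? "device memory"
                                                              : "system memory");
    return cudaSuccess;
}
```

If more than half of the device memory is already in use, this model predicts the slow system-memory backup reported in the first post.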

d279617552 · September 8, 2023, 7:19am

I ran into a similar error.

My app seems to run fine on its own; at least it does not show any obvious error.
If I run it with ncu --set full, I get:

# ncu -f -o ktranspose --import-source on --set full  test_kernels --gtest_filter=design/test_transpose.time/0
Note: Google Test filter = design/test_transpose.time/0
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from design/test_transpose
[ RUN      ] design/test_transpose.time/0
==PROF== Connected to process 193819 (/home/dongwei/Workspace/lightnet/build/tests/test_kernels/test_kernels)
==PROF== Profiling "ktranspose" - 0: 0%....50%....100% - 34 passes
ktranspose: 430889 us
==PROF== Profiling "ktranspose_smem" - 1: 0%....50%....100% - 2 passes

==ERROR== LaunchFailed
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==PROF== Report: /home/dongwei/Workspace/lightnet/ktranspose.ncu-rep

If I remove --set full, it runs fine:

ncu -f -o ktranspose --import-source on  test_kernels --gtest_filter=design/test_transpose.time/0
Note: Google Test filter = design/test_transpose.time/0
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from design/test_transpose
[ RUN      ] design/test_transpose.time/0
==PROF== Connected to process 193540 (/home/dongwei/Workspace/lightnet/build/tests/test_kernels/test_kernels)
==PROF== Profiling "ktranspose" - 0: 0%....50%....100% - 9 passes
ktranspose: 147706 us
==PROF== Profiling "ktranspose_smem" - 1: 0%....50%....100% - 9 passes
ktranspose_smem: 49922 us
==PROF== Profiling "ktranspose_smem_nbkcft" - 2: 0%....50%....100% - 9 passes
ktranspose_smem_nbkcft: 57446 us
==PROF== Profiling "transpose_readWrite_alignment..." - 3: 0%....50%....100% - 9 passes
cublasSgeam: 55954 us
[       OK ] design/test_transpose.time/0 (3051 ms)
[----------] 1 test from design/test_transpose (3051 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (3051 ms total)
[  PASSED  ] 1 test.
==PROF== Disconnected from process 193540
==PROF== Report: /home/dongwei/Workspace/lightnet/ktranspose.ncu-rep

Since there are 4 kernels to profile, I profiled the crashing kernels with --set full one at a time, and each one finished profiling successfully.

I removed my kernel code line by line to find out whether the failure was caused by misuse on my part, until the kernel was completely empty:

template <typename T>
__global__ void mykernel(){}

The crash still happens until I change the launch configuration
from <<<GRID, BLOCK, shared_mem, cudaStreamDefault>>>
to <<<GRID, BLOCK, 0, cudaStreamDefault>>>.
With that change, all 4 kernels finish profiling in a single ncu run.
So, as far as I can tell, ncu fails when profiling multiple kernels that use dynamic shared memory in a single application …

# ncu --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.2.2.0 (build 33188574) (public-release)
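Based on the description above, a self-contained repro might look like the sketch below: four instantiations of the empty template kernel (the post also had four kernels), each launched with nonzero dynamic shared memory on the default stream, to be run under ncu --set full. The grid, block and shared-memory values are hypothetical stand-ins for GRID, BLOCK and shared_mem, which the post does not give, and launch_one and run_repro are hypothetical helper names.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical values; the post's GRID, BLOCK and shared_mem were not given.
constexpr unsigned kGrid = 64;
constexpr unsigned kBlock = 256;
constexpr size_t kSharedMem = 16 * 1024;  // 16 KiB, below the default 48 KiB limit

// Empty kernel from the post; T only makes each instantiation a distinct kernel.
template <typename T>
__global__ void mykernel() {}

// Launches one instantiation and checks both launch and execution errors.
// cudaStreamDefault is the flag value 0, so this launches on the default stream.
template <typename T>
cudaError_t launch_one(size_t shared_mem) {
    mykernel<T><<<kGrid, kBlock, shared_mem, cudaStreamDefault>>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}

// Runs four distinct kernels in one process; pass 0 to compare against the
// configuration that profiled successfully in the post.
cudaError_t run_repro(size_t shared_mem) {
    cudaError_t err;
    if ((err = launch_one<float>(shared_mem)) != cudaSuccess) return err;
    if ((err = launch_one<double>(shared_mem)) != cudaSuccess) return err;
    if ((err = launch_one<int>(shared_mem)) != cudaSuccess) return err;
    return launch_one<char>(shared_mem);
}
```

Building one binary that calls run_repro(kSharedMem) and another that calls run_repro(0), then profiling both with ncu --set full, would isolate the dynamic-shared-memory factor described above.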

ziyue.zhang · September 20, 2023, 8:34am

I encountered similar issues and found that the limited RAM in my system was causing this error code (9). I think enabling --set full increases the peak RAM required.
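To check for this before profiling, one option on Linux is to compare available physical RAM with the device memory in use, since the warning in the first post shows kernel replay backing device memory up in system memory. A minimal sketch, assuming Linux with glibc (_SC_AVPHYS_PAGES is a glibc extension) and assuming that a full copy of the device memory in use may end up in system RAM; the function names are hypothetical.

```cuda
#include <cstddef>
#include <cstdint>
#include <unistd.h>
#include <cuda_runtime.h>

// Available physical RAM in bytes (Linux/glibc), or 0 if the query fails.
// This counts only free pages, not reclaimable page cache, so it is conservative.
uint64_t available_ram_bytes() {
    const long pages = sysconf(_SC_AVPHYS_PAGES);
    const long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0) return 0;
    return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
}

// Rough check (an assumption, not how Nsight Compute decides): could host RAM
// hold one copy of the device memory currently in use on this GPU?
bool host_ram_can_hold_device_copy() {
    size_t free_dev = 0, total_dev = 0;
    if (cudaMemGetInfo(&free_dev, &total_dev) != cudaSuccess) return false;
    return static_cast<uint64_t>(total_dev - free_dev) <= available_ram_bytes();
}
```

This only accounts for the backup copy; the profiler and the application also need host RAM, so a narrow margin is already a warning sign.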

202476410arsmart · October 3, 2023, 1:32pm

Actually, I can use --set full; I noticed that the replay mode matters more. Below is my command:

ncu --set full --replay-mode application --app-replay-match grid --app-replay-buffer file --app-replay-mode relaxed -f --export output-file-full.nsight-cuprof-report ./a.out

I am using an A100 server, and “--app-replay-mode relaxed” can also help. “--app-replay-buffer file” uses a file instead of RAM as the replay buffer.

FlyK · November 7, 2023, 11:26am

I have also run into kernel replay failing to profile in scenarios with large device memory usage. I had to resort to application replay, but it is too slow, and kernels may not match between replay runs. I hope that NCU can address this problem in the future and ensure that kernel replay can profile properly in scenarios with large memory usage.

Thanks.

veraj · November 17, 2023, 2:24am

Thanks for providing these inputs. We’re always trying to improve the stability and user experience of our tools and this type of input is very helpful. We recently released version 2023.3 with several bug fixes. Please try it out and let us know if the issue still occurs.

veraj · January 16, 2024, 10:33am

Closing this topic, as it dates from 2022. Please create a new topic if you have issues with Nsight Compute. We’ll try our best to help.

Related topics

Topic		Replies	Views	Activity
==ERROR== Failed to prepare kernel for profiling (0xc00000fd) but CUDA sample works Nsight Compute kernel , nvbugs	13	2011	November 6, 2021
Nsight compute 2023.2: consistent Launch Fails for one of the kernels Nsight Compute	3	989	August 17, 2023
Nsight-Compute returns “No kernels were profiled” warning Nsight Compute	9	1276	July 27, 2023
Crash when profiling with "Kernel Launches and Memory Operations" Nsight Visual Studio Edition	7	3620	February 5, 2015
Can't Get NCU GUI To Import Properly Nsight Compute	8	1297	October 5, 2020
Ncu does not detect kernels, ==ERROR== The application returned an error code (11) Nsight Compute kernel , profiling	6	1720	December 13, 2023
Option to profile only master process Nsight Compute cuda	23	3121	December 1, 2023
Nsight Compute Error Nsight Compute cuda	10	166	August 2, 2024
NVIDIA NSight Compute: The profiler returned an error code:1 Nsight Compute	13	1728	March 18, 2024
Error failed to profile kernel Nsight Compute cuda , nsight	3	759	May 18, 2023
