Nsight Compute 2023.2: consistent launch failures for one of the kernels

Shihab_Khan · August 14, 2023, 6:05pm

I'm getting this error with Nsight Compute 2023.2.1, driver 535, Ubuntu 22.04, on a GTX 1650 Super GPU. I have been able to profile the uncoalescedGlobalAccesses sample in extras/samples successfully.

My log shows:
$ sudo ncu -f --set full --import-source on -o cudart.ncu-rep ./cudart
Rendering a 600x400 image with 10 samples per pixel in 8x8 blocks.
==PROF== Connected to process 12250 (/home/shihab/codes/raytracinginoneweekendincuda/cudart)
==PROF== Profiling “rand_init” - 0: 0%…50%…100% - 31 passes
==PROF== Profiling “create_world” - 1: 0%…50%…100% - 31 passes
==PROF== Profiling “render_init” - 2: 0%…50%…100% - 31 passes
==PROF== Profiling “render” - 3: err:50 | cudart| profiler| >> Sending profiler error message: Launching the workload is taking more time than expected. If this continues to hang, terminate the profile and re-try by profiling the range of all related launches using '--replay-mode range'. See Kernel Profiling Guide :: Nsight Compute Documentation for more details.
0%.
==WARNING== Launching the workload is taking more time than expected. If this continues to hang, terminate the profile and re-try by profiling the range of all related launches using '--replay-mode range'. See Kernel Profiling Guide :: Nsight Compute Documentation for more details.
err:50 | cudart| cuda| >> Failed to synchronize ctx 0x558fc96e8c20 (error = 702)
err:50 | cudart| profiler| >> Client launch function failed: UnknownError
err:50 | cudart|profiler_data_collector| >> Failed to synchronize stream (error = 702)
err:50 | cudart| profiler| >> Failed ending pass: UnknownError
err:50 | cudart| profiler_experiment| >> Failed launching for SW Counters::1
err:50 | cudart| profiler_experiment| >> Skipping OnEnd due to previous traversal error
err:50 | cudart| cuda_context_state| >> Async context error while copying!
err:50 | cudart| cuda_context_state| >> Failed to transfer context state!
err:50 | cudart| cuda_replay| >> Failed to restore
err:50 | cudart| profiler| >> Client pre iteration failed (UnknownError)
err:50 | cudart| profiler_experiment| >> Failed launching for LOP Counters::0
err:50 | cudart| profiler_experiment| >> Skipping OnEnd due to previous traversal error
err:50 | cudart| profiler| >> Profile end failed (LaunchFailed)
err:20 | cudart| cuda| >> ProfileSeries returned an error: LaunchFailed
err:50 | cudart| cuda| >> executeInternal returned an error: LaunchFailed
err:50 | cudart| profiler| >> Sending profiler error message: LaunchFailed
…50%…100% - 2 passes

==ERROR== LaunchFailed
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No source files were imported. Check that the target application was compiled with -lineinfo.
==PROF== Report: /home/shihab/codes/raytracinginoneweekendincuda/cudart.ncu-rep

jmarusarz · August 14, 2023, 6:49pm

What is the application doing? Is it something you could share, or could you share a reproducer? We sometimes see this when multiple kernels depend on each other for synchronization, because Nsight Compute serializes kernel launches. Are you using more than one stream to submit work, or using a library like NCCL?
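A minimal CPU-only sketch of why serialization can turn working code into a hang, assuming two kernels that communicate through a flag in memory. This is not CUDA and not how Nsight Compute is implemented: threads stand in for kernels running concurrently on two streams, the names `producer`, `consumer`, `run_concurrent` and `run_serialized` are hypothetical, and the time budget only marks the point where the profiler would print its "taking more time than expected" warning (a real spinning kernel has no budget and never returns).

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical "consumer kernel": spins until the producer raises the flag.
// Returns true if the flag was seen, false if the budget ran out
// (the budget exists only so this model terminates).
bool consumer(const std::atomic<bool>& flag, Clock::duration budget) {
    const auto deadline = Clock::now() + budget;
    while (!flag.load(std::memory_order_acquire)) {
        if (Clock::now() > deadline) return false;
        std::this_thread::yield();
    }
    return true;
}

// Hypothetical "producer kernel": publishes the flag the consumer waits on.
void producer(std::atomic<bool>& flag) {
    flag.store(true, std::memory_order_release);
}

// Normal run: both launches are in flight at once (e.g. on two streams),
// so the consumer eventually observes the producer's write.
bool run_concurrent(Clock::duration budget) {
    std::atomic<bool> flag{false};
    bool finished = false;
    std::thread c([&] { finished = consumer(flag, budget); });
    std::thread p([&] { producer(flag); });
    c.join();
    p.join();
    return finished;
}

// Profiled run: each launch completes before the next one starts, so the
// consumer (launched first) runs alone and the producer is never reached in time.
bool run_serialized(Clock::duration budget) {
    std::atomic<bool> flag{false};
    const bool finished = consumer(flag, budget);
    producer(flag);  // only runs after the consumer has already given up
    return finished;
}
```

With these definitions, `run_concurrent(std::chrono::seconds(5))` returns true, while `run_serialized(std::chrono::milliseconds(50))` returns false because the producer only gets to run after the consumer has stopped waiting.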

Shihab_Khan · August 14, 2023, 8:30pm

Hi, this is a fork of the popular Ray Tracing in One Weekend in CUDA code.

Here’s the link to the fork: GitHub - Shihab-Shahriar/raytracinginoneweekendincuda at original

jmarusarz · August 17, 2023, 8:24pm

Thanks. We can try to reproduce this in-house. In the meantime, could you try the “--replay-mode application” flag to see if that fixes the hang? There are various reasons this could happen, for example CPU-to-GPU communication that fails when the kernel is replayed multiple times. Let me know what you find.
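The replay point can be modeled the same way. This is a simplified sketch, not Nsight Compute's implementation: it assumes a kernel that consumes a one-shot value the host publishes once per application run, and assumes that this host-side state is not part of what gets restored between passes. Kernel replay (the default) saves the device memory the kernel can access before the first pass and restores what it wrote before each later pass, while application replay reruns the whole program for every pass. `HostMailbox`, `kernel_pass`, `kernel_replay` and `application_replay` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical host-side mailbox; assumed NOT restored between replay passes.
struct HostMailbox {
    std::optional<int> message;
};

// One profiling pass of a hypothetical kernel. Returns false when no message
// is available; on a real GPU this would be a kernel waiting on the host (a hang).
bool kernel_pass(HostMailbox& mailbox, std::vector<int>& device_mem) {
    if (!mailbox.message) return false;
    device_mem[0] += *mailbox.message;
    mailbox.message.reset();  // side effect outside device memory
    return true;
}

// Kernel replay model: the host code runs once; device memory is snapshotted
// and restored before every pass after the first.
int kernel_replay(int passes) {
    HostMailbox mailbox{42};
    std::vector<int> device_mem{0};
    const std::vector<int> saved = device_mem;
    int completed = 0;
    for (int p = 0; p < passes; ++p) {
        if (p > 0) device_mem = saved;  // restore device memory only
        if (!kernel_pass(mailbox, device_mem)) break;
        ++completed;
    }
    return completed;
}

// Application replay model: the whole program, including the host code that
// publishes the message, is rerun for every pass.
int application_replay(int passes) {
    int completed = 0;
    for (int p = 0; p < passes; ++p) {
        HostMailbox mailbox{42};
        std::vector<int> device_mem{0};
        if (!kernel_pass(mailbox, device_mem)) break;
        ++completed;
    }
    return completed;
}
```

With the 31 passes seen in the log, `kernel_replay(31)` completes only the first pass (returns 1), while `application_replay(31)` completes all 31. The trade-off is that application replay requires the program to behave deterministically across runs so the passes can be matched to the same kernel launch.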

Related topics

LaunchFailed when using Nsight Compute 2023.2 · Nsight Compute · 4 replies · 1289 views · August 17, 2023
NSight Profiling Crashes with error code (9) · Nsight Compute · 11 replies · 4514 views · January 16, 2024
Nsight-compute print "the application returned an error code (249)" · Nsight Compute · 5 replies · 1457 views · February 13, 2023
LaunchFailed on windows, nsignt compute 2023.3 · Nsight Compute · 18 replies · 1035 views · January 29, 2024
Nsight compute hanging issue · Nsight Compute, kernel · 7 replies · 851 views · March 11, 2024
Getting LaunchFailed error using 2023.2 · Nsight Compute · 5 replies · 1537 views · August 10, 2023
Nsight Compute: specific kernel launch failure · Nsight Compute, cuda, kernel · 4 replies · 488 views · February 2, 2024
Nishgt-Compute ==ERROR== LaunchFailed · Nsight Compute, cuda, kernel · 3 replies · 845 views · November 15, 2023
Ncu-ui not profiling some sections · Nsight Compute · 4 replies · 2364 views · November 26, 2020
NVIDIA NSight Compute: The profiler returned an error code:1 · Nsight Compute · 13 replies · 1912 views · March 18, 2024