Nsight Compute 2023.2: consistent LaunchFailed error for one of the kernels

I’m getting this error with Nsight Compute 2023.2.1, driver 535, Ubuntu 22.04, on a GTX 1650 Super GPU. I was able to successfully profile the uncoalescedGlobalAccesses sample in extras/samples.

My log shows:
$ sudo ncu -f --set full --import-source on -o cudart.ncu-rep ./cudart
Rendering a 600x400 image with 10 samples per pixel in 8x8 blocks.
==PROF== Connected to process 12250 (/home/shihab/codes/raytracinginoneweekendincuda/cudart)
==PROF== Profiling “rand_init” - 0: 0%…50%…100% - 31 passes
==PROF== Profiling “create_world” - 1: 0%…50%…100% - 31 passes
==PROF== Profiling “render_init” - 2: 0%…50%…100% - 31 passes
==PROF== Profiling “render” - 3: err:50 | cudart| profiler| >> Sending profiler error message: Launching the workload is taking more time than expected. If this continues to hang, terminate the profile and re-try by profiling the range of all related launches using '--replay-mode range'. See Kernel Profiling Guide :: Nsight Compute Documentation for more details.
0%.
==WARNING== Launching the workload is taking more time than expected. If this continues to hang, terminate the profile and re-try by profiling the range of all related launches using '--replay-mode range'. See Kernel Profiling Guide :: Nsight Compute Documentation for more details.
err:50 | cudart| cuda| >> Failed to synchronize ctx 0x558fc96e8c20 (error = 702)
err:50 | cudart| profiler| >> Client launch function failed: UnknownError
err:50 | cudart|profiler_data_collector| >> Failed to synchronize stream (error = 702)
err:50 | cudart| profiler| >> Failed ending pass: UnknownError
err:50 | cudart| profiler_experiment| >> Failed launching for SW Counters::1
err:50 | cudart| profiler_experiment| >> Skipping OnEnd due to previous traversal error
err:50 | cudart| cuda_context_state| >> Async context error while copying!
err:50 | cudart| cuda_context_state| >> Failed to transfer context state!
err:50 | cudart| cuda_replay| >> Failed to restore
err:50 | cudart| profiler| >> Client pre iteration failed (UnknownError)
err:50 | cudart| profiler_experiment| >> Failed launching for LOP Counters::0
err:50 | cudart| profiler_experiment| >> Skipping OnEnd due to previous traversal error
err:50 | cudart| profiler| >> Profile end failed (LaunchFailed)
err:20 | cudart| cuda| >> ProfileSeries returned an error: LaunchFailed
err:50 | cudart| cuda| >> executeInternal returned an error: LaunchFailed
err:50 | cudart| profiler| >> Sending profiler error message: LaunchFailed
…50%…100% - 2 passes

==ERROR== LaunchFailed
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No source files were imported. Check that the target application was compiled with -lineinfo.
==PROF== Report: /home/shihab/codes/raytracinginoneweekendincuda/cudart.ncu-rep
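
The last warning in the log may simply be a side effect of the failed profile, but it is cheap to rule out. A minimal sketch of rebuilding with line information follows. The source file name `main.cu` and the `-O3` flag are assumptions here, not taken from the log; the project's own Makefile may differ.

```shell
# Sketch: rebuild the target with -lineinfo so ncu can import source files.
# SRC is an assumed file name; OUT matches the binary name seen in the log.
SRC=main.cu
OUT=cudart
NVCC_FLAGS="-O3 -lineinfo"

# Print the command first so it can be checked before anything is built.
echo "nvcc $NVCC_FLAGS -o $OUT $SRC"

# Only build when nvcc and the source file are actually present.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Word splitting of $NVCC_FLAGS is intentional here.
    nvcc $NVCC_FLAGS -o "$OUT" "$SRC"
fi
```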

What is the application doing? Is it something you could share, or could you provide a reproducer? We sometimes see this when multiple kernels need to synchronize with each other, because Nsight Compute serializes kernel launches. Are you using more than one stream to submit work, or using anything like NCCL?

Hi, this is a fork of a popular ray tracing code.

Here’s the link to the fork: GitHub - Shihab-Shahriar/raytracinginoneweekendincuda at original

Thanks. We can try to reproduce this in-house. In the meantime, can you try the "--replay-mode application" flag to see if it fixes the hang? There are various reasons this could happen, for example CPU-to-GPU communication that fails when the kernel is replayed multiple times. Let me know what you find.
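
For reference, the suggestion above could be tried roughly as follows. This is a sketch only: it reuses the binary name and options from the original command, while the `-k render` filter and the report name `cudart_render.ncu-rep` are assumptions added here to limit profiling to the failing kernel. Application replay re-runs the whole program once per pass, so it assumes the run is deterministic.

```shell
# Sketch: re-profile only the "render" kernel using application replay.
# APP comes from the original command; REPORT is an assumed output name.
APP=./cudart
REPORT=cudart_render.ncu-rep

# --replay-mode application : re-run the whole application for each pass
# -k render                 : profile only kernels named exactly "render"
NCU_ARGS="-f --set full --import-source on --replay-mode application -k render -o $REPORT"

# Print the command first so it can be checked before anything runs.
echo "sudo ncu $NCU_ARGS $APP"

# Only launch the profiler when ncu and the target binary are present.
if command -v ncu >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Word splitting of $NCU_ARGS is intentional here.
    sudo ncu $NCU_ARGS "$APP"
fi
```

If the failure goes away in this mode, that would point toward the kernel-replay save and restore step rather than the kernel itself, though that is only one possible reading.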
