Ncu-ui not profiling some sections

geohei · November 16, 2020, 7:25pm

Hi.

I’m a newbie in CUDA.

I start Nsight Compute as root using ncu-ui.
My code has 2 kernels.
The first one profiles correctly for all (!) sections, but the second one doesn’t profile for the following sections:

Instruction Statistics
Occupancy
Source Counters

With any of the above sections selected, I get:
The profiler returned an error code: 1 (0x1)

The first errors in the report are:

[Error] Rule Bottleneck returned an error:
Metric launch__waves_per_multiprocessor not found

[Error] <built-in function IAction_metric_by_name> returned a result with an error set
/opt/nvidia/nsight-compute/2020.2.1/target/linux-desktop-glibc_2_11_3-x64/../../sections/SpeedOfLight.py:45
/opt/nvidia/nsight-compute/2020.2.1/target/linux-desktop-glibc_2_11_3-x64/../../sections/NvRules.py:365

[Error] Rule Roofline Analysis returned an error:
Metric sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained not found

Is there something in my code that prevents these sections from being profiled?

Thanks,

Sanjiv.Satoor · November 17, 2020, 11:16am

Can you please provide details on your setup?

Nsight Compute version
which GPU
CUDA driver version
OS version

geohei · November 24, 2020, 1:44pm

Sorry for the late reply.

Nsight Compute 2020.2.1 (build 29181059)
RTX 2080 Super
Build cuda_11.1.TC455_06.29190527_0
Ubuntu 20.04.1

… and a correction to post #1: the Occupancy section works fine, so only Instruction Statistics and Source Counters produce error code 1.

felix_dt · November 25, 2020, 7:26am

Unfortunately, it’s not clear from your description why this wouldn’t work. If you can share your code with us for testing, we can debug the issue internally. Otherwise, please try the following steps:

Profile the application with the ncu command line interface with one of the offending sections, e.g.

ncu --section InstructionStats -s 1 -c 1 my-app

to profile only the second kernel with this section.

If this also fails, you can try with the individual metrics listed in this section file. The file can be found in the “sections” directory of your ncu installation path. Select one or more of the metric names from this file and collect them via the command line:

ncu --metrics smsp__inst_executed.sum,inst_executed -s 1 -c 1 my-app

If you identify one specific metric that causes problems, you can remove it from the section file as a workaround to get unblocked.

As an alternative, try collecting the data with application replay:

ncu --replay-mode application -s 1 -c 1 my-app

Are there any specific properties of this kernel that could cause issues? Does it run especially long? Have you checked the kernel for correctness, e.g. with compute-sanitizer (see the Compute Sanitizer User Manual)?

geohei · November 26, 2020, 3:30pm

Hi.

Thanks a lot for the extensive reply.

I saw the LaunchFailed only now (see below). I have to address this one.

Sorry, but I’m not allowed to post any code. I know … it would make it easier for me as well. I will work my way through all the points of your post above. Very useful!

# /usr/local/cuda-11.1/bin/ncu --section InstructionStats -s 1 -c 1 test_cuda
==PROF== Profiling "kernel_2" - 1 of 1: 0%....50%....100% - 2 passes
==ERROR== Error: LaunchFailed
...
==PROF== Disconnected from process 12114
==ERROR== An error occurred while trying to profile.
[12114] test_cuda@127.0.0.1
...
---------------------------------------------------------------------- --------------- ------------------------------
Avg. Executed Instructions Per Scheduler                                                                      (!) n/a
Executed Instructions                                                                                         (!) n/a
Avg. Issued Instructions Per Scheduler                                                                        (!) n/a
Issued Instructions                                                                                           (!) n/a
---------------------------------------------------------------------- --------------- ------------------------------

Is it possible that this is due to a coding error?

# /usr/local/cuda-11.1/bin/compute-sanitizer test_cuda
========= COMPUTE-SANITIZER
...
========= ERROR SUMMARY: 0 errors

Regarding LaunchFailed …
http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/online/group__CUDART__TYPES_g3f51e3575c2178246db0a94a430e0038.html

cudaErrorLaunchFailure An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. The device cannot be used until cudaThreadExit() is called. All existing device memory allocations are invalid and must be reconstructed if the program is to continue using CUDA.

Fishing in the dark … :(

… later …

I have more info.

It seems that everything revolves around these values.

dim3 blocks  (9*1024,1,1);
dim3 threads (1024);

If I change the block count to 8*1024, it always works.
If I change it to 10*1024, it always fails.
With 9*1024 it sometimes works and sometimes fails.

This must somehow be an indication of a very specific problem.
I checked the code up and down … didn’t find anything wrong.
This must be CUDA related.
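
For reference, the arithmetic behind those three launch sizes can be sketched in host-side C++. This is only a sketch: the function names and the `elements` parameter are hypothetical and nothing here is taken from the actual program. It shows how many threads each grid size launches and how one could check whether a per-thread buffer (for example, one RNG state per thread) is large enough for a given launch.

```cpp
#include <cassert>
#include <cstdint>

// Total number of threads in a 1-D launch: blocks * threads_per_block.
// 64-bit arithmetic avoids overflow for large grids.
inline std::uint64_t total_threads(std::uint64_t blocks,
                                   std::uint64_t threads_per_block) {
    // Precondition: a non-empty block of at most 1024 threads,
    // which is the size used in the post.
    assert(threads_per_block > 0 && threads_per_block <= 1024);
    return blocks * threads_per_block;
}

// True if a buffer with `elements` per-thread entries covers every thread
// of the launch. `elements` is a hypothetical allocation size (assumption),
// not a value from the original program.
inline bool buffer_covers_launch(std::uint64_t elements,
                                 std::uint64_t blocks,
                                 std::uint64_t threads_per_block) {
    return elements >= total_threads(blocks, threads_per_block);
}
```

With 1024 threads per block, 8*1024, 9*1024 and 10*1024 blocks give 8,388,608, 9,437,184 and 10,485,760 threads respectively. If any per-thread buffer happened to be sized for the first case, `buffer_covers_launch` would return false for the other two, so that is one thing that may be worth ruling out alongside the kernel's run time.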

A successful run looks like this:

# /usr/local/cuda-11.1/bin/ncu --section InstructionStats -s 1 -c 1 test_cuda
...
==PROF== Profiling "kernel_test" - 1 of 1: 0%....50%....100% - 3 passes
...
==PROF== Disconnected from process 34479
[34479] test_cuda@127.0.0.1
kernel_test(curandStateXORWOW*, int, res_struct*, int*, unsigned long long*), 2020-Nov-27 15:33:33, Context 1, Stream 7
Section: Instruction Statistics
---------------------------------------------------------------------- --------------- ------------------------------
Avg. Executed Instructions Per Scheduler                                          inst                     80.148.480
Executed Instructions                                                             inst                 15.388.508.160
Avg. Issued Instructions Per Scheduler                                            inst                  80.148.553,89
Issued Instructions                                                               inst                 15.388.522.347
---------------------------------------------------------------------- --------------- ------------------------------
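
As a quick sanity check on the table above, the two "Executed Instructions" rows can be related by simple division. The numbers are printed with "." as the thousands separator and "," as the decimal mark. The helper below is a hypothetical host-side C++ sketch (the function name is not part of any NVIDIA API) that derives the number of schedulers implied by the total and the per-scheduler average.

```cpp
#include <cassert>
#include <cstdint>

// Number of schedulers implied by two Instruction Statistics rows:
// total executed instructions divided by the per-scheduler average.
inline std::uint64_t implied_schedulers(std::uint64_t executed_total,
                                        std::uint64_t avg_per_scheduler) {
    assert(avg_per_scheduler > 0);  // avoid division by zero
    return executed_total / avg_per_scheduler;
}
```

Calling `implied_schedulers(15388508160ULL, 80148480ULL)` with the values from the table returns 192, and the division is exact. That would be consistent with 48 SMs with 4 schedulers each, assuming that is the layout of the RTX 2080 Super, so the counters from the successful run look internally consistent.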
