Linux Kernel Paranoid Level = -1: OK

Hi, I’m trying to use Nsight Systems in Docker, but I cannot get the CPU statistics.
I followed the guide to enable CPU sampling in Docker by setting the paranoid level to 2 and supplying the custom seccomp configuration file, but it still doesn’t work.
From the Nsight environment check, I see that the “Linux Kernel Paranoid Level” is reported as -1.

My environment is:
Host:

  • Ubuntu 22.04 LTS
  • Docker 20.10.16
  • Nvidia driver 510.73.05

Container:

  • Ubuntu 20.04 LTS
  • CUDA 11.4 (installed with .run file)
  • Nsight Systems 2021.2.4.12-a25c8fd

Nsight environment query on container:

nsys status -e
Timestamp counter supported: Yes
Sampling Environment Check
Linux Kernel Paranoid Level = -1: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-35-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
Sampling Environment: OK

Nvidia-smi on host

nvidia-smi 
Tue Jun  7 14:01:20 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     Off  | 00000000:01:00.0  On |                  N/A |
| 30%   43C    P8    17W / 125W |    546MiB /  8192MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2566      G   /usr/lib/xorg/Xorg                146MiB |
|    0   N/A  N/A      2701      G   ...ome-remote-desktop-daemon        2MiB |
|    0   N/A  N/A      2737      G   /usr/bin/gnome-shell              188MiB |
|    0   N/A  N/A      5236      G   ...6/usr/lib/firefox/firefox      157MiB |
|    0   N/A  N/A     18587      G   ...AAAAAAAAA= --shared-files       18MiB |
|    0   N/A  N/A     24574      G   ...ost-linux-x64/nsys-ui.bin       27MiB |
+-----------------------------------------------------------------------------+

perf_event_paranoid query on both host and container

cat /proc/sys/kernel/perf_event_paranoid
2
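
For reference, the value read above can be interpreted against the levels described in the kernel's sysctl documentation. The following is a minimal sketch: the helper name `describe_paranoid` is made up for illustration, the descriptions apply to unprivileged users, and values above 2 (which some distributions define) are only reported as distribution-specific.

```shell
#!/bin/sh
# Hypothetical helper: map a perf_event_paranoid value to what an
# unprivileged user may do (per the kernel sysctl documentation).
describe_paranoid() {
    case "$1" in
        -1) echo "almost all events allowed" ;;
        0)  echo "CPU events and kernel profiling allowed; raw tracepoints restricted" ;;
        1)  echo "kernel profiling allowed; system-wide CPU events restricted" ;;
        2)  echo "user-space profiling only" ;;
        *)  echo "distribution-specific or unknown value" ;;
    esac
}

# Read the current level if the file is present and readable.
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level=$(cat "$paranoid_file")
    echo "perf_event_paranoid = $level: $(describe_paranoid "$level")"
fi
```

Running the same script on the host and inside the container makes it easy to compare the two readings with what `nsys status -e` reports.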

Seccomp file: seccomp_file.json (12.0 KB)

Hi @luigicrisci1997 ,

what is the issue you are facing? The CPU statistics do not show up in the report, or are you getting a warning message while profiling?

Could you share the complete command that you are using to launch nsys?

Hi @ztasoulas, thanks for your answer.
No, I do not get the CPU statistics at all. The output is similar to the one in the attached picture.

You can see that all the information I get is some poll and ioctl calls.

To launch nsys I just use:

nsys profile <executable>

I would like to get an output similar to this:
report1.nsys-rep (1.8 MB). This comes from profiling a simple vector addition application on a native Windows machine.

Does system-wide sampling provide the information you are looking for?

sudo nsys profile --sample=system-wide <app>

Nope, still not getting anything.
Also, I just tried on the local Linux machine (the host machine) with CUDA 11.7, and even there I don’t get any CPU statistics.

nsys status -e on host

Timestamp counter supported: Yes

Sampling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 2: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-37-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
Sampling Environment: OK

Generated report:
report3.nsys-rep (155.4 KB)

Is there something I am missing?

Can you try setting the paranoid level to a lower value, e.g., 1 or -1?
Also, I forgot to mention previously that system-wide sampling requires root privileges. Can you try adding sudo?

You can also check the Diagnostics summary section for warnings and errors, that can give clues to why the samples are not collected.
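
As a rough sketch of how the paranoid level could be lowered for one session: the helper below only prints the `sysctl` command rather than running it, so nothing changes until the printed line is executed by hand. The function name `paranoid_cmd` is hypothetical, and the accepted range is limited to the levels discussed in this thread. A value set this way lasts until the next reboot.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the command that would set
# kernel.perf_event_paranoid to the requested level.
paranoid_cmd() {
    case "$1" in
        -1|0|1|2) echo "sudo sysctl -w kernel.perf_event_paranoid=$1" ;;
        *) echo "unsupported level: $1" >&2; return 1 ;;
    esac
}

# Show the command for level 1.
paranoid_cmd 1
```

After running the printed command, `cat /proc/sys/kernel/perf_event_paranoid` should show the new value.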

Thank you for the answer.
I just tried using sudo with the paranoid level set to 1. Profiling takes longer, but nsys crashes while generating the report.

This is the output log:

sudo /usr/local/cuda/nsight-systems-2022.1.3/bin/nsys profile --sample=system-wide matrixMulCUBLAS  
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Turing" with compute capability 7.5

GPU Device 0: "Quadro RTX 4000" with compute capability 7.5

MatrixA(640,480), MatrixB(480,320), MatrixC(640,320)
Computing result using CUBLAS...done.
Performance= 2644.13 GFlop/s, Time= 0.074 msec, Size= 196608000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Generating '/tmp/nsys-report-c0f8.qdstrm'
[1/1] [========================100%] report7.nsys-rep
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Common/Core/LimitedNumber.h(25): Throw in function static void QuadDCommon::LimitedNumberHelper::Checker<Compare>::Check(Base) [with Base = unsigned int; Base Limit = 16777215; bool Compare = true]\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::OutOfRangeException>\nstd::exception::what: OutOfRangeException\n[QuadDCommon::tag_message*] = Provided number 4294967295 is out of limit 16777215.\n"
      }
    }
  }
}


**** Errors occurred while processing the raw events. ****
**** Please see the Diagnostics Summary page after opening the report file in GUI. ****
Generated:
    /home/luigi/Downloads/cuda-samples-11.6/bin/x86_64/linux/release/report7.qdstrm
    /home/luigi/Downloads/cuda-samples-11.6/bin/x86_64/linux/release/report7.nsys-rep

but I think it is related to some system processes rather than to the CUDA application.

That’s the output report:
report7.nsys-rep (3.8 MB)
Anyway, there is still no info about CPU calls like in the Windows profiling.

Hi luigicrisci1997,

I believe the nsys crash you experienced has been fixed in a newer build of nsys. Can you upgrade to the latest version of nsys?

Also, looking at your screenshot, it looks like CPU sampling data was collected. When you ask for ‘CPU Statistics’, are you asking for the CPU sampling summary/histogram results? If so, make sure you are selecting the ‘Bottom-Up View’ in the drop down box below the timeline. See the attached screenshot.

Looking closer at this conversation: sorry, I didn’t realize report1.nsys-rep was collected on a Windows system. It looks correct.

A paranoid level of 2 should work to profile your application and any processes it launches. You do not need to set the paranoid level any lower.

I also don’t think you need to use systemwide sampling unless you are trying to understand what else is running on the system and using system resources while your application runs.

Can you try launching the Docker container with the --privileged=true switch, install the latest version of nsys, and run the nsys status --environment command from within the container? Please post those results here. Then, try your collection again.
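
A minimal sketch of the two container launch variants mentioned in this thread, printed as a dry run rather than executed. The image name `my-cuda-image`, the helper name `docker_cmd`, and the `--rm -it --gpus all` options are assumptions for illustration (`--gpus all` also assumes the NVIDIA Container Toolkit is installed on the host); only `--privileged=true` and the custom seccomp file come from the discussion above.

```shell
#!/bin/sh
# Placeholder image name (an assumption, not from the thread).
IMAGE="my-cuda-image"

# Hypothetical helper: print (not run) a docker command line.
#   privileged : run the container with full privileges
#   seccomp    : use a custom seccomp profile; $2 is the profile path
docker_cmd() {
    case "$1" in
        privileged) echo "docker run --rm -it --gpus all --privileged=true $IMAGE" ;;
        seccomp)    echo "docker run --rm -it --gpus all --security-opt seccomp=$2 $IMAGE" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

docker_cmd privileged
docker_cmd seccomp seccomp_file.json
```

Inside the started container, `nsys status --environment` can then be used to confirm the sampling environment before collecting again.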

Hi, sorry for the late answer.
I just ran the container with --privileged=true and installed the latest version of nsys.
The sampling environment looks correct now:

nsys status --environment
Timestamp counter supported: Yes

Sampling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-37-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
Sampling Environment: OK

Anyway, still no info about CPU op:
report1.nsys-rep (185.4 KB)

I’m also unable to get that info on the host machine without Docker, so I suppose the problem could be system-related.

Your report1.nsys-rep file did include CPU Instruction Pointer samples. Check out the attached screenshot of that collection’s diagnostics. The red box is a warning indicating that kernel IP samples can’t be collected - i.e. IP samples of OS execution can’t be collected. This warning is not an error. The green box shows that 542 CPU IP samples were collected.

But, maybe you are asking about something else. What do you mean when you say no “CPU op” info was collected?

In the Windows report, there is additional information about what happens on the threads.
This is on Linux:


Here on Windows:

You can see whether the thread was blocked due to a user request, etc.

That’s not such a big issue, but I was just trying to understand why such differences appear.

The thread state information comes from the OS. Linux and Windows provide different information and nsys utilizes what is available. So, these differences are expected.