Linux Kernel Paranoid Level = -1: OK

luigicrisci1997 · June 7, 2022, 12:07pm

Hi, I’m trying to use Nsight Systems in Docker but I cannot get the CPU statistics.
I followed the guide to enable CPU sampling in Docker by setting the paranoid level to 2 and passing the custom seccomp configuration file, but it still doesn’t work.
From the Nsight environment check I see that the “Linux Kernel Paranoid Level” is reported as -1.

My environment is:
Host

Ubuntu 22.04 LTS
Docker 20.10.16
Nvidia driver 510.73.05

Container:

Ubuntu 20.04 LTS
CUDA 11.4 (installed with .run file)
Nsight Systems 2021.2.4.12-a25c8fd

Nsight environment query on container:

nsys status -e
Timestamp counter supported: Yes
Sampling Environment Check
Linux Kernel Paranoid Level = -1: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-35-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
Sampling Environment: OK

Nvidia-smi on host

nvidia-smi 
Tue Jun  7 14:01:20 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.05    Driver Version: 510.73.05    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     Off  | 00000000:01:00.0  On |                  N/A |
| 30%   43C    P8    17W / 125W |    546MiB /  8192MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2566      G   /usr/lib/xorg/Xorg                146MiB |
|    0   N/A  N/A      2701      G   ...ome-remote-desktop-daemon        2MiB |
|    0   N/A  N/A      2737      G   /usr/bin/gnome-shell              188MiB |
|    0   N/A  N/A      5236      G   ...6/usr/lib/firefox/firefox      157MiB |
|    0   N/A  N/A     18587      G   ...AAAAAAAAA= --shared-files       18MiB |
|    0   N/A  N/A     24574      G   ...ost-linux-x64/nsys-ui.bin       27MiB |
+-----------------------------------------------------------------------------+

perf_event_paranoid query on both host and container

cat /proc/sys/kernel/perf_event_paranoid
2
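
For reference, the value read above maps onto the kernel's documented restriction levels for unprivileged users. A minimal sketch of that mapping, assuming the upstream kernel semantics for -1 through 2 (some distributions patch in higher values); `paranoid_meaning` is a hypothetical helper name, not part of nsys:

```shell
# Describe what a perf_event_paranoid level means for unprivileged users.
# With no argument, the level is read from /proc (Linux only).
# The restrictions are cumulative: each level adds to the ones below it.
paranoid_meaning() {
    level=${1:-$(cat /proc/sys/kernel/perf_event_paranoid)}
    case $level in
        -1)    echo "all events allowed for all users" ;;
        0)     echo "raw and ftrace function tracepoints need privileges" ;;
        1)     echo "CPU events also need privileges" ;;
        [2-9]) echo "kernel profiling also needs privileges; user-space only" ;;
        *)     echo "unrecognized level: $level" >&2; return 1 ;;
    esac
}
```

Calling `paranoid_meaning` with no argument inside the container and again on the host shows whether both report the same restriction.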

Seccomp file: seccomp_file.json (12.0 KB)

ztasoulas · June 7, 2022, 3:17pm

Hi @luigicrisci1997 ,

what is the issue you are facing? The CPU statistics do not show up in the report, or are you getting a warning message while profiling?

Could you share the complete command that you are using to launch nsys?

luigicrisci1997 · June 8, 2022, 10:41am

Hi @ztasoulas, thanks for your answer.
No, I do not get the CPU statistics at all. The output is similar to the one in the attached picture.

You can see that the only information I get is some poll and ioctl calls.

To launch nsys I just use:

nsys profile <executable>

I would like to get an output similar to this:
report1.nsys-rep (1.8 MB). This comes from profiling a simple vector addition application on a native Windows machine.

ztasoulas · June 9, 2022, 2:13am

Does system wide sampling provide the information you are looking for?

sudo nsys profile --sample=system-wide <app>

luigicrisci1997 · June 9, 2022, 8:09am

Nope, still not getting anything.
Also, I just tried on the local Linux machine (the host machine) with CUDA 11.7, and even there I don’t get any CPU statistics.

nsys status -e on host

Timestamp counter supported: Yes

Sampling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 2: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-37-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
Sampling Environment: OK

Generated report:
report3.nsys-rep (155.4 KB)

Is there something I am missing?

ztasoulas · June 13, 2022, 4:27pm

Can you try setting the paranoid level to a lower value, e.g., 1 or -1?
Also, I forgot to mention previously that system-wide sampling requires root privileges. Can you try adding sudo?

You can also check the Diagnostics Summary section for warnings and errors, which can give clues as to why the samples are not collected.
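
Changing the level as suggested above can be done with `sysctl`; a value set this way lasts only until the next reboot. A minimal sketch, restricted to the upstream values -1 through 2; `set_paranoid` and the `DRY_RUN` switch are hypothetical conveniences for illustration, not nsys features:

```shell
# Set kernel.perf_event_paranoid for the current boot (needs root via sudo).
# With DRY_RUN=1 the command is printed instead of executed.
set_paranoid() {
    level=$1
    case $level in
        -1|0|1|2) ;;
        *) echo "expected one of -1, 0, 1, 2; got: $level" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sysctl -w kernel.perf_event_paranoid=$level"
    else
        sudo sysctl -w "kernel.perf_event_paranoid=$level"
    fi
}
```

For example, `DRY_RUN=1; set_paranoid 1` prints the command for review, and `cat /proc/sys/kernel/perf_event_paranoid` confirms the value after a real run.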

luigicrisci1997 · June 14, 2022, 7:29am

Thank you for the answer.
I just tried using sudo with paranoid level = 1. It takes longer to profile, but it crashes while generating the report.

This is the output log:

sudo /usr/local/cuda/nsight-systems-2022.1.3/bin/nsys profile --sample=system-wide matrixMulCUBLAS  
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Turing" with compute capability 7.5

GPU Device 0: "Quadro RTX 4000" with compute capability 7.5

MatrixA(640,480), MatrixB(480,320), MatrixC(640,320)
Computing result using CUBLAS...done.
Performance= 2644.13 GFlop/s, Time= 0.074 msec, Size= 196608000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Generating '/tmp/nsys-report-c0f8.qdstrm'
[1/1] [========================100%] report7.nsys-rep
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Common/Core/LimitedNumber.h(25): Throw in function static void QuadDCommon::LimitedNumberHelper::Checker<Compare>::Check(Base) [with Base = unsigned int; Base Limit = 16777215; bool Compare = true]\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::OutOfRangeException>\nstd::exception::what: OutOfRangeException\n[QuadDCommon::tag_message*] = Provided number 4294967295 is out of limit 16777215.\n"
      }
    }
  }
}


**** Errors occurred while processing the raw events. ****
**** Please see the Diagnostics Summary page after opening the report file in GUI. ****
Generated:
    /home/luigi/Downloads/cuda-samples-11.6/bin/x86_64/linux/release/report7.qdstrm
    /home/luigi/Downloads/cuda-samples-11.6/bin/x86_64/linux/release/report7.nsys-rep

but I think it is related to some system processes rather than to the CUDA application.

That’s the output report:
report7.nsys-rep (3.8 MB)
Anyway, still no info about CPU calls like in the Windows profiling.

rknight · June 14, 2022, 2:54pm

Hi luigicrisci1997,

I believe the nsys crash you experienced has been fixed in a newer build of nsys. Can you upgrade to the latest version of nsys?

Also, looking at your screenshot, it looks like CPU sampling data was collected. When you ask for ‘CPU Statistics’, are you asking for the CPU sampling summary/histogram results? If so, make sure you are selecting the ‘Bottom-Up View’ in the drop down box below the timeline. See the attached screenshot.

rknight · June 14, 2022, 3:09pm

Looking closer at this conversation: sorry, I didn’t realize report1.nsys-rep was collected on a Windows system. It looks correct.

A paranoid level of 2 should work to profile your application and any processes it launches. You do not need to set the paranoid level any lower.

I also don’t think you need to use system-wide sampling unless you are trying to understand what else is running on the system and using system resources while your application runs.

Can you try launching the Docker container with the --privileged=true switch, install the latest version of nsys, and run the nsys status --environment command from within the container? Please post those results here. Then, try your collection again.
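
To make the environment check easier to compare between host and container, its output can be reduced to the two fields that matter here. A minimal sketch, assuming the line format shown in the `nsys status` output earlier in this thread; `summarize_nsys_env` is a hypothetical helper name, and `<image>` in the comment is a placeholder:

```shell
# Summarize `nsys status --environment` output read from stdin.
# Prints the reported paranoid level and the overall sampling verdict.
#
# Intended use, inside a container started with something like
#   docker run --privileged=true -it <image>
# is:
#   nsys status --environment | summarize_nsys_env
summarize_nsys_env() {
    awk -F'[=:]' '
        /Linux Kernel Paranoid Level/ { v = $2; gsub(/ /, "", v); level = v }
        /Sampling Environment:/       { v = $2; gsub(/ /, "", v); verdict = v }
        END { printf "paranoid=%s sampling=%s\n", level, verdict }
    '
}
```

If the summary differs between host and container, the difference points at the container configuration rather than at the application being profiled.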

luigicrisci1997 · June 16, 2022, 8:35am

Hi, sorry for the late answer.
Just tried to run the container with --privileged=true and installed the latest version of nsys.
The environment sampling looks correct now:

nsys status --environment
Timestamp counter supported: Yes

Sampling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-37-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
Sampling Environment: OK

Anyway, still no info about CPU op:
report1.nsys-rep (185.4 KB)

I’m also unable to get that info on the host machine without Docker, so I suppose the problem could be system-related.

rknight · June 16, 2022, 2:07pm

Your report1.nsys-rep file did include CPU Instruction Pointer samples. Check out the attached screenshot of that collection’s diagnostics. The red box is a warning indicating that kernel IP samples can’t be collected - i.e. IP samples of OS execution can’t be collected. This warning is not an error. The green box shows that 542 CPU IP samples were collected.

But, maybe you are asking about something else. What do you mean when you say no “CPU op” info was collected?

luigicrisci1997 · June 16, 2022, 3:12pm

In the Windows report, there is additional information about what happens on the threads.
This is on Linux:

Here on Windows:

You can see whether the thread was blocked due to a user request, etc.

That’s not such a big issue, but I was just trying to understand why such differences appear.

rknight · June 21, 2022, 2:16pm

The thread state information comes from the OS. Linux and Windows provide different information and nsys utilizes what is available. So, these differences are expected.

system · July 20, 2022, 4:05pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
