I am trying to deploy my project in Docker with nsys and ncu for profiling, and the TensorRT and CUDA Toolkit versions are the same as on my host machine: CUDA 11.6.55 and TensorRT 8.5.2.2.
But when I profile with the nsys command nsys profile --trace=cuda,osrt,nvtx --force-overwrite true --output=xx ./my_bin, nsys on the host captures 73 kernels while nsys in Docker captures only 67. The 6 missing kernels are all called splitKreduce_kernel. What could cause this problem?
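In case it is relevant, this is roughly how I compare the kernel counts between the two reports (just a sketch; the report name cuda_gpu_kern_sum may be spelled differently on older nsys versions):
# summarize the GPU kernels recorded in the report and look for the missing one
nsys stats --report cuda_gpu_kern_sum xx.nsys-rep | grep splitKreduce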
Here is a more detailed description of the environment:
GPU: RTX3070
Driver Version: 535.179
Docker: Docker version 26.1.3; CUDA Toolkit 11.6.55; TensorRT 8.5.2.2; cuDNN 8.4.0
Host: CUDA Toolkit 11.6.55; TensorRT 8.5.2.2; cuDNN 8.9.6
I run my Docker image with the command docker run --cap-add=SYS_ADMIN --name xx --runtime=nvidia --gpus all -it image_name /bin/bash.
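For reference, the toolkit the container actually sees can be sanity-checked with the usual commands (nothing project-specific here):
# driver and GPU visible inside the container
nvidia-smi
# CUDA toolkit version inside the container
nvcc --version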
Can you upgrade to the latest version of Nsight Systems and try these collections again? You are using a version that is about a year old. You can find the latest version at Nsight Systems - Get Started | NVIDIA Developer
If it continues to fail after upgrading to the latest version of the tool, can you run a collection in the docker? Before the collection, add a file called "config.ini" to your target-linux-x64 directory. In the config.ini file, add this line:
CudaSkipSomeApiCalls = false
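For example, something along these lines should create the file (the install path here is only an illustration; use wherever your nsys target-linux-x64 directory actually is):
echo 'CudaSkipSomeApiCalls = false' > /opt/nvidia/nsight-systems/<version>/target-linux-x64/config.ini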
Also, does your workload work as expected when run inside the docker? In other words, does it seem like the kernels missing in the trace are actually getting executed?
Finally, does the missing kernel originate in its own module?
Thanks for your reply. I have upgraded my Nsight Systems to this version:
NVIDIA Nsight Systems version 2024.5.1.113-245134619542v0 (from the command nsys --version)
It is installed in /opt/nvidia/nsight-systems/2024.5.1. I have also added CudaSkipSomeApiCalls = false to a file called config.ini in /opt/nvidia/nsight-systems/2024.5.1/target-linux-x64, but the issue still exists.
I also found that when I use trtexec inside Docker to generate the engine (used by both host and Docker), the issue vanishes and the inference results are identical: both Docker and host capture 57 kernels.
But when I use trtexec on the host to generate the engine, the issue happens again and the inference results are slightly different. That is weird, because the TensorRT and CUDA versions in Docker are the same as on the host.
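For reference, this is roughly the trtexec flow I am describing, with model.onnx and the engine file name as placeholders for my actual model:
# build the serialized engine (run either inside Docker or on the host)
trtexec --onnx=model.onnx --saveEngine=model.engine
# reuse the same engine file in both environments for inference
trtexec --loadEngine=model.engine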