Remote profiling within a container docker/k3s

Hey, I’m trying to remotely profile a custom binary that runs in a container.

I have a k3s cluster running a pod with a container that has a binary. The binary runs under an L4T image. I have also added SSH access to this container, so the Dockerfile looks like this:

FROM nvcr.io/nvidia/l4t-base:r32.6.1
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -y && \
     DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
         apt-transport-https \
         ca-certificates \
         gnupg \
         software-properties-common \
         openssh-server \
         curl \
         vim \
         wget && \
     rm -rf /var/lib/apt/lists/*

## SSH starts
RUN mkdir -p /greeneye/config/ && \
    chown -R root:root /greeneye && \
    chmod -R 0700 /greeneye && \
    chown root:root -R /root

# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd

# CUDA environment is not passed by default to the SSH session. One has to export it in /etc/profile
ENV PATH /usr/local/cuda-10.1/bin:$PATH
# https://stackoverflow.com/a/64472380/554540
ENV LD_LIBRARY_PATH /usr/local/cuda-10.1/lib64:/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
RUN echo "export PATH=$PATH" >> /etc/profile && \
    echo "ldconfig" >> /etc/profile

ENV NOTVISIBLE "in nsight profile"
RUN echo "export VISIBLE=now" >> /etc/profile
RUN ssh-keygen -P "" -t dsa -f /etc/ssh/ssh_host_dsa_key

EXPOSE 9022

RUN wget -qO - https://developer.download.nvidia.com/devtools/repos/ubuntu2004/arm64/nvidia.pub | apt-key add - && \
     echo "deb https://developer.download.nvidia.com/devtools/repos/ubuntu2004/arm64/ /" >> /etc/apt/sources.list.d/nsight.list && \
     apt-get update -y && \
     DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
         nsight-compute-2021.3.1  nsight-systems-cli && \
     rm -rf /var/lib/apt/lists/*

ENV PATH="/opt/nvidia/nsight-compute/2021.3.1:${PATH}"
COPY entrypoint.sh entrypoint.sh
RUN chmod +x entrypoint.sh
RUN mkdir /var/run/sshd
RUN echo "root:docker"|chpasswd
COPY sshd_config /etc/ssh/sshd_config
COPY scripts /greeneye/scripts
RUN chmod +x /greeneye/scripts/*.sh

ENTRYPOINT /greeneye/scripts/wrapper.sh

wrapper.sh

#!/bin/bash

# Start the first process
/greeneye/scripts/run-ssh-service.sh &
status=$?
if [ $status -ne 0 ]; then
  echo "Failed to start run-ssh-service: $status"
  exit $status
fi

# Start the second process
/greeneye/scripts/run-detector-service.sh &
status=$?
if [ $status -ne 0 ]; then
  echo "Failed to start run-detector-service: $status"
  exit $status
fi

# Naive check runs checks once a minute to see if either of the processes exited.
# This illustrates part of the heavy lifting you need to do if you want to run
# more than one service in a container. The container exits with an error
# if it detects that either of the processes has exited.
# Otherwise it loops forever, waking up every 60 seconds

while sleep 60; do
  ps aux |grep run-ssh-service |grep -q -v grep
  PROCESS_1_STATUS=$?
  ps aux |grep run-detector-service |grep -q -v grep
  PROCESS_2_STATUS=$? 
  # If the greps above find anything, they exit with 0 status
  # If they are not both 0, then something is wrong
  if [ $PROCESS_1_STATUS -ne 0 ]; then
    echo "SSH service process exited"
    exit 1
  fi
  if [ $PROCESS_2_STATUS -ne 0 ]; then
    echo "detector service process exited"
    exit 1
  fi
done
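
(Note: a simpler sketch of the same supervision logic, assuming the image's bash is 4.3 or newer for wait -n, could be:)

#!/bin/bash
# Start both services in the background, then block until either one exits.
/greeneye/scripts/run-ssh-service.sh &
/greeneye/scripts/run-detector-service.sh &
# wait -n returns as soon as the first background job finishes
wait -n
echo "one of the services exited"
exit 1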

run-ssh-service.sh

#!/bin/bash
chmod 700 /greeneye/config
chmod 600 /greeneye/config/*
chmod 644 -f ~/.ssh/known_hosts
chown -R root:root /greeneye/config
/usr/sbin/sshd -D -e

run-detector-service.sh

#!/bin/bash
/bin/sh -c "./Detector"

pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-pod-playground
spec:
  containers:
  - name: detector
    image: myimage
    env:
    - name: LD_PRELOAD
      value: /opt/nvidia/nsight_systems/libToolsInjectionProxy64.so
    - name: CUDA_INJECTION64_PATH
      value: /opt/nvidia/nsight_systems/libToolsInjection64.so
    - name: QUADD_INJECTION_PROXY
      value: OSRT, $QUADD_INJECTION_PROXY
    imagePullPolicy: Always
    securityContext:
      privileged: true
      capabilities:
        add:
        - SYS_ADMIN
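
(Note: I assume the injection library must actually exist at that path inside the container for these env vars to take effect. A quick check, using the names from the pod spec above:)

# Verify the injection library is present inside the container; if the
# host Nsight Systems hasn't installed it yet, this will fail (see the
# ld.so error below).
kubectl exec ubuntu-pod-playground -c detector -- \
    ls -l /opt/nvidia/nsight_systems/libToolsInjectionProxy64.so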

Usage

From the host computer (Ubuntu, which I used to install JetPack, currently 4.5.1), I use Nsight Systems 2020.5.3.

The documentation has been quite difficult to follow: some parts state that one can only use Attach over remote SSH, while others state that it’s possible to launch remotely. I only tried to attach, but it keeps showing error messages such as:

Failed to connect to the application. Has it been run with Injection library?

CUDA profiling might have not been started correctly.

No CUDA events collected. Does the process use CUDA?

In some cases I see the following error as well, but not always:

Event requestor failed: Source ID=
Type=ErrorInformation (18)
 Properties:
  ErrorText (100)=Throw location unknown (consider using BOOST_THROW_EXCEPTION)
Dynamic exception type: boost::exception_detail::clone_impl
std::exception::what: ConvertEventError
[QuadDDaemon::tag_error_code*] = 55

Using LD_PRELOAD=/opt/nvidia/nsight_systems/libToolsInjectionProxy64.so at startup doesn’t make much sense, as it produces an error:

ERROR: ld.so: object ‘/opt/nvidia/nsight_systems/libToolsInjectionProxy64.so’ from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

This happens because /opt/nvidia/nsight_systems is only created on the target by the host Nsight Systems when it connects, so the library does not exist inside the container at startup.

I did check Include child processes, which seems to be required in my case.

Questions

  • Is it possible to use containers in this scenario?
  • What am I missing?
  • Do I need to run the binary with ncu instead, e.g. ncu ./Detector?

Screenshots

(screenshots not included)

@Andrey_Trachenko

Hey @hwilper, @Andrey_Trachenko

Did you succeed in finding information regarding profiling a CUDA program within a container?

Thanks!
Alon

L4T r32.6.1 corresponds to JetPack 4.6, which shipped with Nsight Systems 2021.2.3. My suggestion is to use that version instead of 2020.5.3.

The Dockerfile installs nsight-systems-cli, but those builds are intended for Arm SBSA servers, and do not support Tegra. Only Nsight Systems distributed through JetPack supports profiling on Tegra.

The Attach mode has been deprecated in recent versions of Nsight Systems. Unfortunately, we might not be able to assist you further if you experience problems with it.

My suggestions are:

  • Use Nsight Systems 2021.2.3 from JetPack 4.6 (and stay tuned for future JetPack releases, they often ship with a new version of Nsight Systems)
  • Use the Nsight Systems CLI - /opt/nvidia/nsight_systems/nsys - to profile your application. This is a more agile approach, especially in more complex scenarios, compared to profiling from the GUI. To get the binaries installed on the target (Jetson), connect to it once from the host GUI. Also, JetPack 4.6 ships with an Arm64 CLI .deb package that is installed directly on Jetson; in this case, the CLI can be used straight from /usr/local/bin/nsys.

If the problem reproduces with a more recent version, we can then discuss how to troubleshoot it further.

Profiling with Nsight Systems in containers is supported; it is perhaps less common on Tegra than on servers, but I don’t think it should cause any issues.

Using Nsight Systems together with Nsight Compute at the same time is not supported, so please don’t use ncu when profiling.

Hey @Andrey_Trachenko,
Thanks for the answer. It is not easy for us to update to 4.6 at the moment, mainly because of our custom carrier board, so this update will take us time (months).
Is it possible to profile a running CUDA process inside a container on 4.5.1? And on 4.5?

Thanks
Alon

Alon, profiling in containers with Nsight Systems is supported in JetPack 4.5.1. The recommended way is to use the CLI.

However, in case you run into any issues that are already solved in a more recent release of Nsight Systems, our recommendation would be to upgrade. We understand that a full JetPack upgrade might not be easy in your project, in which case we can discuss how to pick up just a newer version of Nsight Systems.

The Dockerfile installs nsight-systems-cli, but those builds are intended for Arm SBSA servers, and do not support Tegra. Only Nsight Systems distributed through JetPack supports profiling on Tegra.

I removed it.

The Attach mode has been deprecated in recent versions of Nsight Systems. Unfortunately, we might not be able to assist you further if you experience problems with it.

So in that case, should I run the binary in the container with nsys <binary>?

My suggestions are:

  • Use Nsight Systems 2021.2.3 from JetPack 4.6 (and stay tuned for future JetPack releases, they often ship with a new version of Nsight Systems)
  • Use the Nsight Systems CLI - /opt/nvidia/nsight_systems/nsys - to profile your application. This is a more agile approach, especially in more complex scenarios, compared to profiling from the GUI. To get the binaries installed on the target (Jetson), connect to it once from the host GUI. Also, JetPack 4.6 ships with an Arm64 CLI .deb package that is installed directly on Jetson; in this case, the CLI can be used straight from /usr/local/bin/nsys.

I have upgraded it. I am now trying to run the binary within the container with /opt/nvidia/nsight_systems/nsys profile <binary>, but I’m not sure how to profile it remotely. Is it possible? Did you mean I should use nsys profile and generate a local profiling report?

Yes, the basic CLI usage is as you specified:

/opt/nvidia/nsight_systems/nsys profile <binary>

(and then you can add more options, such as -o <filename> to specify where the report file needs to be created.)

Then you copy the report file onto your host system, and open it with Nsight Systems GUI (File->Open, or from command line: nsys-ui report1.nsys-rep).
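
For example, the full round trip from a k3s pod could look like this (pod and container names taken from the pod.yaml above; paths are just an example):

# Inside the container: profile and write the report to a known path
/opt/nvidia/nsight_systems/nsys profile -o /tmp/detector-report ./Detector

# From the host: copy the report out of the pod, then open it in the GUI
# (older nsys versions name the report .qdrep instead of .nsys-rep)
kubectl cp ubuntu-pod-playground:/tmp/detector-report.nsys-rep detector-report.nsys-rep -c detector
nsys-ui detector-report.nsys-rep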

If you need interactive remote profiling, when your application is long running, and you want to start profiling sometime in the middle of its runtime, this is going to be a bit more involved compared to the delayed start in the GUI. In this case you will need to use nsys launch command to start the app, and then nsys start and nsys stop to profile it. I would only recommend this if profiling the full application runtime doesn’t work for you (too much data collected).
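
A minimal sketch of that interactive flow (the exact options may differ between versions; check nsys --help for yours):

# Start the application under the profiler without collecting anything yet
nsys launch ./Detector

# Then, from another shell in the same container:
nsys start    # begin collection (options such as -o can be given here)
nsys stop     # end collection and write the report file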

Great.

Regarding docker:

Profiling with Nsight Systems in containers is supported; it is perhaps less common on Tegra than on servers, but I don’t think it should cause any issues.

How would you install /opt/nvidia/nsight_systems? Would you mount it? Install it again through the GUI?

I tried mounting but it complains:

# /opt/nvidia/nsight_systems/nsys -w true -t cuda,nvtx,osrt,cudnn,cublas  -s cpu /opt/outdoor/detector/Detector
Error: The CLI executable is in the 'target-linux-armv8' directory in your installation.
Modify the executable in your command to be a symbolic link pointing to 'target-linux-armv8/nsys'.

Then I ran the GUI and let it install the daemon, and it worked:

# /opt/nvidia/nsight_systems/nsys
 usage: nsys [--version] [--help] <command> [<args>] [application] [<application args>]

But it would be ideal to do this only once when a new version is out, not every time I need to run the profiler.

Ideas?

Starting with JetPack 4.5, the SDK Manager installation path can also install the nsight-systems-cli package on Jetson. This DEB package contains everything needed for CLI profiling. When launching a container, it should be sufficient to mount the installation directory:

# On Jetson:
sudo docker run -v /opt/nvidia/nsight-systems-cli/<version>:/nsys  ...

# Inside the container on Jetson:
/nsys/bin/nsys profile <options> <binary>
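
In a k3s pod, the equivalent of that bind mount would be a hostPath volume. A sketch (the version directory is an assumption; use whatever is installed on your Jetson):

# Expose the Jetson's nsight-systems-cli installation at /nsys in the container
spec:
  containers:
  - name: detector
    image: myimage
    volumeMounts:
    - name: nsys
      mountPath: /nsys
  volumes:
  - name: nsys
    hostPath:
      path: /opt/nvidia/nsight-systems-cli/2021.2.3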

If you don’t use the SDK Manager method, there is currently no good way to get this package installed. You could follow the manifest files in SDK Manager to extract a download URL; for the current JetPack 4.6, the location would be:

https://developer.nvidia.com/assets/embedded/secure/tools/files/jetpack-sdks/jetpack-4.6/JETPACK_46_b194/nsight-systems-cli-2021.2.3_2021.2.3.8-1_arm64.deb

Download this file in your browser, then upload to Jetson and run:

sudo apt install ./nsight-systems-cli-2021.2.3_2021.2.3.8-1_arm64.deb
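
(To confirm the installation: the package links the CLI into the default PATH, as noted above, so a quick check is:)

# Should print the installed version
nsys --version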

You would only need to repeat this step when switching to a new version of JetPack.