Nsys CUDA Trace Empty in AI Workbench Container on DGX Spark GB10

Problem

Running nsys profile --trace=cuda,nvtx python script.py inside an AI Workbench
container on DGX Spark GB10 produces a report with CPU/OS rows only. The CUDA lane
is absent. This is despite the workload actively using the GPU (confirmed via
nvidia-smi during the run).


Proposed Fix

The Holohub dev container passes --cap-add CAP_SYS_PTRACE explicitly in its
docker run command, and nsys works there. AI Workbench does not add this flag,
and nsys CUDA tracing fails. The fix is for Workbench to expose a capabilities
field in spec.yaml (or equivalent) so users can add SYS_PTRACE to their
project containers.

We verified this is the root cause via nsys status --environment (see Diagnostic
Output below). The fix is already proven in NVIDIA’s own tooling — it just hasn’t
been surfaced in Workbench.


Environment

Component             Version
Hardware              DGX Spark GB10
NVIDIA AI Workbench   latest (Desktop App, macOS)
Container base image  nvcr.io/nvidia/rapidsai/notebooks:26.04-cuda13-py3.12
CUDA                  13.1.1
nsys                  2024.2.3 (installed via apt.txt)
Python                3.12 (conda)

Symptom

No CUDA events collected. Does the process use CUDA?
No NVTX events collected. Does the process use NVTX?

Tracing nvtx alone yields only CCCL C++ library ranges from the CUDA driver
itself. Python-side NVTX annotations do not appear, regardless of whether the
Python nvtx package or cupy.cuda.nvtx.RangePush/RangePop is used.


Diagnostic Output

nsys status --environment inside the Workbench container:

CPU Profiling Environment Check
  Root privilege: disabled
  Linux Kernel Paranoid Level = 4
  Linux perf_event_open syscall available: Fail
  CPU Profiling Environment (process-tree): Fail

CUDA Profiling Environment Check
  CUDA driver: 570
  SYS_PTRACE: Fail           ← the blocker

The container is missing CAP_SYS_PTRACE. Without it, nsys cannot inject its
interception library and CUDA tracing is impossible regardless of nsys version.
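
For anyone who wants to confirm the missing capability without nsys, the effective capability mask of the current process can be read directly from /proc. This is a minimal sketch, assuming a Linux container and that CAP_SYS_PTRACE is capability bit 19 (its value in linux/capability.h). The helper names parse_cap_eff and has_sys_ptrace are ours for illustration and are not part of any NVIDIA tool.

```python
"""Check whether the current process holds CAP_SYS_PTRACE (Linux only)."""
from pathlib import Path

CAP_SYS_PTRACE_BIT = 19  # value of CAP_SYS_PTRACE in linux/capability.h


def parse_cap_eff(status_text: str) -> int:
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The kernel prints the mask as a hexadecimal string.
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_sys_ptrace(status_text: str) -> bool:
    """True if the CAP_SYS_PTRACE bit is set in the effective mask."""
    return bool((parse_cap_eff(status_text) >> CAP_SYS_PTRACE_BIT) & 1)


if __name__ == "__main__":
    status_path = Path("/proc/self/status")
    if status_path.exists():
        print("CAP_SYS_PTRACE held:", has_sys_ptrace(status_path.read_text()))
    else:
        print("/proc/self/status not found; this check is Linux-only")
```

Run inside the container, this should agree with the SYS_PTRACE line of nsys status --environment. If the two disagree, something other than the capability set is likely involved.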


Root Cause

AI Workbench does not pass --cap-add=SYS_PTRACE to its docker run command.
The Holohub dev container (an NVIDIA project) explicitly adds this flag and nsys
works there:

# from Holohub dev_container launch --verbose (March 2025, thread below):
docker run ... --cap-add CAP_SYS_PTRACE --ipc=host --ulimit=memlock=-1 ...

The Workbench spec.yaml has no capabilities field, and none of the official
Workbench example repos appear to expose one. The fix exists in NVIDIA’s own
ecosystem — it just hasn’t been surfaced in Workbench.

Reference: “Nsight in holohub not working”


Questions

  1. Does Workbench spec.yaml support adding Linux capabilities (e.g., SYS_PTRACE,
    SYS_ADMIN) or seccomp overrides? If so, what is the correct field?

  2. What is the supported workflow for nsys profiling in a Workbench container on
    DGX Spark GB10? The machine was purchased specifically for hardware profiling.

  3. Is the host paranoid level 4 an additional blocker once the container capability
    is fixed, or does CUDA tracing work regardless of the CPU profiling environment?


Code

We are profiling a Python scientific computing workload with NVTX annotations
around each algorithmic phase. Full source is available to NVIDIA personnel on
request.
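
Until then, here is a rough sketch of the annotation pattern, not our actual code. It assumes the Python nvtx package (pip install nvtx) and its annotate context manager; run_pipeline and the phase names are hypothetical stand-ins. The fallback keeps the script runnable where nvtx is not installed.

```python
"""Annotate algorithmic phases with NVTX ranges (no-op if nvtx is absent)."""
from contextlib import nullcontext

try:
    import nvtx  # NVIDIA's Python NVTX bindings

    def phase(name: str):
        """Open an NVTX range that nsys can show as a named span."""
        return nvtx.annotate(message=name, color="green")
except ImportError:
    def phase(name: str):
        """Fallback so the script still runs where nvtx is not installed."""
        return nullcontext()


def run_pipeline(values):
    """Hypothetical two-phase workload standing in for the real code."""
    with phase("normalize"):
        total = sum(values)
        normalized = [v / total for v in values]
    with phase("reduce"):
        result = max(normalized)
    return result


if __name__ == "__main__":
    print(run_pipeline([1.0, 2.0, 5.0]))
```

In a working setup, each with phase(...) block should appear as a named range in the NVTX row of the report. In the Workbench container described here, none of them show up.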


Related Thread

This follows up on “Nsys profile not showing any GPU data” (November 2025), which
reports the identical symptom on DGX Spark and was acknowledged by @aniculescu
in December 2025 with no resolution posted. We believe the root cause identified
above explains that report as well.

The project container is a bit prescriptive, but a multi-container setup may make this easier.
See docs here: Docker Compose Environments — NVIDIA AI Workbench User Guide

I will loop in the eng team as well.

Thank you. Since we will want to debug kernel fusion, I will also need all of NVIDIA’s tools to work. In particular, we are using cuTile/CuPy.

Launching the project container is a bit constrained, so you can’t easily do what you want here.

As a workaround, you can use the compose feature to launch the same project container (or any other container).

This crude compose example launches the project container via Compose rather than the normal way, which lets you enable the extra features you need. This example assumes:

  • project name is test-project
  • the project container is built so the image project-test-project exists
  • you want to run the jupyterlab instance in the default container
  • you want 1 GPU

You should be able to use this as a guide. If you have the Cursor app added to your project, it also should be able to help write compose files that work.

services:
  example-1:
    image: project-test-project
    command:
      - /bin/bash
      - -lc
      - >
        jupyter lab
        --allow-root
        --port 8888
        --ip 0.0.0.0
        --no-browser
        --NotebookApp.base_url=/projects/test-project/compose/example-1
        --NotebookApp.default_url=/lab
        --NotebookApp.allow_origin='*'
        --ServerApp.token=''
        --ServerApp.password=''
        --notebook-dir=/project
    cap_add:
      - SYS_PTRACE
    ipc: host
    working_dir: /project
    ports:
      - "8888:8888"
    environment:
      NVWB_TRIM_PREFIX: "false"
      NVWB_PROJECT_MOUNT_TARGET: /project
    volumes:
      - .:/project
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://localhost:8888/projects/test-project/compose/example-1/lab >/dev/null"]
      interval: 10s
      timeout: 5s
      retries: 6
      start_period: 30s
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]

Gentlefolk,

Sure, we’ll try your suggestions. Thank you. I’ll try to report back this afternoon.

With respect, allow me to suggest that if this is the AI Workbench team’s solution, then a playbook, at least, needs to be written with examples for every low-level debugging and profiling tool NVIDIA ships. I went to GTC and paid to sit in almost every one of the debugging and profiling tutorials. In none of those tutorials was I expected to do Container-fu. For even more fun, I have two Sparks and the 200 Gbps connector. Am I going to have to do something similar on each machine when I want to try a bigger job that spans machines? Or is AI Workbench the wrong tool for the job?

Again, Thank you for your reply. I’ll try it out after my next meeting adjourns.

Anon,
Andrew

Workbench is not built to run across two machines, so definitely don’t use it for that.

In terms of eliminating container-fu for this on a single instance, the suggested feature of a field in the spec.yaml file makes sense. We need to think about it and try it out a bit.

Additionally, in terms of setting up the CX-7 network on two sparks, the NVIDIA Sync app handles this. See here: Cluster Assistant for ConnectX-7 Multi-Node Clusters — NVIDIA Sync User Guide

It does NOT set up the application or workload, though.

Resolution: What worked for us — a short playbook

Thanks to @twhitehouse and @dkleissas for the compose suggestion — it worked. Here’s the minimal path for anyone hitting the same issue.


The root cause (confirmed)

SYS_PTRACE: Fail in nsys status --environment is the blocker. The fix is adding CAP_SYS_PTRACE to the container via the Workbench compose feature.
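
To make this check repeatable, we found it handy to pull the failing lines out of the status output with a script. The sketch below assumes the "name: value" layout shown in the diagnostic output earlier in this thread; other nsys versions may format the report differently. The function name failed_checks is ours.

```python
"""List failing checks in `nsys status --environment` output."""
import shutil
import subprocess


def failed_checks(report: str) -> list[str]:
    """Return the names of checks whose value starts with 'Fail'."""
    failures = []
    for line in report.splitlines():
        # Lines without a colon (headers, "Level = 4") are skipped.
        name, sep, value = line.strip().partition(":")
        if sep and value.strip().startswith("Fail"):
            failures.append(name.strip())
    return failures


if __name__ == "__main__":
    if shutil.which("nsys") is None:
        print("nsys not found on PATH")
    else:
        out = subprocess.run(
            ["nsys", "status", "--environment"],
            capture_output=True, text=True, check=False,
        ).stdout
        for name in failed_checks(out):
            print("FAIL:", name)
```

If SYS_PTRACE is still in the list after starting the compose service, the cap_add entry has probably not taken effect for that container.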


Step 1 — Create a new Workbench project with a lowercase name

This is the most important advice we can offer. Docker Compose requires project names to be all lowercase. Workbench derives the compose project name from the meta.name field in .project/spec.yaml, and if your project name has any uppercase letters (e.g. cuTile), the compose feature will refuse to start with a cryptic error:

invalid project name "cuTile": must consist only of lowercase alphanumeric characters

Easiest fix: create a new project with a lowercase name from the start. Trying to rename an existing project is painful. We renamed ours and had no further issues.
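
A quick way to check a candidate name before creating the project is sketched below. It assumes Compose's rule as we understand it (lowercase letters, digits, hyphens and underscores, starting with a letter or digit). Workbench may apply further restrictions of its own that we have not tested, and both function names are hypothetical.

```python
"""Validate a project name against Docker Compose's naming rule."""
import re

# Assumption: lowercase alphanumerics, hyphens, underscores; the first
# character must be a letter or digit.
_COMPOSE_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")


def is_valid_compose_name(name: str) -> bool:
    """True if the whole name matches the Compose project-name pattern."""
    return _COMPOSE_NAME.fullmatch(name) is not None


def suggest_compose_name(name: str) -> str:
    """Lowercase the name and replace disallowed characters with '-'."""
    cleaned = re.sub(r"[^a-z0-9_-]", "-", name.lower()).lstrip("_-")
    if not cleaned:
        raise ValueError(f"cannot derive a valid name from {name!r}")
    return cleaned


if __name__ == "__main__":
    for candidate in ("cuTile", "cutile"):
        print(candidate, is_valid_compose_name(candidate))
```

For our project, suggest_compose_name("cuTile") gives cutile, which is what we ended up using.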


Step 2 — Add nsys to apt.txt

The RAPIDS base image does not include nsys. Add to apt.txt (no version pin — NVIDIA’s repos stay current):

nsight-systems-cli
nsight-compute

Rebuild the project container.


Step 3 — Add a compose.yaml

Use the Workbench compose feature to create compose.yaml in the project root. The file must have a complete services: block — raw properties at the top level are rejected without a clear error message.

Working minimal file:

name: <your-lowercase-project-name>
services:
  profiler:
    image: project-<your-lowercase-project-name>:latest
    cap_add:
      - SYS_PTRACE
    ipc: host
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]
    volumes:
      - .:/project
    working_dir: /project
    command: sleep infinity

Start the compose service from the Workbench UI.


Step 4 — Verify

SSH to the DGX Spark and exec into the running container:

docker exec -it <project>-profiler-1 bash
nsys profile --trace=cuda python3 -c "import cupy as cp; x = cp.zeros((3,3)); print(x)"

Open the generated .nsys-rep file in the Nsight Systems GUI. You should see the CUDA HW lane with kernel activity.
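
For repeated runs, a small wrapper saves retyping the command. This is a sketch under assumptions: it uses only the nsys profile options --trace, --force-overwrite and -o, and the helper name build_nsys_command and the "_profile" output suffix are our own choices.

```python
"""Run a script under nsys and report whether a .nsys-rep file appeared."""
import shutil
import subprocess
import sys
from pathlib import Path


def build_nsys_command(script: str, output_stem: str) -> list[str]:
    """Build the nsys command line; nsys appends .nsys-rep to the stem."""
    return [
        "nsys", "profile",
        "--trace=cuda,nvtx",
        "--force-overwrite=true",  # replace a report left by an earlier run
        "-o", output_stem,
        sys.executable, script,
    ]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python profile_run.py <script.py>")
    elif shutil.which("nsys") is None:
        print("nsys not found on PATH")
    else:
        stem = Path(sys.argv[1]).stem + "_profile"
        done = subprocess.run(build_nsys_command(sys.argv[1], stem), check=False)
        print("nsys exit code:", done.returncode)
        print("report exists:", Path(stem + ".nsys-rep").exists())
```

A report file existing does not prove the CUDA lane is populated, so it is still worth opening the first one in the GUI.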


Notes

  • CPU profiling will still show warnings (Paranoid Level 4). This is a host kernel setting and does not affect CUDA tracing.
  • The nvwb CLI was not needed for the profiling workflow — SSH + docker exec is simpler.
  • The nvwb shared volume mount can be omitted from the compose service if you don’t need it; keeping it causes a startup failure if the volume doesn’t exist under the new project name.
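
If you want to see the paranoid level the container inherits from the host, it can be read from /proc. A minimal sketch follows; the threshold of 2 reflects our reading of the Nsight Systems requirement for CPU sampling and should be treated as an assumption, and the function names are ours.

```python
"""Report the kernel's perf_event_paranoid level (Linux only)."""
from pathlib import Path
from typing import Optional

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")
CPU_SAMPLING_MAX_LEVEL = 2  # assumption: nsys CPU sampling wants <= 2


def parse_paranoid_level(text: str) -> Optional[int]:
    """Parse the file contents; return None if they are not an integer."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_paranoid_level(path: Path = PARANOID_PATH) -> Optional[int]:
    """Read the level, or return None if the file cannot be read."""
    try:
        return parse_paranoid_level(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    level = read_paranoid_level()
    if level is None:
        print("perf_event_paranoid not readable")
    else:
        ok = level <= CPU_SAMPLING_MAX_LEVEL
        print(f"paranoid level {level}; CPU sampling likely {'ok' if ok else 'blocked'}")
```

On our Spark this reports level 4, which matches the CPU profiling warnings and, as noted above, did not stop CUDA tracing.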

Hope this helps the next person. Happy to answer questions.

Thank you, we are about to need to run jobs that span machines. If this works out well, we will likely buy a third Spark to have a full mesh for our research prototyping system. The machine, even though its memory system is ridiculously underpowered, is proving to be quite useful. Size of memory is a virtue in this game. It will help us as we deploy big jobs on the SuperPod.