Using the NVIDIA DevTools Sidecar Injector with nsys kills the container with exit status 1

Hi all, I am using DevTools Sidecar Injector 1.0.7 to inject nsys into my vllm container. When I specify my process name in injectionMatch, the container exits with status 1 and without any logs. Did I misconfigure anything? Is the DevTools injector still up to date? I'm also curious how this injector works: how does it instrument my running process without any notable modification to the process config inside the OCI spec?

# Nsight Systems profiling configuration
profile:
  # The arguments for the Nsight Systems. The placeholders will be replaced with the actual values.
  devtoolArgs: "profile --start-later false -o /home/auto_{PROCESS_NAME}_%{POD_FULLNAME}_%{CONTAINER_NAME}_{TIMESTAMP}_{UID}.nsys-rep"
  # The regex to match applications to profile.
  injectionMatch: "^(?!.*nsys( |$)).*\\bvllm.*$"
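
To check which command lines this injectionMatch pattern selects, here is a minimal Python sketch. It assumes the injector applies the regex to the full command line and that its regex engine treats lookaheads and `\b` the way Python's `re` module does; neither is confirmed by the injector's documentation. The helper name `would_inject` and the sample commands are made up for illustration.

```python
import re

# Pattern from the config above, after YAML unescapes "\\b" to "\b".
# Assumption: the injector's regex engine behaves like Python's `re` here.
INJECTION_MATCH = re.compile(r"^(?!.*nsys( |$)).*\bvllm.*$")


def would_inject(command_line: str) -> bool:
    """Return True if the pattern matches the given command line."""
    return INJECTION_MATCH.search(command_line) is not None


# Hypothetical sample command lines, for illustration only.
samples = [
    "python -m vllm.entrypoints.api_server --host=0.0.0.0 --port=7080",
    "cat vllm",
    "nsys profile python -m vllm.entrypoints.api_server",
    "sleep infinity",
]

for cmd in samples:
    print(f"{would_inject(cmd)!s:5}  {cmd}")
```

Under these assumptions the pattern matches any command line that contains the word vllm and does not contain nsys as a standalone token, so a command such as "cat vllm" is selected for injection as well.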

I kept the same injectionMatch and started the container with an infinite sleep. This time I exec'd into the container and ran vllm manually. The behavior is the same: the process exits with status 1.

root@meta-deployment-8f6479bd8-sr29v:/tmp# python -m vllm.entrypoints.api_server --host=0.0.0.0 --port=7080 --swap-space=16 ......
root@meta-deployment-8f6479bd8-sr29v:/tmp# echo $?
1
# A file was created under /tmp
root@meta-deployment-8f6479bd8-sr29v:/tmp# cat devtool-injection-k8s_auto__c4893ea6
PROCESS_ID=1004
PROCESS_NAME=python
#EOF

Update: I thought it was a problem with vllm, so I tried a command that contains “vllm” but is not supposed to use the GPU, such as “cat vllm”.

From the strace log I can see that the exec works and that the injector library is loaded afterwards. After that, however, dynamic loading appears to go wrong. I'm not an expert, so I'm attaching the log for an NVIDIA expert to interpret: nsight_debug_output_cat_vllm.log (420.7 KB)

[pid 1705954] openat(AT_FDCWD, "/mnt/nv/bin/libPreRunProcessInjector.so", O_RDONLY|O_CLOEXEC) = 3
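
For anyone who wants to narrow down a large strace log like this one, here is a rough Python sketch that lists the syscalls that failed after the injector library was first opened. It relies only on strace's usual `= -1 ERRNO (description)` suffix for failed calls. The function name `failed_calls_after` is hypothetical, and using the library name as the starting marker is just my guess at a useful cut-off point.

```python
import re

# strace reports a failed syscall as "... = -1 ERRNO (description)".
FAILED_CALL = re.compile(r"=\s+-1\s+(E[A-Z0-9]+)")


def failed_calls_after(log_lines, marker="libPreRunProcessInjector.so"):
    """Return (errno_name, line) pairs for failed syscalls that appear
    after the first line mentioning `marker`."""
    seen_marker = False
    failures = []
    for line in log_lines:
        if not seen_marker:
            # Skip everything up to and including the first marker line.
            seen_marker = marker in line
            continue
        match = FAILED_CALL.search(line)
        if match:
            failures.append((match.group(1), line.rstrip("\n")))
    return failures
```

To use it, open the log file, pass the file object as `log_lines`, and print the returned pairs. Many ENOENT results during library search are normal, so the output is only a starting point for reading the log, not a diagnosis.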

@mpopov to respond

@CoderSherlock, could you please try the NVIDIA Nsight Operator, available on NVIDIA NGC (it is the replacement for the NVIDIA DevTools Sidecar Injector)? You would need to slightly modify your configuration as follows:

# Nsight Systems profiling configuration
profile:
  # Arguments for Nsight Systems. Placeholders will be replaced with actual values.
  devtoolArgs: "profile --start-later false -o /home/auto_{NVDT_PROCESS_NAME}_%{NVDT_POD_FULLNAME}_%{NVDT_CONTAINER_NAME}_{NVDT_TIMESTAMP}_{NVDT_UID}.nsys-rep"
  
  # Regex pattern to match applications for profiling.
  injectionIncludePatterns:
    - ".*vllm.*"
  
  # Log file path inside the profiled Pod.
  logOutput: /home/operator.log  # Use "stdout" if preferred.

If you still encounter the issue after this change, could you please share the contents of /home/operator.log with us to help diagnose the problem?