Nsight Compute profile run reports nan values under Multi-Process Service (MPS)

Hello,
I am using ncu with the Multi-Process Service (MPS) enabled. The ncu command line is:
ncu --metrics gpc__cycles_elapsed.max application
The run reports nan for this metric.

The details of the GPU are as follows.
GPU card: NVIDIA A40
Driver version: 515.65.01
CUDA Version: 11.7

To start the MPS, I have used the following commands:
export CUDA_VISIBLE_DEVICES="0"
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
nvidia-cuda-mps-control -d

However, when I run the application without MPS, it gives the correct profiling results.

Any help would be appreciated.

MPS is not supported by Nsight Compute, see the “Profiling and Metrics” section under the known issues.
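
Since ncu does not work under MPS, one workaround is to shut the MPS daemon down before profiling. Below is a minimal sketch, assuming the daemon was started as in the original post. The helper names mps_running and mps_stop_cmds are made up for illustration, "running" is approximated by the control daemon process existing, and the script only prints the shutdown commands instead of executing them.

```shell
#!/bin/bash
# Hypothetical helpers, not part of any NVIDIA tool.

# Succeeds if a process whose command line contains nvidia-cuda-mps-control
# exists. -f is used because the name is longer than the 15 characters that
# pgrep matches by default. PGREP can be overridden for testing.
mps_running() {
  "${PGREP:-pgrep}" -f nvidia-cuda-mps-control >/dev/null 2>&1
}

# Prints the commands that stop the MPS daemon and reset the compute mode
# of the given GPU index (they typically need the same privileges that were
# used to start MPS).
mps_stop_cmds() {
  local gpu=${1:-0}
  echo 'echo quit | nvidia-cuda-mps-control'
  echo "nvidia-smi -i $gpu -c DEFAULT"
}

if mps_running; then
  echo "MPS daemon is running; stop it before using ncu:"
  mps_stop_cmds 0
else
  echo "MPS daemon not found; ncu can be run directly."
fi
```

Once the daemon is gone, the ncu command from the original post can be run unchanged.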

If I want to profile the L2 cache hit rate and memory throughput with MPS enabled, how can I collect this information?

Nsight Systems should be able to collect GPU metrics for the whole GPU regardless of MPS. The “L2 hit rate” metric is available in the metric set called “Graphics Throughput Metrics for NVIDIA GA10x (frequency >= 10kHz)”.

To use that, please start with the following command:

nsys profile --gpu-metrics-device=all --gpu-metrics-set=ga10x-gfxt ./myApp

I turned on MPS and started multiple processes in parallel, but ran into problems. Below is the bash script I use to run the program in parallel.

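#!/bin/bash
# Launch ps_num copies of the binary in parallel, each under its own nsys session.
file_name=./mps_no_hooking
sm_num=$1   # first argument (not used below)
ps_num=$2   # second argument: number of processes to start

for i in $(seq 1 $ps_num); do
  sudo nsys profile --gpu-metrics-device=0 --gpu-metrics-set=ga10x-gfxt $file_name ${i} &
done

# Block until every background job has finished.
wait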

Then the following error appears:

user@vandal:~/libsmctrl$ sudo nsys profile --output=report1.qdrep --trace=cuda,nvtx,osrt --cuda-memory-usage=true --capture-range=cudaProfilerApi --capture-range-end=stop ./exec_process.sh 2
SM NO: 1
PID: 25944
L2 cache size: -812471192
/home/user/libsmctrl/./mps_no_hooking:libsmctrl.c:212: Error subscribing to launch callback. CUDA returned error code 999.
Generating '/tmp/nsys-report-a76e.qdstrm'
[1/1] [========================100%] report6.nsys-rep
Generated:
    /home/user/libsmctrl/report6.nsys-rep
Generated:

The L2 cache size line is printed by my own code, but it shows a wrong (strange) value, and I don’t know why.
Please tell me if you have a solution for this issue. Thank you!
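
One thing that stands out in the script is that every background process gets its own nsys session with GPU metrics enabled, and in the log the whole script is then launched under yet another nsys profile. I have not confirmed that this causes the error, but a variant worth trying is to start the processes plainly and wrap the launcher in a single nsys session. A minimal sketch follows; launch_parallel is a hypothetical helper name, and the argument order (count first, then the command) is my own choice.

```shell
#!/bin/bash
# launch_parallel COUNT CMD [ARGS...]
# Starts COUNT copies of CMD in the background, passing the 1-based index
# as the last argument (as the original loop does), then waits for all of
# them. Returns non-zero if any copy failed.
launch_parallel() {
  local count=$1
  shift
  local pids=() failed=0 i pid
  for i in $(seq 1 "$count"); do
    "$@" "$i" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1
  done
  return "$failed"
}

# Stand-in command for illustration; replace echo with the real binary,
# for example: launch_parallel "$ps_num" ./mps_no_hooking
launch_parallel 2 echo "instance"
```

Saved as a launcher script (the name run_all.sh is an assumption), it would be profiled once with the command from the earlier reply: nsys profile --gpu-metrics-device=0 --gpu-metrics-set=ga10x-gfxt ./run_all.sh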

In addition, I have one more question.
The server GPU is a GeForce RTX 3090, the Nsight Systems version is 2024.4.1, and the server runs Ubuntu 18.04.6 LTS.
I want to collect GPU metrics such as the L2 cache hit rate. However, I have run into the following problem:
the GPU device doesn’t show up in the list of ‘GPUs’.

Please answer this question…
I have been trying to solve this problem for 3 days without success.

Hi, @logg72

Your “GPU device doesn’t show in the list of ‘GPUs’” problem looks like a setup or usage issue.
Could you please open a new topic in the “Nsight Systems” forum directly to get help?
Thanks!

Thank you! Fortunately, I solved this problem by creating a .conf file containing the line options nvidia NVreg_RestrictProfilingToAdminUsers=0.
But another problem has occurred. I want to use the libsmctrl library (libsmctrl_set_next_mask). However, Nsight Systems doesn’t work with this library. Do you know the reason?
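
For anyone hitting the same “GPU not listed” problem, the fix described above can be sketched roughly as follows. The helper names are hypothetical, and the check assumes the driver exposes an RmProfilingAdminOnly entry in /proc/driver/nvidia/params, which may differ between driver versions. The demo writes to a scratch file so it is safe to run as-is.

```shell
#!/bin/bash
# write_profiling_conf PATH
# Writes the module option mentioned above to PATH.
write_profiling_conf() {
  printf 'options nvidia NVreg_RestrictProfilingToAdminUsers=0\n' > "$1"
}

# profiling_restricted [PARAMS_FILE]
# Succeeds if the params file says profiling is limited to admin users.
# Assumption: the file contains a line like "RmProfilingAdminOnly: 1".
profiling_restricted() {
  grep -Eq '^RmProfilingAdminOnly:[[:space:]]*1' \
    "${1:-/proc/driver/nvidia/params}" 2>/dev/null
}

# Demo on a scratch file. On a real system the target would be a file under
# /etc/modprobe.d/ (the name nvidia-profiling.conf is my own choice), written
# as root and followed by a reboot or a reload of the nvidia kernel module.
demo=$(mktemp)
write_profiling_conf "$demo"
cat "$demo"
rm -f "$demo"
```

Some distributions may also need the initramfs to be regenerated before the option takes effect.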

Sorry, I am not familiar with Nsight Systems usage. Please ask in the Nsight Systems forum directly.