Hello,
I am using ncu with the Multi-Process Service (MPS). The ncu command-line options I use are:
ncu --metrics gpc__cycles_elapsed.max application
I am getting nan for my run, as follows:
Nsight Systems should be able to collect GPU metrics for the whole GPU regardless of MPS. “L2 hit rate” metric is available in the metric set called “Graphics Throughput Metrics for NVIDIA GA10x (frequency >= 10kHz)”.
To use that, please start with the following command:
I turned on MPS and started multiple processes in parallel, but problems occurred. Below is the bash script I use to run the program in parallel.
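#!/bin/bash
# Launch ps_num instances of the test program in parallel, each under its own nsys session.
file_name=./mps_no_hooking
sm_num=$1   # SM count (not used below)
ps_num=$2   # number of parallel processes
for i in $(seq 1 $ps_num); do
    # pass the loop index as the program's argument
    sudo nsys profile --gpu-metrics-device=0 --gpu-metrics-set=ga10x-gfxt "$file_name" "$i" &
done
wait   # wait for all background nsys processes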
Then the following error appears:
user@vandal:~/libsmctrl$ sudo nsys profile --output=report1.qdrep --trace=cuda,nvtx,osrt --cuda-memory-usage=true --capture-range=cudaProfilerApi --capture-range-end=stop ./exec_process.sh 2
SM NO: 1
PID: 25944
L2 cache size: -812471192
/home/user/libsmctrl/./mps_no_hooking:libsmctrl.c:212: Error subscribing to launch callback. CUDA returned error code 999.
Generating '/tmp/nsys-report-a76e.qdstrm'
[1/1] [========================100%] report6.nsys-rep
Generated:
/home/user/libsmctrl/report6.nsys-rep
Generated:
The L2 cache size output comes from my own code, but it shows a wrong (strange) value, and I don’t know why.
Please tell me if you have a solution about this issue. Thank you!
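A side note on the launcher script, offered as a hedged sketch. The script reads the process count from its second argument, so running it with a single argument (for example `./exec_process.sh 2`) sets sm_num to 2 and leaves ps_num empty; the unquoted `$(seq 1 $ps_num)` then becomes `seq 1`, which prints only 1, so a single instance starts. Also, a bare `wait` with no arguments returns 0 even when a child fails, which hides errors such as the code 999 above. The helper `run_parallel` below is hypothetical (it is not part of nsys or libsmctrl); it waits on each PID individually and reports how many instances failed.

```bash
#!/bin/bash
# Hypothetical helper: run_parallel N CMD [ARGS...]
# Starts N copies of CMD in the background, appending the instance index
# (1..N) as the last argument, then waits on each PID individually.
# Returns the number of failed instances (capped at 255); 0 means all succeeded.
run_parallel() {
    local n=$1
    shift
    local pids=() i pid failures=0
    for ((i = 1; i <= n; i++)); do
        "$@" "$i" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        # `wait PID` returns that child's exit status; a bare `wait` would return 0
        wait "$pid" || failures=$((failures + 1))
    done
    return $(( failures > 255 ? 255 : failures ))
}
```

Usage, mirroring the loop in the script: `run_parallel "$ps_num" sudo nsys profile --gpu-metrics-device=0 --gpu-metrics-set=ga10x-gfxt ./mps_no_hooking || echo "$? instance(s) failed"`. Checking that ps_num is non-empty before calling it (for example with `${2:?usage: $0 SM_NUM PS_NUM}`) avoids the single-instance surprise described above.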
In addition, I have one more question.
The server GPU is a GeForce RTX 3090, the Nsight Systems version is 2024.4.1, and the server runs Ubuntu 18.04.6 LTS.
I want to collect GPU metrics such as the L2 cache hit rate. However, there is a problem:
GPU device doesn’t show in the list of ‘GPUs’.
Your “GPU device doesn’t show in the list of ‘GPUs’” problem seems to be a setup or operation issue.
Can you please start a new topic in the “Nsight Systems” forum directly to get help?
Thanks !
Thank you! Fortunately, I solved this problem by creating a .conf file containing the line “options nvidia NVreg_RestrictProfilingToAdminUsers=0”.
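For anyone hitting the same permission problem, here is a hedged bash sketch of those steps. The option line is the one quoted above. Placing it in a file under /etc/modprobe.d/ (any name ending in .conf; nvidia-profiling.conf below is a hypothetical choice), rebuilding the initramfs on Ubuntu, and rebooting is the usual way such a driver option is applied. The readback helper assumes /proc/driver/nvidia/params lists the setting as `RmProfilingAdminOnly: <value>`, where 0 means counter access is not restricted to admin users. Both function names are hypothetical.

```bash
#!/bin/bash
# Sketch only: helper names are hypothetical.

# write_profiling_conf PATH
# Writes the modprobe option that lets non-admin users access GPU
# performance counters. Run as root when PATH is under /etc/modprobe.d/.
write_profiling_conf() {
    printf 'options nvidia NVreg_RestrictProfilingToAdminUsers=0\n' > "$1"
}

# profiling_admin_only [PARAMS_FILE]
# Prints the RmProfilingAdminOnly value from the driver parameter file
# (default: /proc/driver/nvidia/params), assuming "Key: value" lines.
profiling_admin_only() {
    awk -F': *' '$1 == "RmProfilingAdminOnly" { print $2 }' "${1:-/proc/driver/nvidia/params}"
}
```

In a root shell with these functions sourced, run `write_profiling_conf /etc/modprobe.d/nvidia-profiling.conf`, then `update-initramfs -u`, and reboot; afterwards `profiling_admin_only` is expected to print 0.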
But another problem has occurred. I want to use the libsmctrl library (libsmctrl_set_next_mask). However, Nsight Systems doesn’t work with this library. Do you know the reason?