I am working on an HPC project that requires me to profile a system with 48 physical CPU cores and 5 GPUs, running up to 40 CPU processes and 10 GPU processes. From what I have observed, the fewer processes there are, the more likely Nsight Systems is to profile the system successfully. At full scale, Nsight Systems usually crashes. I would really like to profile the system at full scale without a crash. Any advice?
Moved to the Nsight Systems forum
How long are you profiling for? Can you give me the CLI command you are using (or tell me the GUI options)?
nsys profile -f true --trace-fork-before-exec=true -o /home/zfan/sandbox/profile/smaq_96cpu_10gpu_50c_10h_1t ./LeafStandAlone.x86-64 -noForcedPatches /home/zfan/sandbox/JobDump/MCAT_SP_220M/JobInfo_26_220 /home/zfan/sandbox/JobDump/MCAT_SP_220M/JobInfo_26_220.xml 20
So you ran your application with the default sampling and trace options until you cancelled it. How long did it run before crashing? Did it leave a .qdstrm or .nsys-rep file?
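To answer the last question, you can check for leftover output next to the `-o` prefix. This is only a minimal sketch: the helper name `find_nsys_outputs` is hypothetical, and it assumes the files are named `<prefix>.qdstrm` and `<prefix>.nsys-rep`, matching the two extensions mentioned above.

```shell
# Hypothetical helper: print any report files found for a given -o prefix.
# Assumes the output files are named <prefix>.qdstrm and <prefix>.nsys-rep.
find_nsys_outputs() {
  prefix="$1"
  for ext in qdstrm nsys-rep; do
    candidate="${prefix}.${ext}"
    # Only report files that actually exist on disk.
    if [ -e "$candidate" ]; then
      echo "$candidate"
    fi
  done
}

# Example call, using the prefix from the command above:
# find_nsys_outputs /home/zfan/sandbox/profile/smaq_96cpu_10gpu_50c_10h_1t
```

If the function prints nothing, no file with either extension exists at that prefix.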