I have been trying to run nsys on our local cluster, but it fails to produce the output file. It reports QuadDCommon::NotFoundException, and I could not find any information about this exception. Could someone please help me with this problem? Thank you very much!
The command line I was using:
srun --mem=100G -t 1-0 -p idle --gpus=tesla_t4 nsys profile -o cmdb.nsys.prof load-dbg chinese.trio.128k-10kbp.b.fa chinese.trio.128k-10kbp.fastq > output2.prof 2>&1
The output2.prof file says:
Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
Collecting data...
..... # our code running stuff
Processing events...
terminate called after throwing an instance of 'boost::wrapexcept<QuadDCommon::NotFoundException>'
what(): NotFoundException
Interestingly, we could get results using Nsight Compute:
srun --mem=100G -t 1-0 -p idle --gpus=tesla_t4 ncu -f -o cmdb.nsys.prof load-dbg chinese.trio.128k-10kbp.b.fa chinese.trio.128k-10kbp.fastq > output2.prof 2>&1
We have been using nvhpc 21.9 for this.
Thank you so much for taking the time to read this and help us with this problem!
Same here:
CentOS8 Stream
cuda-11.5 (happens also with 11.4, but not with 11.3)
Invocation: /usr/local/cuda-11.5/bin/nsys profile -o output <executable>
Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
Collecting data...
<executable finishes>
Processing events...
terminate called after throwing an instance of 'boost::wrapexcept<QuadDCommon::NotFoundException>'
what(): NotFoundException
Aborted (core dumped)
Here is the entire output that does not come from the program I'm profiling. Unfortunately, there is not much more than this:
Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
Collecting data...
<stuff from my executable>
Processing events...
terminate called after throwing an instance of 'boost::wrapexcept<QuadDCommon::NotFoundException>'
what(): NotFoundException
Aborted (core dumped)
Sorry I cannot provide more. Is the nsys source code available somewhere, or is there a build with debug symbols, so that I could try to catch the exception at its source?
The problem disappeared on my side with version 22.3. I haven't tried the newest version, 22.5, yet. Did you upgrade your nvhpc toolkit and your CUDA drivers? I believe mine uses CUDA 11.6.
The error occurred when I put the nsys profile command in a shell script and ran the script. However, when I ran the nsys profile command manually, the error did not appear. Very strange :(
Is it possible that you have multiple versions of nsys in the system? The script might be picking up a different version.
When you launch it manually, are you providing the full path to nsys, or are you launching it from the directory where nsys is installed?
You're right: I have both nsys 2021.3 and 2022.3 installed, and I export PATH=/usr/local/bin:$PATH in the script before running nsys. I'll try providing the absolute path in the script to see if that changes anything.
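One way to check whether the script and the interactive shell resolve nsys differently is to list every nsys reachable through PATH, in search order, along with the version each one reports. Below is a minimal bash sketch; the function name list_nsys is hypothetical, and it assumes each nsys binary accepts --version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every executable named nsys found on PATH,
# in PATH search order, together with the output of its --version flag.
# The first line printed is normally the one a bare `nsys` command runs.
list_nsys() {
  local dir found=1          # 1 = nothing found yet (shell "false")
  local IFS=':'              # split PATH on colons only
  for dir in $PATH; do
    if [ -x "$dir/nsys" ] && [ ! -d "$dir/nsys" ]; then
      printf '%s: %s\n' "$dir/nsys" "$("$dir/nsys" --version 2>&1)"
      found=0
    fi
  done
  [ "$found" -eq 0 ] || echo "no nsys found on PATH" >&2
  return "$found"
}
```

Calling list_nsys right after the export PATH=... line in the job script, and again in an interactive shell, should show whether the two environments pick up different nsys installations. In bash, `type -a nsys` gives a similar list without the version strings.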