Nsight Systems: runtime error reporting QuadDCommon::NotFoundException

I have been trying to run nsys on our local cluster and it failed to produce the output file. It reports QuadDCommon::NotFoundException, and I could not find any information about it. Would someone please help me with this problem? Thank you very much!

The command line I was using: srun --mem=100G -t 1-0 -p idle --gpus=tesla_t4 nsys profile -o cmdb.nsys.prof load-dbg chinese.trio.128k-10kbp.b.fa chinese.trio.128k-10kbp.fastq > output2.prof 2>&1

The output2.prof file says:

Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
Collecting data...
..... # our code running stuff
Processing events...
terminate called after throwing an instance of 'boost::wrapexcept<QuadDCommon::NotFoundException>'
  what():  NotFoundException

Interestingly, we could get results using nsight compute: srun --mem=100G -t 1-0 -p idle --gpus=tesla_t4 ncu -f -o cmdb.nsys.prof load-dbg chinese.trio.128k-10kbp.b.fa chinese.trio.128k-10kbp.fastq > output2.prof 2>&1

We have been using nvhpc 21.9 for this.

Thank you so much for taking time reading and helping us with this problem!

Hello,

Same here:
CentOS8 Stream
cuda-11.5 (happens also with 11.4, but not with 11.3)
Invocation: /usr/local/cuda-11.5/bin/nsys profile -o output <executable>

Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
Collecting data...

<executable finishes>

Processing events...
terminate called after throwing an instance of 'boost::wrapexcept<QuadDCommon::NotFoundException>'
  what():  NotFoundException
Aborted (core dumped)

Anything we can do to debug this?

I’m seeing exactly the same issue on CUDA 11.5. Could this be a bug on NVIDIA's side?

Could you post the whole command you use, along with the output?

Did you solve this problem?
I’m suffering the same issue.
Exception occurs only when optional application arguments are given.

I did, in the Nov 18 post.

Invocation: /usr/local/cuda-11.5/bin/nsys profile -o output <executable>

I also posted the entire output that’s not from the program I’m profiling. Unfortunately, it’s really not more than:

Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.
Collecting data...

<stuff from my executable>

Processing events...
terminate called after throwing an instance of 'boost::wrapexcept<QuadDCommon::NotFoundException>'
  what():  NotFoundException
Aborted (core dumped)

Sorry that I cannot provide more. Is the source code of nsys somehow available or is there a version with debug symbols, so I could try to catch the exception at its source?

Same error on CUDA 11.4:
terminate called after throwing an instance of 'boost::wrapexcept<QuadDCommon::NotFoundException>'
  what():  NotFoundException

Any updates? The same issue occurred for me:

Collecting data...
Processing events...
terminate called after throwing an instance of 'boost::wrapexcept<QuadDCommon::NotFoundException>'
  what():  NotFoundException

The problem disappeared on my side using version 22.3. I haven’t tried to use the newest 22.5 yet. Did you upgrade your nvhpc toolkit and your CUDA drivers? I believe mine is using cuda 11.6.

The error occurred when I put nsys profile in a shell script and ran the script. However, when I executed the nsys profile command manually, the error message did not show up. Very strange :(

Is it possible that you have multiple versions of nsys in the system? The script might be picking up a different version.
When you launch manually, are you providing the path where nsys is located, or are you launching from the directory where you installed nsys?

You’re right, I’ve installed both the 2021.3 and 2022.3 versions of nsys, but the script runs export PATH=/usr/local/bin:$PATH before running nsys. I’ll try providing the absolute path in the script to see if the situation changes.
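
For anyone checking the same thing: below is a minimal sketch of how one might see which nsys a script actually resolves when several versions are installed. The helper name list_on_path is made up for illustration and is not part of nsys. It prints every executable with the given name on PATH in lookup order, so the first line is the one a bare nsys call would run.

```shell
#!/bin/sh
# list_on_path NAME (hypothetical helper): print every executable called
# NAME that is reachable through PATH, in lookup order.
# The body runs in a subshell so the IFS and set -f changes stay local.
list_on_path() (
    name=$1
    IFS=:
    set -f   # no glob expansion while splitting PATH
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)

# The first line printed is the nsys that a bare `nsys` would run.
list_on_path nsys

# Show the version of whichever nsys wins the lookup, if one is installed.
if command -v nsys >/dev/null 2>&1; then
    nsys --version
fi
```

Putting this at the top of the failing script, right after the export PATH line, and running it once from the interactive shell should show whether the two environments resolve to different installs.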

I got this error because I had not set --gpu-metrics-device properly.

This issue can be solved by using the --gpu-metrics-device=all option. For example:
nsys profile --gpu-metrics-device=all --stats true -o test ./test

My cuda version is 11.4.152, nsys version is 2021.3.2.4, driver version is 510.54, GPU is TitanV.
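
A small sketch combining the two suggestions in this thread (an explicit nsys path plus the --gpu-metrics-device=all option) might look like the following. The function name profile_with_gpu_metrics and the NSYS variable are assumptions made for illustration. Whether the option avoids the exception on a given setup is not guaranteed; it is only what worked for the configuration reported above. As far as I know, nsys profile --gpu-metrics-device=help lists the GPUs that support metrics collection on versions that have this option.

```shell
#!/bin/sh
# Hypothetical wrapper: run `nsys profile` with GPU metrics enabled on all
# devices, using the same options as the example command in this thread.
# NSYS may be set to an absolute path so that a second nsys install on
# PATH is not picked up by accident; it defaults to a plain PATH lookup.
profile_with_gpu_metrics() {
    out_name=$1   # report name passed to -o
    shift         # the rest is the program to profile and its arguments
    "${NSYS:-nsys}" profile --gpu-metrics-device=all --stats true \
        -o "$out_name" "$@"
}

# Example usage (assumes this nsys path exists on your machine):
#   NSYS=/usr/local/cuda-11.5/bin/nsys
#   profile_with_gpu_metrics test ./test
```

Passing the profiled command through "$@" keeps any application arguments intact, which matters here because one poster saw the exception only when arguments were given.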