Help decipher logs (No GPU associated to the given GPU ID)

Hello,
Could you please help me decipher the meaning of this log?

  1. Why does it say “No CUDA events collected. Does the process use CUDA?” ?
  2. Why does it say “No GPU associated to the given GPU ID”?
    Here is the log:
    Daemon -00:00.040
    Hardware event ‘instructions’, with sampling period 1000000, used to trigger sample collection.
    Information Daemon -00:00.040
    Dwarf backtraces collected.
    Information Analysis 00:00.000
    Profiling has started.
    Information Daemon 103547 00:00.000
    Process was launched by the profiler, see /tmp/nvidia/nsight_systems/streams/pid_103547_stdout.log and stderr.log for program output
    Information Daemon 103547 00:00.000
    Profiler attached to the process.
    Information Injection 103555 00:00.061
    Common injection library initialized successfully.
    Information Injection 103555 00:00.073
    OS runtime libraries injection initialized successfully.
    Information Injection 103558 00:00.121
    Common injection library initialized successfully.
    Information Injection 103558 00:00.132
    OS runtime libraries injection initialized successfully.
    Information Injection 103558 00:03.291
    Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call.
    Information Injection 103558 00:03.413
    CUDA injection initialized successfully.
    Information Analysis 00:10.849
    Profiling has stopped.
    Information Daemon 00:16.733
    Number of IP samples collected: 167,145.
    Warning Daemon 00:16.733
    The operating system throttled the collection of sampling data 6129 times.
    Error Analysis 00:17.219
    Event requestor failed: Source ID=
    Type=ErrorInformation (18)
    Properties:
    OriginalSource (145)=EventRequestor
    Error information:
    TargetSideError (1100)
    Properties:
    ErrorText (100)=/build/agent/work/20a3cfcd1c25021d/QuadD/Target/quadd_d/quadd_d/jni/TimeConverter.cpp(709): Throw in function {anonymous}::CpuTimeDomain {anonymous}::GpuTicksConverter::ConvertToCpuTime(QuadDCommon::CudaDeviceId, uint64_t&) const
    Dynamic exception type: boost::exception_detail::clone_impl
    std::exception::what: NotFoundException
    [QuadDCommon::tag_error_text*] = No GPU associated to the given GPU ID

ServiceName (200)=AnalysisService
MethodName (201)=GetData
NotFoundError (127)
Properties:
ErrorText (100)=No GPU associated to the given GPU ID
OriginalFile (140)=/build/agent/work/20a3cfcd1c25021d/QuadD/Target/quadd_d/quadd_d/jni/TimeConverter.cpp
OriginalLine (141)=709
OriginalFunction (142)={anonymous}::CpuTimeDomain {anonymous}::GpuTicksConverter::ConvertToCpuTime(QuadDCommon::CudaDeviceId, uint64_t&) const
OriginalExceptionClass (143)=N5boost16exception_detail10clone_implIN11QuadDCommon17NotFoundExceptionEEE
Warning Analysis 103558 00:26.038
CUDA profiling might have not been started correctly.
Warning Analysis 103558 00:26.038
No CUDA events collected. Does the process use CUDA?
Warning Analysis 103555 00:26.038
CUDA profiling might have not been started correctly.
Warning Analysis 103555 00:26.038
No CUDA events collected. Does the process use CUDA?
Warning Analysis 103558 00:26.038
Not all OS runtime libraries events might have been collected.
Information Analysis 103558 00:26.038
Number of OS runtime libraries events collected: 34,388.
Information Analysis 103555 00:26.038
Number of OS runtime libraries events collected: 1.
Error Analysis 00:26.038
Some events (3,053) were lost. Certain charts (including CPU utilization) on the timeline may display incorrect data. Try to decrease sampling rate and start a new profiling session.

The version is 2021.2.1
Also, here is a summary of the hardware and versions I am using:

Platform Linux
OS CentOS Linux 7 (Core)
Hardware platform x86_64
CPU description Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
GPU descriptions Tesla PG500-216;Tesla PG500-216;Tesla PG500-216;Tesla PG500-216
NVIDIA driver version 460.73.01
CPU context switch supported
GPU context switch supported
Guest VM id 0
Tunnel traffic through SSH yes
Timestamp counter supported

Maybe it is related to this? I can’t select “Collect GPU metrics”. Everything is grayed out

And here is my metric set:


Isn’t there anything missing?

Nsys GPU metric sampling has a minimum driver requirement of r460 and a minimum architecture version of Turing. It looks like you have the right driver but your system has Voltas.

Is that the problem?

Hey hwilper,
There are a few problems.

  1. What does the “No GPU associated to the given GPU ID” error mean?
  2. “Nsys GPU metric sampling has a minimum driver requirement of r460 and a minimum architecture version of Turing” so is there a way to figure out things like memory bandwidth utilization in situation where multiple kernels are running in parallel, on this hardware?
  3. Is my setup incorrect?

Some more information.
I am using a bash wrapper because I need to pass some input: sh -c run_test_profiler_jacekt.sh
I am also setting LD_LIBRARY_PATH.

When I do: nsys profile -b dwarf --stats=true -f true -y 60 -w false -t cuda -d 30 -o $OUTPUT sh -c run_test_profiler_jacekt.sh
everything works fine, but over an SSH target it does not.
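For context, the wrapper itself is roughly like this (a simplified sketch; the real script, its paths, and its input are specific to my test, so the names below are placeholders):

    #!/bin/sh
    # run_test_profiler_jacekt.sh (simplified sketch; paths and input file are placeholders)
    export LD_LIBRARY_PATH=/opt/myapp/lib:$LD_LIBRARY_PATH   # the LD_LIBRARY_PATH I mentioned above
    exec java -jar /opt/myapp/test.jar < /opt/myapp/input.txt # the java program whose threads launch the kernels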

The executable being run is java and kernels are executed from multiple threads, but not the main thread.
I had similar problems with ncu, but adding the option --target-processes all resolved the problem there. It did not help in Nsight Systems, though.
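For reference, the ncu invocation that worked for me looked roughly like this (the output name is just an example):

    # Nsight Compute: also profile the child processes spawned by the wrapper
    ncu --target-processes all -o ncu_report sh -c run_test_profiler_jacekt.sh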

Do you have include child processes checked in the GUI?

Now to your problems.

  1. GPU metrics are only sampled on one GPU. When sampling is attempted on your system, the tool goes to the given or default GPU, finds that it is not a Turing or Ampere, and then cannot hook into it to collect metrics. This is not a good error message.

  2. Nsight Systems can give you memory utilization on pre-Turing systems for CUDA programs, using the GPU memory trace:
    (screenshot from the docs)
    or you can use the --cuda-memory-usage option in the CLI (see the example after this list). Note that this functionality can have serious performance effects. But this is usage; if you want bandwidth, I think you are going to have to turn to Nsight Compute.

  3. Your setup looks good, except that you have Volta chips and you need Turings or Amperes for GPU Performance Metric Sampling to work.
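As a concrete example of the CLI route from point 2, reusing the command line you already have (only --cuda-memory-usage is new here):

    # trace CUDA and record GPU memory usage over time (this can noticeably slow the workload)
    nsys profile -t cuda --cuda-memory-usage=true -o $OUTPUT sh -c run_test_profiler_jacekt.sh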

Hope this answers your questions.

holly

Hi Holly,

Do you have include child processes checked in the GUI?

If only I was given a chance :)


BTW: it is extremely confusing that you are calling threads processes.

Nsight Systems can give you memory utilization in pre-Turing systems for CUDA programs, using GPU Memory trace

OK, I will give it a go. Thanks!

But this is usage, if you want bandwidth, I think you are going to have to turn to Nsight Compute.

I am not sure what the difference between usage and bandwidth is. Does Nsight Compute allow measuring the combined, actual bandwidth used by multiple kernels running concurrently?

BTW: I notice that my version only mentions -g for getting the best call stacks, and the stacks I am getting are actually not very good. I will try -fno-omit-frame-pointer and -funwind-tables.
What version was your screenshot from? It looks more reasonable than what I am using :)

Ah, it does not suggest -fno-omit-frame-pointer and -funwind-tables because I am using dwarf.
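So, for my own notes, the two backtrace modes as I understand them (just a sketch reusing my existing command; flag values as far as I can tell from the nsys help):

    # frame-pointer backtraces: as I understand it, these benefit from code built with -fno-omit-frame-pointer
    nsys profile -b fp -t cuda -o $OUTPUT sh -c run_test_profiler_jacekt.sh
    # dwarf backtraces: what I am using now, which is why those compile flags were not suggested
    nsys profile -b dwarf -t cuda -o $OUTPUT sh -c run_test_profiler_jacekt.sh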

OK, I understand what memory usage is. So I need bandwidth.

Hmm…you are using 2021.2.1 and those screenshots (which I snagged from the online docs) are also from 2021.2.1 (as it happens, I am the person that updated the docs)…so not sure why you would be seeing anything different.

You may want to use the roofline charts in Nsight Compute to better see memory bandwidth versus optimal; see the Kernel Profiling Guide :: Nsight Compute Documentation.
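If you want to try that, a minimal starting point could be something like this (I am reusing your wrapper here; the set and output names are only examples, so check ncu --help for the exact sections available in your version):

    # collect the full section set (which should include the Speed of Light / roofline data)
    ncu --set full --target-processes all -o roofline_report sh -c run_test_profiler_jacekt.sh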

not sure why you would be seeing anything different.

Well, is there anything I can test to figure this out? Maybe some stray library was loaded? When was this functionality added?

Sorry, I’ve lost track of exactly which thing you were looking at. GPU metrics trace was added with 2021.2 (March this year). The CUDA memory allocation graph went in with 2020.5 (Nov/December last year).

I was asking about the checkbox “Include child processes” that has magically disappeared from my version of 2021.2.1.

Would you be so kind as to actually connect over SSH to a machine and verify that you still see the checkbox that I do not?

Figured out my glitch on Friday, and forgot to drop a note here, sorry.

The screenshot shows what is seen with SSH → Tegra; you are correct. Unfortunately, collecting all processes should be the default, and that is not what you are seeing (but only through the GUI).

Looking into it more.

Hi @jacek.tomaka, could you help collect debug logs to aid the investigation?

Steps:

  1. Download nvlog.config (508 Bytes) and save it to the $HOME directory of the target system.
  2. On the host system, open Nsys GUI and run a collection as normal.
  3. Close Nsys GUI.
  4. On the target system, there should be a log file at /tmp/nsight-sys.log. Share this file with us and we’ll look into it further (a rough command-line equivalent of these steps is sketched below).
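In case it helps, here is a rough command-line equivalent of the steps above (hostnames and paths are placeholders):

    # on the target system: place the logging config where nsys picks it up
    cp nvlog.config $HOME/
    # ...run the collection from the host GUI as usual, then close the GUI...
    # afterwards, copy the resulting log off the target for sharing
    scp user@target-machine:/tmp/nsight-sys.log .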

Thanks,
Liuyi

Here you go:
nsight-sys.log (66.0 KB)

@jacek.tomaka - Thanks for sharing the debug log. However, the log seems to be missing some pieces of information that we expect. Could you try again? Make sure to:

  1. Kill any running nsys process on the target system before you save nvlog.config to $HOME (see the sketch after these steps).
  2. Open the GUI and connect to the target system only after you save nvlog.config to $HOME.
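For step 1, something along these lines on the target should be enough (a sketch only; adjust to your environment):

    # make sure no nsys processes are left running on the target
    pkill -f nsys
    ps -ef | grep -i nsys   # verify nothing is still running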

Liuyi,
This is extremely embarrassing, but after I did what you suggested, I stopped seeing this error.
Now I can see GPUs in the timeline.
Sorry about wasting so much of your time, but it looks like the problem has gone away.

This is weird because I have killed nsys in the past…
If I see it again I will capture the log and send it.
Regards.
Jacek Tomaka

No problem, just make sure you no longer see this error even without the debug log enabled, because sometimes the problem can be hidden with logging turned on.