Help decipher logs (No GPU associated to the given GPU ID)

Hello,
Could you please help me decipher the meaning of this log?

  1. Why does it say “No CUDA events collected. Does the process use CUDA?” ?
  2. Why does it say “No GPU associated to the given GPU ID”?
    Here is the log:
    Daemon -00:00.040
    Hardware event ‘instructions’, with sampling period 1000000, used to trigger sample collection.
    Information Daemon -00:00.040
    Dwarf backtraces collected.
    Information Analysis 00:00.000
    Profiling has started.
    Information Daemon 103547 00:00.000
    Process was launched by the profiler, see /tmp/nvidia/nsight_systems/streams/pid_103547_stdout.log and stderr.log for program output
    Information Daemon 103547 00:00.000
    Profiler attached to the process.
    Information Injection 103555 00:00.061
    Common injection library initialized successfully.
    Information Injection 103555 00:00.073
    OS runtime libraries injection initialized successfully.
    Information Injection 103558 00:00.121
    Common injection library initialized successfully.
    Information Injection 103558 00:00.132
    OS runtime libraries injection initialized successfully.
    Information Injection 103558 00:03.291
    Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call.
    Information Injection 103558 00:03.413
    CUDA injection initialized successfully.
    Information Analysis 00:10.849
    Profiling has stopped.
    Information Daemon 00:16.733
    Number of IP samples collected: 167,145.
    Warning Daemon 00:16.733
    The operating system throttled the collection of sampling data 6129 times.
    Error Analysis 00:17.219
    Event requestor failed: Source ID=
    Type=ErrorInformation (18)
    Properties:
    OriginalSource (145)=EventRequestor
    Error information:
    TargetSideError (1100)
    Properties:
    ErrorText (100)=/build/agent/work/20a3cfcd1c25021d/QuadD/Target/quadd_d/quadd_d/jni/TimeConverter.cpp(709): Throw in function {anonymous}::CpuTimeDomain {anonymous}::GpuTicksConverter::ConvertToCpuTime(QuadDCommon::CudaDeviceId, uint64_t&) const
    Dynamic exception type: boost::exception_detail::clone_impl
    std::exception::what: NotFoundException
    [QuadDCommon::tag_error_text*] = No GPU associated to the given GPU ID

ServiceName (200)=AnalysisService
MethodName (201)=GetData
NotFoundError (127)
Properties:
ErrorText (100)=No GPU associated to the given GPU ID
OriginalFile (140)=/build/agent/work/20a3cfcd1c25021d/QuadD/Target/quadd_d/quadd_d/jni/TimeConverter.cpp
OriginalLine (141)=709
OriginalFunction (142)={anonymous}::CpuTimeDomain {anonymous}::GpuTicksConverter::ConvertToCpuTime(QuadDCommon::CudaDeviceId, uint64_t&) const
OriginalExceptionClass (143)=N5boost16exception_detail10clone_implIN11QuadDCommon17NotFoundExceptionEEE
Warning Analysis 103558 00:26.038
CUDA profiling might have not been started correctly.
Warning Analysis 103558 00:26.038
No CUDA events collected. Does the process use CUDA?
Warning Analysis 103555 00:26.038
CUDA profiling might have not been started correctly.
Warning Analysis 103555 00:26.038
No CUDA events collected. Does the process use CUDA?
Warning Analysis 103558 00:26.038
Not all OS runtime libraries events might have been collected.
Information Analysis 103558 00:26.038
Number of OS runtime libraries events collected: 34,388.
Information Analysis 103555 00:26.038
Number of OS runtime libraries events collected: 1.
Error Analysis 00:26.038
Some events (3,053) were lost. Certain charts (including CPU utilization) on the timeline may display incorrect data. Try to decrease sampling rate and start a new profiling session.

The version is 2021.2.1
Also, here is a summary of the hardware and versions I am using:

Platform Linux
OS CentOS Linux 7 (Core)
Hardware platform x86_64
CPU description Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
GPU descriptions Tesla PG500-216;Tesla PG500-216;Tesla PG500-216;Tesla PG500-216
NVIDIA driver version 460.73.01
CPU context switch supported
GPU context switch supported
Guest VM id 0
Tunnel traffic through SSH yes
Timestamp counter supported

Maybe it is related to this? I can’t select “Collect GPU metrics”. Everything is grayed out

And here is my metric set:


Isn’t there anything missing?

Nsys GPU metric sampling has a minimum driver requirement of r460 and a minimum architecture version of Turing. It looks like you have the right driver but your system has Voltas.

Is that the problem?

Hey hwilper,
There are a few problems.

  1. What does the “No GPU associated to the given GPU ID” error mean?
  2. “Nsys GPU metric sampling has a minimum driver requirement of r460 and a minimum architecture version of Turing” so is there a way to figure out things like memory bandwidth utilization in situation where multiple kernels are running in parallel, on this hardware?
  3. Is my setup incorrect?

Some more information.
I am using a bash wrapper because I need to pass some input: sh -c run_test_profiler_jacekt.sh
I am also setting LD_LIBRARY_PATH.

When I do: nsys profile -b dwarf --stats=true -f true -y 60 -w false -t cuda -d 30 -o $OUTPUT sh -c run_test_profiler_jacekt.sh
everything works fine, but over an SSH target it does not.
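For context, the wrapper itself is roughly like this (a simplified sketch; the real script, its paths, and its input are specific to my test, so the names below are placeholders):

    #!/bin/sh
    # run_test_profiler_jacekt.sh (simplified sketch; paths and input file are placeholders)
    export LD_LIBRARY_PATH=/opt/myapp/lib:$LD_LIBRARY_PATH   # the LD_LIBRARY_PATH I mentioned above
    exec java -jar /opt/myapp/test.jar < /opt/myapp/input.txt # the java program whose threads launch the kernels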

The executable being run is java and kernels are executed from multiple threads, but not the main thread.
I had similar problems with ncu, but adding the option --target-processes all resolved the problem there. It did not help in Nsight Systems, though.
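For reference, the ncu invocation that worked for me looked roughly like this (the output name is just an example):

    # Nsight Compute: also profile the child processes spawned by the wrapper
    ncu --target-processes all -o ncu_report sh -c run_test_profiler_jacekt.sh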

Do you have include child processes checked in the GUI?

Now to your problems.

  1. GPU metrics are only sampled on one GPU. When sampling is attempted on your system, the tool goes to the given or default GPU, finds that it is not a Turing or Ampere, and then cannot hook into it to collect metrics. This is not a good error message.

  2. Nsight Systems can give you memory utilization on pre-Turing systems for CUDA programs, using the GPU memory trace:
    (screenshot from the docs)
    or you can use the --cuda-memory-usage option in the CLI (see the example after this list). Note that this functionality can have serious performance effects. But this is usage; if you want bandwidth, I think you are going to have to turn to Nsight Compute.

  3. Your setup looks good, except that you have Volta chips and you need Turings or Amperes for GPU Performance Metric Sampling to work.
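As a concrete example of the CLI route from point 2, reusing the command line you already have (only --cuda-memory-usage is new here):

    # trace CUDA and record GPU memory usage over time (this can noticeably slow the workload)
    nsys profile -t cuda --cuda-memory-usage=true -o $OUTPUT sh -c run_test_profiler_jacekt.sh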

Hope this answers your questions.

holly

Hi Holly,

Do you have include child processes checked in the GUI?

If only I was given a chance :)


BTW: it is extremely confusing that you are calling threads processes.

Nsight Systems can give you memory utilization in pre-Turing systems for CUDA programs, using GPU Memory trace

OK, I will give it a go. Thanks!

But this is usage, if you want bandwidth, I think you are going to have to turn to Nsight Compute.

I am not sure what the difference between usage and bandwidth is. Does Nsight Compute allow measuring the combined, actual bandwidth used by multiple kernels running concurrently?

BTW: I notice that my version only mentions -g for getting the best call stacks, and the stacks I am getting are actually not very good. I will try -fno-omit-frame-pointer and -funwind-tables.
What version was your screenshot from? It looks more reasonable than what I am using :)

Ah, it does not suggest -fno-omit-frame-pointer and -funwind-tables because I am using dwarf.
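So, for my own notes, the two backtrace modes as I understand them (just a sketch reusing my existing command; flag values as far as I can tell from the nsys help):

    # frame-pointer backtraces: as I understand it, these benefit from code built with -fno-omit-frame-pointer
    nsys profile -b fp -t cuda -o $OUTPUT sh -c run_test_profiler_jacekt.sh
    # dwarf backtraces: what I am using now, which is why those compile flags were not suggested
    nsys profile -b dwarf -t cuda -o $OUTPUT sh -c run_test_profiler_jacekt.sh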

OK, I understand what memory usage is. So I need bandwidth.

Hmm…you are using 2021.2.1 and those screenshots (which I snagged from the online docs) are also from 2021.2.1 (as it happens, I am the person that updated the docs)…so not sure why you would be seeing anything different.

You may want to use the roofline charts in Nsight Compute to better see memory bandwidth versus optimal; see the Kernel Profiling Guide :: Nsight Compute Documentation.
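If you want to try that, a minimal starting point could be something like this (I am reusing your wrapper here; the set and output names are only examples, so check ncu --help for the exact sections available in your version):

    # collect the full section set (which should include the Speed of Light / roofline data)
    ncu --set full --target-processes all -o roofline_report sh -c run_test_profiler_jacekt.sh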

not sure why you would be seeing anything different.

Well, is there anything I can test to figure this out? Maybe some stray library was loaded? When was this functionality added?

Sorry, I’ve lost track of exactly which thing you were looking at. GPU metrics trace was added with 2021.2 (March this year). The CUDA memory allocation graph went in with 2020.5 (Nov/December last year).

I was asking about the checkbox “Include child processes” that has magically disappeared from my version of 2021.2.1.

Would you be so kind as to actually connect over SSH to a machine and verify that you still see the checkbox that I do not?

Figured out my glitch on Friday, and forgot to drop a note here, sorry.

The screenshot shows what is seen with SSH → Tegra; you are correct. Unfortunately, collecting all processes should be the default, and that is not what you are seeing (but only through the GUI).

Looking into it more.

Hi @jacek.tomaka, could you help collect debug logs to aid the investigation?

Steps:

  1. Download nvlog.config (508 Bytes) and save it to the $HOME directory of the target system.
  2. On the host system, open Nsys GUI and run a collection as normal.
  3. Close Nsys GUI.
  4. On the target system, there should be a log file at /tmp/nsight-sys.log. Share this file with us and we’ll look into it further (a rough command-line equivalent of these steps is sketched below).
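In case it helps, here is a rough command-line equivalent of the steps above (hostnames and paths are placeholders):

    # on the target system: place the logging config where nsys picks it up
    cp nvlog.config $HOME/
    # ...run the collection from the host GUI as usual, then close the GUI...
    # afterwards, copy the resulting log off the target for sharing
    scp user@target-machine:/tmp/nsight-sys.log .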

Thanks,
Liuyi

Here you go:
nsight-sys.log (66.0 KB)

@jacek.tomaka - Thanks for sharing the debug log. However, the log seems to be missing some pieces of information that we expect. Could you try again? Make sure to:

  1. Kill any running nsys process on the target system before you save nvlog.config to $HOME (see the sketch after these steps).
  2. Open the GUI and connect to the target system only after you save nvlog.config to $HOME.
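For step 1, something along these lines on the target should be enough (a sketch only; adjust to your environment):

    # make sure no nsys processes are left running on the target
    pkill -f nsys
    ps -ef | grep -i nsys   # verify nothing is still running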

Liuyi,
This is extremely embarrassing, but after I did what you suggested, I stopped seeing this error.
Now I can see GPUs in the timeline.
Sorry about wasting so much of your time, but it looks like the problem has gone away.

This is weird because I have killed nsys in the past…
If I see it again I will capture the log and send it.
Regards.
Jacek Tomaka

No problem, just make sure you no longer see this error even without the debug log enabled, because sometimes the problem can be hidden with logging turned on.