Hello,
Could you please help me decipher the meaning of this log?
Why does it say “No CUDA events collected. Does the process use CUDA?”?
Why does it say “No GPU associated to the given GPU ID”?
Here is the log:
Daemon -00:00.040
Hardware event ‘instructions’, with sampling period 1000000, used to trigger sample collection.
Information Daemon -00:00.040
Dwarf backtraces collected.
Information Analysis 00:00.000
Profiling has started.
Information Daemon 103547 00:00.000
Process was launched by the profiler, see /tmp/nvidia/nsight_systems/streams/pid_103547_stdout.log and stderr.log for program output
Information Daemon 103547 00:00.000
Profiler attached to the process.
Information Injection 103555 00:00.061
Common injection library initialized successfully.
Information Injection 103555 00:00.073
OS runtime libraries injection initialized successfully.
Information Injection 103558 00:00.121
Common injection library initialized successfully.
Information Injection 103558 00:00.132
OS runtime libraries injection initialized successfully.
Information Injection 103558 00:03.291
Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call.
Information Injection 103558 00:03.413
CUDA injection initialized successfully.
Information Analysis 00:10.849
Profiling has stopped.
Information Daemon 00:16.733
Number of IP samples collected: 167,145.
Warning Daemon 00:16.733
The operating system throttled the collection of sampling data 6129 times.
Error Analysis 00:17.219
Event requestor failed: Source ID=
Type=ErrorInformation (18)
Properties:
OriginalSource (145)=EventRequestor
Error information:
TargetSideError (1100)
Properties:
ErrorText (100)=/build/agent/work/20a3cfcd1c25021d/QuadD/Target/quadd_d/quadd_d/jni/TimeConverter.cpp(709): Throw in function {anonymous}::CpuTimeDomain {anonymous}::GpuTicksConverter::ConvertToCpuTime(QuadDCommon::CudaDeviceId, uint64_t&) const
Dynamic exception type: boost::exception_detail::clone_impl
std::exception::what: NotFoundException
[QuadDCommon::tag_error_text*] = No GPU associated to the given GPU ID
ServiceName (200)=AnalysisService
MethodName (201)=GetData
NotFoundError (127)
Properties:
ErrorText (100)=No GPU associated to the given GPU ID
OriginalFile (140)=/build/agent/work/20a3cfcd1c25021d/QuadD/Target/quadd_d/quadd_d/jni/TimeConverter.cpp
OriginalLine (141)=709
OriginalFunction (142)={anonymous}::CpuTimeDomain {anonymous}::GpuTicksConverter::ConvertToCpuTime(QuadDCommon::CudaDeviceId, uint64_t&) const
OriginalExceptionClass (143)=N5boost16exception_detail10clone_implIN11QuadDCommon17NotFoundExceptionEEE
Warning Analysis 103558 00:26.038
CUDA profiling might have not been started correctly.
Warning Analysis 103558 00:26.038
No CUDA events collected. Does the process use CUDA?
Warning Analysis 103555 00:26.038
CUDA profiling might have not been started correctly.
Warning Analysis 103555 00:26.038
No CUDA events collected. Does the process use CUDA?
Warning Analysis 103558 00:26.038
Not all OS runtime libraries events might have been collected.
Information Analysis 103558 00:26.038
Number of OS runtime libraries events collected: 34,388.
Information Analysis 103555 00:26.038
Number of OS runtime libraries events collected: 1.
Error Analysis 00:26.038
Some events (3,053) were lost. Certain charts (including CPU utilization) on the timeline may display incorrect data. Try to decrease sampling rate and start a new profiling session.
The Nsight Systems version is 2021.2.1.
Also, here is a summary of the hardware and versions I am using:
Platform: Linux
OS: CentOS Linux 7 (Core)
Hardware platform: x86_64
CPU description: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz
GPU descriptions: Tesla PG500-216; Tesla PG500-216; Tesla PG500-216; Tesla PG500-216
Nsys GPU metric sampling has a minimum driver requirement of r460 and a minimum architecture version of Turing. It looks like you have the right driver but your system has Voltas.
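If you want to double-check what the target reports, a quick query like this (standard nvidia-smi fields) prints the GPU names and driver version:
# list GPU names and the installed driver version
nvidia-smi --query-gpu=name,driver_version --format=csv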
1. What does the “No GPU associated to the given GPU ID” error mean?
2. Given that “Nsys GPU metric sampling has a minimum driver requirement of r460 and a minimum architecture version of Turing”, is there a way to figure out things like memory bandwidth utilization when multiple kernels are running in parallel on this hardware?
3. Is my setup incorrect?
Some more information.
I am using a bash wrapper because I need to pass some input: sh -c run_test_profiler_jacekt.sh
I am also setting LD_LIBRARY_PATH.
When I run:
nsys profile -b dwarf --stats=true -f true -y 60 -w false -t cuda -d 30 -o $OUTPUT sh -c run_test_profiler_jacekt.sh
everything works fine, but over the SSH target it does not.
The executable being run is java, and the kernels are launched from multiple threads, but not from the main threads.
I had similar problems with ncu, but adding the option --target-processes all resolved the problem there. It did not help in Nsight Systems, though.
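For reference, the ncu invocation that worked for me was along these lines (the report name and the java command line are simplified placeholders):
# profile kernels launched from any child process/thread of the JVM
ncu --target-processes all -o ncu_report java -jar my_app.jar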
GPU metrics are only sampled on one GPU. When sampling is attempted on your system, Nsight Systems goes to the given (or default) GPU, but on finding that it is not a Turing or an Ampere, it cannot hook into it to collect metrics. Admittedly, this is not a good error message.
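For reference, the GPU to sample is chosen on the command line; I believe the 2021.2 CLI switch is --gpu-metrics-device, so selecting device 0 would look roughly like this (the application name is a placeholder):
# sample GPU metrics from device 0 alongside the CUDA trace
nsys profile -t cuda --gpu-metrics-device=0 -o report ./my_app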
Nsight Systems can give you memory utilization in pre-Turing systems for CUDA programs, using the GPU memory trace, or you can use the --cuda-memory-usage option in the CLI. Note that this functionality can have a serious performance impact. But this is usage; if you want bandwidth, I think you are going to have to turn to Nsight Compute.
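From the CLI, that would look roughly like this (the output name and application are placeholders):
# trace CUDA and record GPU memory usage over time; expect a noticeable runtime cost
nsys profile -t cuda --cuda-memory-usage=true -o memusage_report ./my_app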
Your setup looks good, except that you have Volta chips and you need Turings or Amperes for GPU Performance Metric Sampling to work.
BTW: it is extremely confusing that you are calling threads processes.
“Nsight Systems can give you memory utilization in pre-Turing systems for CUDA programs, using the GPU memory trace”
OK, I will give it a go. Thanks!
“But this is usage; if you want bandwidth, I think you are going to have to turn to Nsight Compute.”
I am not sure what the difference between usage and bandwidth is. Does Nsight Compute allow measuring the combined, actual bandwidth used by multiple kernels running concurrently?
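For example, would something along these lines (just my guess at the relevant Nsight Compute metric names) report the DRAM traffic and achieved throughput per kernel, which I could then add up across the concurrent kernels?
# per-kernel DRAM bytes and achieved throughput as a % of peak (metric names assumed)
ncu --target-processes all --metrics dram__bytes.sum,dram__throughput.avg.pct_of_peak_sustained_elapsed -o bw_report java -jar my_app.jar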
BTW: I notice that my version's documentation only mentions -g to get the best call stacks, and the ones I am getting are actually not very good. I will try -fno-omit-frame-pointer and -funwind-tables.
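Concretely, I plan to rebuild the host-side JNI library that launches the kernels with something like this (file and library names are placeholders):
# keep frame pointers and unwind tables so nsys can walk the call stacks
g++ -g -O2 -fno-omit-frame-pointer -funwind-tables -fPIC -shared my_jni_kernels.cpp -o libmyjnikernels.so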
What version was your screenshot from? It looks more reasonable than what I am using :)
Hmm… you are using 2021.2.1, and those screenshots (which I snagged from the online docs) are also from 2021.2.1 (as it happens, I am the person who updated the docs), so I am not sure why you would be seeing anything different.
Sorry, I’ve lost track of exactly which thing you were looking at. GPU metrics trace was added with 2021.2 (March this year). The CUDA memory allocation graph went in with 2020.5 (November/December last year).
I figured out my glitch on Friday and forgot to drop a note here, sorry.
The screenshot is what is seen with SSH → Tegra; you are correct. Unfortunately, “collect all processes” should be the default, and that is not what you are seeing (but only through the GUI).
@jacek.tomaka - Thanks for sharing the debug log. However, the log seems to be missing some pieces of information that we expect. Could you try again? Make sure to do the following (a rough shell sketch of the steps is below):
1. Kill any running nsys process on the target system before you save nvlog.config to $HOME.
2. Open the GUI and connect to the target system only after you have saved nvlog.config to $HOME.
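On the target, that amounts to roughly this (assuming nvlog.config is in the current directory and nsys is on the PATH):
# stop any leftover nsys processes on the target
pkill -f nsys
# put the debug logging config where the next session will pick it up
cp nvlog.config $HOME/
# only now open the GUI on the host and connect to this target over SSH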
Liuyi,
This is extremely embarrassing, but after I did what you suggested, I stopped seeing this error.
Now I can see the GPUs in the timeline.
Sorry for wasting so much of your time, but it looks like the problem has gone away.
This is weird because I have killed nsys in the past…
If I see it again, I will capture the log and send it.
Regards.
Jacek Tomaka
No problem. Just make sure you no longer see this error even without the debug log enabled, because sometimes the problem can be hidden when logging is turned on.