With version 2020.3, with either older or the newest drivers for Quadro GPUs, after profiling for 60 seconds Nsight Systems gets stuck trying to load the newly generated profile.
Doing the same thing, on the same server, with the same drivers, etc., version 2020.2 works and does not take long to open the profile.
We are currently using version 2020.2, since it works quite smoothly. It’s not blocking us, since we can stay on 2020.2, but it would be good if someone took a look at the issue before we are forced to move to a newer version of Nsight Systems to profile Ampere GPUs.
Windows 10 Version 1607 Build 14393.3866
CPU: AMD EPYC 7401P
64 GB of DDR4 RAM
3 GPUs, all of them Quadro RTX 4000
Sorry it took so long to get back to you. There was a performance regression in the 2020.3 release for specific kinds of runs, which I think you ran into.
I suggest you download and install 2020.4, which is available now via developer.nvidia.com.
In the command line we simply call our software, which is a Windows executable.
When selecting “Start profiling manually”, it does not generate a results file; it opens a window with this message:
ApplicationExitedBeforeProfilingStarted (1104) {
RuntimeError (120) {
OriginalExceptionClass: class boost::exception_detail::clone_impl
OriginalFile: D:\TC\20a3cfcd1c25021d\QuadD\Host\Analysis\Clients\AnalysisHelper\AnalysisStatus.cpp
OriginalLine: 80
OriginalFunction: class Nvidia::QuadD::Analysis::Data::AnalysisStatusInfo __cdecl QuadDAnalysis::AnalysisHelper::AnalysisStatus::MakeFromErrorString(enum Nvidia::QuadD::Analysis::Data::AnalysisStatus,enum Nvidia::QuadD::Analysis::Data::AnalysisErrorType::Type,const class std::basic_string<char,struct std::char_traits,class std::allocator > &,const class boost::intrusive_ptr &)
ErrorText: The target application exited before profiling started.
}
}
When not selecting that option, it generates a .qrep file, but it crashes before our software loads the interesting part to be profiled. How can I share the .qrep file?
The issue happens when our software starts allocating a lot of memory (on both the GPU and the CPU).
It does not crash when not using Nsight, and it works with older Nsight versions, though it eventually crashes there too; with the older versions we simply have enough time to get some profiling done.
The system has 64 GB of RAM and we are using around 30 GB. The GPUs are RTX 4000s, we use three of them, and none of them goes above 6 GB of memory usage. That is, without Nsight Systems.
And to be precise, since a process != a thread: no, we don’t use many processes. We have a single process that creates many CPU threads. Several of these CPU threads do CUDA work on the same or different GPUs. Each thread works with only one GPU (unless there is some HtoD, DtoH, or peer DtoD transfer involved), since having threads switch CUDA contexts on Windows is expensive. A rough sketch of that model is below.
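This is not our actual code, just a minimal sketch of that threading model, assuming hypothetical kernel and function names and omitting error checking: one worker thread per GPU, and each thread calls cudaSetDevice exactly once so it never switches contexts.

#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Placeholder kernel standing in for the real CUDA work.
__global__ void doWork(float* data, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Each CPU thread binds to exactly one GPU and keeps that context for its whole lifetime.
void workerForGpu(int device, size_t n)
{
    cudaSetDevice(device);                      // set once, never switch contexts on this thread
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    for (int iter = 0; iter < 1000; ++iter)
        doWork<<<(unsigned)((n + 255) / 256), 256>>>(d, n);
    cudaDeviceSynchronize();
    cudaFree(d);
}

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);           // 3 x Quadro RTX 4000 in our case
    std::vector<std::thread> workers;
    for (int dev = 0; dev < deviceCount; ++dev)
        workers.emplace_back(workerForGpu, dev, size_t(1) << 24);
    for (auto& t : workers)
        t.join();
    return 0;
}

In the real application there are more threads than GPUs, but each thread still sticks to a single device.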
I don’t see an option that says “process tree” anywhere in the interface… does it have another name?
We have an NVIDIA driver crash reproducer that reproduced a bug in NVIDIA drivers prior to 461.09. Now we are modifying it so that it can reproduce another crash we are facing with drivers starting at 461.09 (bug 3221330).
The reproducer does not reproduce the second crash yet, but it does reproduce the issue with Nsight Systems 2021.2.1 mentioned in this post.
It’s a very simple piece of code, fewer than 200 lines of C++, with a CMake build, compiled in our case with Visual Studio 2017.
When trying to profile this code, in order to balance the load across the GPUs, we hit the same issue as with our complete software.
I played a bit with the static variables at the beginning of CUDAErrorReproducer.cpp, and if you lower the values enough it may work and capture something.
The values I left there generate a compute usage of about 65% and a PCIe usage of about 32% on a Quadro RTX 4000. With that load I am already unable to profile with Nsight Systems 2021.2.1 and driver 461.40.
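This is not the actual CUDAErrorReproducer.cpp, just a hypothetical sketch of its structure (all names and values are illustrative): a few static constants at the top control the compute and PCIe load, and lowering them is what sometimes lets a capture succeed.

#include <cuda_runtime.h>
#include <vector>

// Illustrative tunables, not the real reproducer’s values: lowering them reduces
// the compute and PCIe load and may let Nsight Systems capture something.
static const size_t kBufferElems    = 1 << 26;   // elements copied per iteration (drives PCIe usage)
static const int    kKernelLaunches = 64;        // kernel launches per iteration (drives compute usage)
static const int    kIterations     = 500;       // total iterations

// Busy kernel to keep the SMs occupied.
__global__ void burn(float* d, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 128; ++k)
            d[i] = d[i] * 1.0001f + 0.0001f;
}

int main()
{
    std::vector<float> host(kBufferElems, 1.0f);
    float* dev = nullptr;
    cudaMalloc(&dev, kBufferElems * sizeof(float));

    for (int it = 0; it < kIterations; ++it)
    {
        // HtoD + DtoH copies generate the PCIe traffic, the kernel loop the compute usage.
        cudaMemcpy(dev, host.data(), kBufferElems * sizeof(float), cudaMemcpyHostToDevice);
        for (int k = 0; k < kKernelLaunches; ++k)
            burn<<<(unsigned)((kBufferElems + 255) / 256), 256>>>(dev, kBufferElems);
        cudaMemcpy(host.data(), dev, kBufferElems * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaDeviceSynchronize();
    cudaFree(dev);
    return 0;
}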
Yes, it happens when trying to capture a CUDA trace.
There is a bug opened for this issue. Are you working on it? Bug number: 200746470
If not, could you sync, or ask the CUPTI team to sync, with me and the people involved in that bug? I have access to the bug through the partner portal, so we can communicate there.