Can't use Nsight Systems 2020.3.1.72 (Windows): can't load profiles

With version 2020.3, whether with older or the newest drivers for Quadro GPUs, after profiling for 60 seconds Nsight Systems gets stuck trying to load the newly generated profile.

Doing the same thing (same server, same drivers, etc.) with version 2020.2 works, and the profile does not take long to open.

We are currently using version 2020.2, since it works quite smoothly. It's not blocking us, since we can stay on 2020.2, but it would be great if someone could take a look at the issue before we are forced onto a newer version of Nsight Systems to profile Ampere GPUs.

Windows 10 Version 1607 Build 14393.3866
CPU: AMD EPYC 7401P
64 GB of DDR4 RAM
3 GPUs, all of them Quadro RTX 4000

Thanks!

Sorry it took so long to get back to you. There was a performance regression in the 2020.3 release for specific kinds of runs, which I think you ran into.

I suggest you download and install 2020.4, which is available now via developer.nvidia.com.

Hi! We are having the same problems with the 2021 versions.

Is anyone else complaining?

Are you running via CLI or GUI?

Can you give me the command line/options you were using?

How big is the results file (.qdrep)?

We are using the GUI.

In the command line we simply call our software, a Windows executable.

When selecting "Start profiling manually", it does not generate a results file; it opens a window with this message:

ApplicationExitedBeforeProfilingStarted (1104) {
RuntimeError (120) {
OriginalExceptionClass: class boost::exception_detail::clone_impl
OriginalFile: D:\TC\20a3cfcd1c25021d\QuadD\Host\Analysis\Clients\AnalysisHelper\AnalysisStatus.cpp
OriginalLine: 80
OriginalFunction: class Nvidia::QuadD::Analysis::Data::AnalysisStatusInfo __cdecl QuadDAnalysis::AnalysisHelper::AnalysisStatus::MakeFromErrorString(enum Nvidia::QuadD::Analysis::Data::AnalysisStatus,enum Nvidia::QuadD::Analysis::Data::AnalysisErrorType::Type,const class std::basic_string<char,struct std::char_traits,class std::allocator > &,const class boost::intrusive_ptr &)
ErrorText: The target application exited before profiling started.
}
}

When not starting manually, it generates a .qdrep file, but it crashes before our software reaches the interesting part to be profiled. How can I share the .qdrep file?

The issue happens when our software starts allocating a lot of memory (both GPU and CPU memory).

It does not crash when not using Nsight, and it works with older Nsight versions, although those crash eventually too; with the older versions we simply have enough time to get some profiling done.

The system memory is 64 GB of RAM and we are using around 30 GB. The GPUs are RTX 4000 cards, we use 3 of them, and none of them goes higher than 6 GB of memory usage. That is, without Nsight Systems.

How big is the .qdrep file?

It is 29.4MB

You should be able to zip it up and upload it here. Or if that is a problem, you can email it to me (hwilper@nvidia.com)

Here you have:
Report 19.zip (22.6 MB)

Does your application spawn child processes to do the actual work? Do you have "trace process tree" turned on?

You can see the config we use in the picture.

And to be precise, since a process != a thread: no, we don't use many processes. We have a single process that creates many CPU threads. Several of these CPU threads do CUDA work on the same or different GPUs. Each thread works with only one GPU (unless there is some HtoD, DtoH, or peer DtoD transfer involved), since having threads switch CUDA contexts on Windows is expensive.

I don't see an option that says "process tree" anywhere in the interface… does it have another name?

Hi! Any news?

We have an NVIDIA driver crash reproducer that reproduced a bug in NVIDIA drivers prior to 461.09. Now we are modifying it so that it also reproduces another crash we are facing with drivers starting at 461.09 (bug 3221330).

The reproducer does not reproduce the second crash yet, but it does reproduce the issue with Nsight Systems 2021.2.1 mentioned in this post.

It's a very simple program, less than 200 lines of C++, with a CMake build, and in our case compiled with Visual Studio 2017.

When trying to profile this code, which balances the load across the GPUs, we hit the same issue as with our complete software.

Do you want me to share this code with you?

Thanks!

Yes, please share this with us.

@liuyis, can you take a look at this?

Here you have it.

In the end it is 4 files: 2 for a dummy kernel, 1 for the logger system, and the actual main .cpp file.

NSightSystems2021.2.1IssuesReproducer.zip (6.3 KB)

Thanks for sharing, I’ll take a look.

I played a bit with the static variables at the beginning of CUDAErrorReproducer.cpp, and if you lower the values enough, profiling may work and capture something.

The values I left there generate about 65% compute usage and about 32% PCIe usage on a Quadro RTX 4000. With this load, I'm already unable to profile with Nsight Systems 2021.2.1 and driver 461.40.

Thanks!

Same issue with:

Windows 10 Enterprise 21H1
Driver 471.11
NSightSystems 2021.2.4

A single NVIDIA RTX A4000

I can reproduce the crash successfully with the reproducer you shared. Just to confirm, this crash only happens when CUDA trace is enabled, right?

An initial investigation indicates the crash happens in the CUPTI library (which we rely on to get CUDA trace data). I will follow up with the CUPTI team.

Yes, it happens when trying to capture CUDA trace.

There is a bug opened for this issue. Are you working on it? Bug number: 200746470

If not, could you sync, or ask the CUPTI team to sync, with me and the people involved in that bug? I have access to the bug through the partners portal, so we can communicate there.

Thanks!