nvprof makes an application hang at exit after main

Hi,
Whenever I try to profile my application with nvprof, the application hangs at exit after main() finishes.

Exact callstack attached below.

Any ideas what could be causing it?
Thanks.

ntdll.dll!NtWaitForSingleObject()	Unknown
 	KernelBase.dll!WaitForSingleObjectEx()	Unknown
 	nvcuda.dll!00007ffb6de8ad60()	Unknown
 	nvcuda.dll!00007ffb6dd4716c()	Unknown
 	nvcuda.dll!00007ffb6df39199()	Unknown
 	nvcuda.dll!00007ffb6de21d27()	Unknown
 	nvcuda.dll!00007ffb6de221fd()	Unknown
 	nvcuda.dll!00007ffb6dfe1e93()	Unknown
 	cuinj64_101.dll!00007ffb7e73c8fb()	Unknown
 	cuinj64_101.dll!00007ffb7e73ca7d()	Unknown
 	cuinj64_101.dll!00007ffb7e73d082()	Unknown
 	cuinj64_101.dll!00007ffb7e757e3d()	Unknown
 	cuinj64_101.dll!00007ffb7e613785()	Unknown
 	cuinj64_101.dll!00007ffb7e613863()	Unknown
 	cuinj64_101.dll!00007ffb7e616554()	Unknown
 	cuinj64_101.dll!00007ffb7e611456()	Unknown
 	cuinj64_101.dll!00007ffb7e613601()	Unknown
 	nvcuda.dll!00007ffb6de62eff()	Unknown
 	nvcuda.dll!00007ffb6dd38033()	Unknown
 	nvcuda.dll!00007ffb6dec9499()	Unknown
 	cudart64_101.dll!00007ffba0df6865()	Unknown
 	cudart64_101.dll!00007ffba0df6e66()	Unknown
 	cudart64_101.dll!00007ffba0df7068()	Unknown
 	cudart64_101.dll!00007ffba0dee0bd()	Unknown
 	cudart64_101.dll!00007ffba0df3866()	Unknown
 	cudart64_101.dll!00007ffba0e1b10b()	Unknown
 	cudart64_101.dll!00007ffba0e1bf34()	Unknown
 	cudart64_101.dll!00007ffba0e1c0ff()	Unknown
 	ntdll.dll!LdrpCallInitRoutine()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	kernel32.dll!00007ffbb552d3ba()	Unknown
 	ucrtbase.dll!exit_or_terminate_process()	Unknown
 	ucrtbase.dll!common_exit()	Unknown
>	myapp.exe!__scrt_common_main_seh() Line 295	C++
 	kernel32.dll!00007ffbb55281f4()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

Hi sergeyn

Thanks for reporting the issue.

Would it be possible for you to share the sample code that causes hang? It’s really difficult to debug the issue without source.


Thanks,
Ramesh

Hi Ramesh,

Do you need the actual source? Won’t a binary do?

Thanks,
Sergey.

We will need the source code. It is also fine if you write a snippet that reproduces the hang.


Thanks,
Ramesh

Stripping the codebase down to a repro case will take some time, and unfortunately I can’t send all the sources.

I’ll see what I can do. Meanwhile, does nothing ring a bell based on the callstack? I am able to reproduce the same issue on another machine as well.

Thanks,
Sergey.

Hi Sergey,

I can’t think of anything from the stack. Does the app run properly without nvprof?


Thanks,
Ramesh

Yes, there are no errors or any other abnormalities without nvprof. Another observation: cuda-memcheck finishes without any errors, but it takes a really, really long time to complete.

Hi,

I have recreated the issue. I am on Windows 10 x64. Download the project from the following link, build the Release configuration, then try profiling it. Observe that profiling never finishes; after attaching a debugger, you should see a callstack similar to the one I posted above (i.e., a hang after main).

Dropbox - nvprof_issue.zip

Note that it has something to do with the project setup: if you copy the contents of kernels.cu into a newly created CUDA project, it works.

Let me know if you are able to reproduce it or not.

Thanks,
Sergey.

Hi,
The project I posted above reproduces this problem on another, completely different machine. Just don’t forget to edit the ‘Code Generation’ option to match your hardware exactly (and I didn’t mention it before: I’m using CUDA 10.1).
Further digging into the linker command line revealed that if you swap the input libraries from “cuda.lib;cudart.lib” to “cudart.lib;cuda.lib”, the problem magically goes away.

If I may, I’d like to post some other complaints about nvprof here:

    • If you comment out the last cuCtxSynchronize and cuMemUnregister calls, nvprof displays a message box saying something like (not the exact wording) “the timeline of some kernel invocations/memcopies is invalid, and those commands will not be shown on the timeline”. That’s a rather cryptic error message for a lay user to see. Note that cuda-memcheck doesn’t catch this either.
    • Console output doesn’t get displayed in nvprof until after the application exits. That’s rather annoying given issues like the ones above: if your app doesn’t finish in time, you never know whether it is still running, is hanging, or maybe never even started.
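
On the console-output point, one possible contributor is C stdio buffering. This is an assumption, not something established in this thread. When stdout is not an interactive console (for example, when a tool captures it through a pipe), the C runtime typically buffers it fully, so text only appears once the buffer fills or the process exits. The minimal sketch below disables that buffering at startup. `make_stdout_unbuffered` and `report_progress` are hypothetical helper names.

```cpp
#include <cstdio>

// Hypothetical helper: call it at the very top of main(), before anything is
// written to stdout (setvbuf is only valid before other operations on the stream).
// With _IONBF, every printf is handed to the OS immediately instead of
// waiting in a buffer, at the cost of more write calls.
bool make_stdout_unbuffered() {
    return std::setvbuf(stdout, nullptr, _IONBF, 0) == 0;
}

// Alternative with less overhead: keep the default buffering and flush
// explicitly at the points where progress should become visible.
void report_progress(const char* message) {
    std::printf("%s\n", message);
    std::fflush(stdout);
}
```

Whether this helps depends on how the tool collects the output. If the profiler holds the text on its own side, nothing inside the application will change that.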

So, to summarize: three issues I hope you can shed some light on.

Thanks,
Sergey.

Unfortunately, the library swap trick doesn’t help in my final application, and this issue is a hard show-stopper for me.

Thanks,
Sergey.

Any updates on the problem? Were you able to reproduce it?

I’ve created a bug report about this problem
https://developer.nvidia.com/nvidia_bug/2552865

But there hasn’t been much activity there either.

Regards,
Sergey.

Hi Sergey,

We could reproduce the issue. We will look into this and get back to you soon.


Thanks,
Ramesh

OK, good. I hope you will fix it.

I was able to find a workaround, so this problem doesn’t block me anymore.

Regards,
Sergey.

Hi Sergey,

Good to know that you are not blocked. If you don’t mind, can you share what kind of workaround you used?


Thanks,
Ramesh

Hi Ramesh.

The structure of my program is as follows: first, the launcher pokes into the driver API to check CUDA availability, and then the CUDA plugin gets loaded. Changing the link order so that cudart.lib came first didn’t work there, but preloading the CUDA runtime DLLs in the launcher before nvcuda.dll did the trick.
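
For readers hitting the same hang, here is a minimal sketch of that launcher-side workaround. It is an illustration under stated assumptions, not Sergey’s actual code. It assumes CUDA 10.1 on Windows, which gives the runtime DLL name cudart64_101.dll as seen in the callstack above. It also assumes that DLL is on the DLL search path. `probe_cuda_device_count` is a hypothetical name. The driver entry points cuInit and cuDeviceGetCount are resolved at run time through GetProcAddress, so no CUDA headers are needed.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical launcher helper: returns the number of CUDA devices,
// or -1 if CUDA is unavailable (or on non-Windows platforms).
int probe_cuda_device_count() {
#ifdef _WIN32
    // Step 1 (the workaround): load the CUDA runtime BEFORE nvcuda.dll.
    // Assumption: CUDA 10.1, so the runtime DLL is cudart64_101.dll.
    // The handle is intentionally never freed; the plugin loaded later
    // imports the same DLL and reuses this already-loaded module.
    HMODULE cudart = LoadLibraryW(L"cudart64_101.dll");
    if (cudart == nullptr) return -1;

    // Step 2: only now load the driver and probe for CUDA availability.
    HMODULE nvcuda = LoadLibraryW(L"nvcuda.dll");
    if (nvcuda == nullptr) return -1;

    // Driver API signatures, with CUresult treated as int (CUDA_SUCCESS == 0).
    using cuInit_t = int(__stdcall*)(unsigned int);
    using cuDeviceGetCount_t = int(__stdcall*)(int*);
    auto cuInitFn = reinterpret_cast<cuInit_t>(GetProcAddress(nvcuda, "cuInit"));
    auto cuDeviceGetCountFn =
        reinterpret_cast<cuDeviceGetCount_t>(GetProcAddress(nvcuda, "cuDeviceGetCount"));
    if (cuInitFn == nullptr || cuDeviceGetCountFn == nullptr) return -1;

    if (cuInitFn(0) != 0) return -1;
    int count = 0;
    if (cuDeviceGetCountFn(&count) != 0) return -1;
    return count;
#else
    // This sketch only covers the Windows DLL-loading scenario from the thread.
    return -1;
#endif
}
```

The launcher would call this before loading the CUDA plugin. Why the load order matters for the hang is not established in this thread. The callstack only shows the wait happening inside cudart64_101.dll’s shutdown path during LdrShutdownProcess.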

Thanks,
Sergey.