I have been trying out Nsight Visual Studio Edition and I must say I am really disappointed. It is not up to normal NVIDIA standards.
When we present streams as a best practice and talk about pitfalls, I think we really need to make it clear that ALL DEBUGGING IS DISABLED when you launch with a stream parameter! Even the darn printf’s no longer work.
It also crashes the OS a lot. (DO NOT start Nsight Systems if you are RDP’ed in!!!)
Following a Google result into this forum is a real problem: seconds after showing you the post, the page snaps to the search page.
There is also tons of old, incorrect info, which is very misleading. Nsight Monitor is installed and mentioned in the tutorials, but it NEVER shows any connections. I think we need to start segmenting the documentation into versions and be really clear about what works on which versions, in which IDEs, and on which OSes.
I didn’t have any trouble like that. I don’t think your claims are correct in general. Specifically, you should be able to debug with streams, and printf from kernel code should work with or without streams.
I have VS2019 on a GeForce RTX 2070, Windows 10, CUDA 10.1.243.
I set the startup project to the CUDA sample code simpleStreams. I built the debug version of the project. Then I set a breakpoint on the first line of CUDA kernel code. Then I selected Extensions > Nsight > Start CUDA Debugging (Next-Gen).
I didn’t have any trouble stopping at that breakpoint. Here’s what my window looks like at that point:
(As a bonus, you’ll note I’m connected over Remote Desktop)
The fact that even printf is not working for you suggests to me that you have a CUDA runtime error in your program. Make sure you don’t have such an error (e.g. kernel launch error). For example, if you haven’t properly created the stream, the CUDA kernel will not launch into that stream. I really have no idea what the problem is, but if you have a problem like that, you won’t be able to debug the kernel (because it is not launching at all) and printf won’t work either.
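For anyone following along, here is a minimal sketch of the kind of error checking meant here. It is not the code from either of our projects: the `CUDA_CHECK` macro is a hypothetical helper, and the `init_array` signature and sizes are assumptions made up for illustration (only the kernel name and the launch parameter names come from this thread). It launches into an explicitly created, non-default stream and checks both the launch and the execution.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): report and abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Assumed signature; the thread only names the kernel.
__global__ void init_array(int *data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
    if (i == 0) printf("init_array running, value=%d\n", value);
}

int main()
{
    const int n = 1 << 20;            // arbitrary size for the sketch
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    int *d_data = nullptr;
    cudaStream_t stream;

    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(int)));
    CUDA_CHECK(cudaStreamCreate(&stream));   // a stream that was never created cannot be launched into

    // Launch into the non-default stream: <<<grid, block, shared memory bytes, stream>>>
    init_array<<<blocks, threads, 0, stream>>>(d_data, n, 7);
    CUDA_CHECK(cudaGetLastError());            // catches launch-time errors (bad config, bad stream)
    CUDA_CHECK(cudaStreamSynchronize(stream)); // catches execution errors; device printf output is flushed here

    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

If the launch or the synchronize reports an error, the kernel did not run to completion, which would explain both the missing printf output and breakpoints that are never hit.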
I have a similar setup with an RTX 2080 Ti. I tried VS2017 and VS2019 on the VS2015 .sln.
I cannot see your launch. I have no trouble if launch is on default stream.
i.e.
init_array<<<blocks, threads>>> is fine
init_array<<<blocks, threads, shm, stream>>>, where stream is not 0, breaks all debugging
Why is this forum so messed up? I could not get back to this post to reply from the link in the notification, and there was no “my posts” list, so I had to search. WTH?
I also cannot see the image in a larger view in this forum and have to go back to the email to see detail.
After posting, I found a Stack Overflow mention from older CUDA that stream 0 is special and tries to join everything, which can defeat all parallelism in code trying to use streams. Maybe that is still true (but obscured) and has a hand in this.
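On the "stream 0 is special" point: the legacy default stream implicitly synchronizes with ordinary (blocking) streams, while streams created with `cudaStreamNonBlocking` are exempt from that. Below is a rough sketch of opting out, not a diagnosis of the debugging problem. The `scale` kernel, the buffer sizes, and the stream count are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int kStreams = 4;     // arbitrary count for the sketch
    const int n = 1 << 16;      // arbitrary buffer size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaStream_t streams[kStreams];
    float *buffers[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        // Non-blocking streams do not implicitly synchronize with the legacy default stream.
        cudaStreamCreateWithFlags(&streams[s], cudaStreamNonBlocking);
        cudaMalloc(&buffers[s], n * sizeof(float));
        // Keep setup work in the same stream so it stays ordered before the kernel.
        cudaMemsetAsync(buffers[s], 0, n * sizeof(float), streams[s]);
    }

    for (int s = 0; s < kStreams; ++s) {
        scale<<<blocks, threads, 0, streams[s]>>>(buffers[s], n, 2.0f);
    }

    // Wait for all streams and report the first error seen, if any.
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(buffers[s]);
    }
    return 0;
}
```

An alternative that needs no code changes is compiling with `nvcc --default-stream per-thread`, which gives each host thread its own default stream instead of the legacy one.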
It’s a CUDA sample code. Perhaps you should learn what the CUDA sample codes are and study the one I indicated. Maybe even try my experiment to see if you see different behavior.
I’m not suggesting that, and the sample code I suggested does not do that, and I’ve already suggested one possible explanation – error in your code – that could lead to that observation.
Perhaps you should try the experiment I suggested of adding rigorous CUDA error checking to your code, and study the error output if any.
I don’t know how to do that either, but in Chrome at least, any web page can be zoomed in using Ctrl and +.
I am familiar with and have built and debugged lots of samples. It was by comparing samples to my branch that I was able to discover this issue. That example you posted calls that kernel in 3 ways. I will comment out the other calls to see if your breakpoint being hit is from default or the stream call.
I have a lot of error checking and it does not indicate any error, AND the profile shows the kernel IS called.
I will check later if mixing of default with streams causes the symptom I am seeing.
On the sizing: yeah, I know I can enlarge in the browser (when I am on desktop), but I was on my phone.
That still does not excuse this forum’s inability to go to a linked response from either an email or a Google search. :-(
Thanks for checking that. After work I will make sure I do not have a mix of default and specified streams. I am converting a frame NV12 → BGRA and then starting 6 streams that scale a frame then decimate into 1024 groups of 12x12 pixels to feed inference engines in parallel.
I did have an event to measure the time to convert and process the frame, and that event is not destroyed until all the spawned threads have returned, so that may be the issue.
I have to wait until after work or the weekend to play with it again.