Next-Gen debugger fails to start

Tue Mar 13 23:52:51 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 391.01                 Driver Version: 391.01                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K620        WDDM  | 00000000:03:00.0  On |                  N/A |
| 36%   49C    P8     1W /  30W |    406MiB /  2048MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN V             TCC  | 00000000:04:00.0 Off |                  N/A |
| 29%   42C    P8    26W / 250W |      1MiB / 12160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1796    C+G   ...n 5.5\Monitor\Common\Nsight.Monitor.exe N/A      |
|    0      1980    C+G   ...6)\Google\Chrome\Application\chrome.exe N/A      |
|    0      2036    C+G   C:\Windows\system32\Dwm.exe                N/A      |
|    0      4028    C+G   ...ogram Files\Windows Sidebar\sidebar.exe N/A      |
|    0      6680    C+G   ...sual Studio 14.0\Common7\IDE\devenv.exe N/A      |
|    0      9884    C+G   ...sual Studio 14.0\Common7\IDE\devenv.exe N/A      |
+-----------------------------------------------------------------------------+

Thank you for the info. I thought the TitanV used a GV100 chip, but it looks like it’s a GV102. We haven’t tested GV102 because it was released later than Nsight 5.5, so I’ll raise a bug to have it checked.

Hmmm … I suspect you know better than I do … but when I go to this website, it says the TitanV has a GV100 inside.

Sorry, my mistake; I misread my documentation. Could you try the following steps?

  1. Close the Nsight Monitor, because the Next-Gen debugger doesn’t need it.
  2. Close Visual Studio.
  3. Restart Visual Studio with the environment variable CUDA_VISIBLE_DEVICES=0 set.
  4. Then try to debug your app with the Next-Gen debugger.

I suspect your problem is caused by having more than one GPU visible.

Ok, here’s what I did.
I closed the Monitor app.
I closed VS.
I opened a “Developer Command Prompt for VS2015”.
In that DOS box, I set CUDA_VISIBLE_DEVICES=0.
I then confirmed the variable was present with a value of 0.

I launched VS and set a BP on my kernel.
I selected Start CUDA Debugging (Next-Gen).
I got the same error as before.

Then I tried this experiment.
I clicked on PROJECT, CONFIGURATION, DEBUGGING, ENVIRONMENT.
I created the environment variable CUDA_VISIBLE_DEVICES=0 there.
I ran the CUDA Next-Gen debugger and got the same error.

I really have no idea. I’ve submitted a bug so the dev team can check it, since I only have a Quadro GV100 right now.

Thanks for trying. :-) This TitanV has been a real pain in the a$$. The TitanX is a lot more reliable … of course the software for it is a bit more mature.

If you’re bored, you could take a look at the CUPTI question I posted in the CUDA Programming and Performance Forum.

–Bob

Hey harryz_,
I woke up this morning and realized that running set CUDA_VISIBLE_DEVICES=0 in a DOS box won’t be seen by VS.

So, I went into Advanced System Settings and added CUDA_VISIBLE_DEVICES=0 to my environment from there.

Then I fired up VS and set a BP on main and on my kernel.
I ran the Next-Gen debugger and it worked! Kinda …

I hit the BP on main. When I pressed F5, something was running, but I don’t know what. That app is one of the CUPTI examples, and it takes about 2 seconds to run. I had one CPU core pegged for over 5 minutes, so I just left it.

So I am making progress … have you seen this error? Remember, I have a TitanV and a K620, and CUDA_VISIBLE_DEVICES=0.

–Bob

Actually, yes: we already have a bug tracking this scenario, but it only happens on a few of our test machines and the developers cannot reproduce it, so they decided not to fix it in this version.

Please try adding a BP in the CUDA code and check whether it is hit.

Hi, I rebooted my PC and put a BP in the kernel.
I ran the Next-Gen debugger.
One CPU went to 100%. The program didn’t crash … but it didn’t hit the BP either.

–Bob

Could you tell me which sample you are using? I suggest trying a simple sample such as vectorAdd or matrixMul from the CUDA Samples to check whether the debugger works.

harryz
I should have guessed you would still be awake! Sure, I am using sass_source_map from the CUPTI examples.

Ok, I will also try the vectorAdd sample.

Hello

I compiled vectorAdd in 64-bit debug mode. I set a BP in the kernel and it worked!
I compiled matrixMul in 64-bit debug mode. I set a BP in the host code. It worked. Then I set a BP in the CUDA code. It worked!

Then I went to the CUPTI sass_source_map folder. There is just a makefile and a .cu file there.
I tried make, and the OS said I don’t have make.
I tried nmake, and the OS said I don’t have nmake. (You get the idea.)

When I first started using that code, I put it into a VS 2015 project and built it.
That is where I am having the problem. I will grab the code again, put it in a new project, and try again.

Thanks for asking me to do this. (I should have tried those two examples.) That said, I think setting CUDA_VISIBLE_DEVICES=0 was also important.

–Bob

harryz,

Interesting results.
I launched VS 2015 and created a new Win32 console app via FILE/NEW.

In the new project, I first set the build dependencies to CUDA 9.1.
I then renamed the .CPP file to a .CU file.
Then I just pasted the sass_source_map code into the .CU file.

I did a 64-bit debug build.
I did DEBUG/START WITHOUT DEBUGGING. It worked fine.
I did DEBUG/START DEBUGGING. It worked fine.

I set a BP in main. I did NSIGHT/Start CUDA Debugging (Next-Gen). When I ran it, it hit the BP in main. When I pressed F5, it ran and never came back.

I erased the BP in main and put a BP in the kernel. I did NSIGHT/Start CUDA Debugging (Next-Gen). When I ran it, it never hit the BP in the kernel.

Maybe the DEV team should look at this? Can you reproduce the problem?

–Bob

Yeah, I can reproduce it. We have never used the debugger together with CUPTI; here is what I found.

You can see initTrace() and finiTrace() in the source code; they start and stop CUPTI profiling.

  1. With the legacy debugger, I cannot hit a BP in CUDA code if CUPTI is used. If I delete these two function calls, the legacy debugger can hit BPs in CUDA code.
  2. This app won’t stop under the Next-Gen debugger, whether or not CUPTI is enabled.

Great!! I am not crazy! Can you tell someone about this??

Yeah, I’ve submitted a bug to track this. For now, I suggest not debugging the CUPTI sample.