cuda-gdb error

I’ve been writing CUDA code for a while but have not used cuda-gdb recently. Our sysadmin upgraded CUDA to 9.2, and now I can’t seem to get cuda-gdb to work. I have multiple codes that run correctly, but no matter what, each time I run one of them inside cuda-gdb I get the error message

Error: Failed to suspend device for CUDA device 0, error=CUDBG_ERROR_COMMUNICATION_FAILURE(0x1c)

This is on a machine that has two Tesla K20c cards. As for details, I am running it on the command line from a bash shell on an Ubuntu Linux system. The error occurs as soon as I try to step into any kernel, no matter how simple. Any hints on how to resolve this would be appreciated.
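
For reference, here is a stripped-down sketch of the kind of program and session that fails for me (the file name simple.cu and the kernel addOne are just placeholders, not my actual code):

// simple.cu -- placeholder kernel, just complex enough to step into
#include <cuda_runtime.h>

__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

int main(void)
{
    const int n = 256;
    int *d_data = 0;
    cudaMalloc((void**)&d_data, n * sizeof(int));
    cudaMemset(d_data, 0, n * sizeof(int));

    addOne<<<1, n>>>(d_data, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}

I compile with device debug info (nvcc -g -G simple.cu -o simple), start cuda-gdb ./simple, set a breakpoint with break addOne and type run; the error above shows up as soon as execution reaches the kernel.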

Dave

Have you solved this problem? I have the same issue. My hardware is a Tesla K20Xm. My OS is Debian sid. I tried CUDA 9.0, 9.2 and 10.0, and all of them give the same result. My program runs fine, but cuda-gdb just gives me the following error:
[New Thread 0x7fffec34e700 (LWP 18581)]
[New Thread 0x7fffebb4d700 (LWP 18582)]
Error: Failed to suspend device for CUDA device 0, error=CUDBG_ERROR_COMMUNICATION_FAILURE(0x1c).

Sorry, no luck so far. I left it in the hands of my sysadmin and have moved on for now. He tried a few things but decided to wait and see if anyone replied to this request. As you can see, nothing has happened. May need to ping the NVIDIA people. If you do find a solution, please post.

I am having the same issue. I am using Debian 9 with a Quadro K620 (device 1) for display and a Titan Xp (device 0).
The error message I get from cuda-gdb is
“Error: Failed to suspend device for CUDA device 1, error=CUDBG_ERROR_COMMUNICATION_FAILURE(0x1c).”

When I switch to the text-based console/terminal (pressing Ctrl+Alt+F1), cuda-gdb works fine without the error.

I guess the error occurs because it cannot “suspend” the GPU being used for display. Or maybe I have too many monitors (3) connected to the GPU.

In fact, the GPU machine I use is a local cluster node. I connect to it over ssh, so I am already working from a terminal/console. My problem may be something else. Thank you all the same.

Hello,

Any news on this topic? I have the same issue on Oracle Linux 7.6 (Red Hat) with CUDA 10 and a Quadro 410 + GT 720, even with this simple code:

#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    int nDevices;

    /* Even this single runtime call triggers the error under cuda-gdb. */
    cudaGetDeviceCount(&nDevices);

    return 0;
}

KR,
Iggi

Still no answers.

No news. I followed the official installation steps for CUDA exactly. Is it possible that the hardware is not compatible with the latest CUDA? For now I am using printf to debug.
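
In case it helps anyone stuck at the same point, this is roughly what my printf workaround looks like (only a sketch; the kernel name and the values are made up):

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, only meant to show device-side printf as a fallback.
__global__ void scaleKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2.0f * in[i];
        if (i == 0)  // print from a single thread to keep the output readable
            printf("thread %d: in=%f out=%f\n", i, in[i], out[i]);
    }
}

int main(void)
{
    const int n = 32;
    float h_in[n];
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = 0, *d_out = 0;
    cudaMalloc((void**)&d_in,  n * sizeof(float));
    cudaMalloc((void**)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    scaleKernel<<<1, n>>>(d_in, d_out, n);
    cudaDeviceSynchronize();   // device printf output is flushed here

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Device-side printf output is only flushed at synchronization points such as cudaDeviceSynchronize(), so the kernel launch has to be followed by one.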

Still nobody? I have the same issue with CUDA 10:

Error: Failed to suspend device for CUDA device 0, error=CUDBG_ERROR_COMMUNICATION_FAILURE(0x1c).

The error as documented for the CUDA Debugger API:
CUDBG_ERROR_COMMUNICATION_FAILURE - Communication error between the debugger and the application.

The graphics card is not used for desktop output:

glxinfo | egrep "OpenGL vendor|OpenGL renderer"
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Ivybridge Desktop

Unfortunately I am on Ubuntu 18 and I can’t downgrade CUDA to an older version where everything worked fine.

Error: Failed to suspend device for CUDA device 0, error=CUDBG_ERROR_COMMUNICATION_FAILURE(0x1c).

I am experiencing exactly the same on Ubuntu 18.04.3, both with CUDA 9 (the one that comes as an Ubuntu package) and with CUDA 10.1 installed via a .run installer.

I have both a GeForce GTX 660 Ti and a Tesla K20Xm installed. The problem occurs both when using the Tesla and when running in console mode (without X blocking the card) on the GeForce.

This one really drives me nuts. I am desperately looking for a way to debug my code.

A few suggestions.

  1. In a multi-GPU setup, try using the CUDA_VISIBLE_DEVICES environment variable to restrict CUDA runtime visibility to the GPU that is actually being used for debugging (see the short check program after this list):

CUDA_VISIBLE_DEVICES="1" cuda-gdb …

(for example; it may be necessary to specify "0" or some other number)

  2. Update to the latest CUDA 10.1U2, i.e. CUDA 10.1.243 (or whatever is later than that).

  3. Update the driver to the latest available for your GPU, preferably 435.xx or later: https://www.nvidia.com/download/driverResults.aspx/149785/en-us

  4. Don’t debug on a GPU that is being used for display of any kind.
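
Regarding suggestion 1, a quick way to confirm what the runtime actually sees is a small enumeration program along these lines (just a sketch; check.cu is a made-up name):

// check.cu -- print every device the CUDA runtime can see
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // i is the runtime's index, i.e. after CUDA_VISIBLE_DEVICES remapping
        printf("device %d: %s\n", i, prop.name);
    }
    return 0;
}

Running it as CUDA_VISIBLE_DEVICES="1" ./check shows which physical GPU ends up as device 0 inside the process.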

Thanks a lot Robert, installing the latest driver (runfile) and CUDA did the trick for me.

For everybody using Ubuntu 18.04.3, here’s what I did:

  1. Remove all nvidia-related Ubuntu packages (assuming you had not used run file installers):
    sudo apt purge "nvidia*"

  2. Remove all cuda-related packages (again assuming you had not used run file installers):
    sudo apt purge "cuda*"

  3. Download the driver (beta) suggested by Robert from https://www.nvidia.com/download/driverResults.aspx/149785/en-us

  4. Reboot either into recovery mode (to prevent a display manager from being started) or shut down the running display manager after a normal boot:
    sudo systemctl stop sddm # kubuntu
    sudo systemctl stop gdm # ubuntu

Then change to a text console, e.g. by hitting Ctrl-Alt-F2, and log in.

  5. Run the nvidia run-file installer for the nvidia graphics driver.

  6. Reboot and make sure the nvidia graphics driver is loaded (e.g. by using lsmod). If the nouveau driver is still loaded, try to blacklist it manually or run the nvidia driver installer again, which hopefully will blacklist it for you.

  7. Reboot again (and keep trying to blacklist nouveau) until the nvidia driver 435.xx is actually loaded.

  8. Download CUDA 10.1U2 from the CUDA Toolkit downloads page on NVIDIA Developer and run the downloaded cuda runfile installer.

  8.1) Not sure if a reboot is necessary after cuda installation.

  9. Check if nvidia-smi works (which will not be the case, e.g. if the nouveau driver is still loaded).

  10. Hopefully enjoy debugging with cuda-gdb.

Just for the record: the error I reported occurred even when debugging without an X server running, and even when I removed the second NVIDIA device from my machine (both devices are cuda capable). So it was unrelated to device visibility (CUDA_VISIBLE_DEVICES), which I had also tried switching between the two available cuda devices.

Unfortunately it turns out that not all of my problems are solved. Now debugging with cuda-gdb works on the GeForce GTX 660 Ti, but not on the Tesla K20Xm, which hangs during any CUDA API call.

Does the 435.xx beta driver support Tesla devices at all? At least they are not listed at https://www.nvidia.com/download/driverResults.aspx/149785/en-us

To be more precise, any CUDA program hangs when run on the Tesla card, whether it is launched in the debugger or directly.

After a restart, the Tesla appears to work normally again. Seems to be some kind of hiccup. Will observe this more thoroughly.