Cuda error 30 (unknown error) after suspend

AliAliev · July 19, 2017, 8:37am

The problem is that any CUDA call returns error code 30 (unknown error). This happens after a suspend/wake cycle and goes away after a reboot. I tried driver versions 375.66 and 384.47, but the problem persists.

$ uname -a
Linux WS1005 4.8.0-58-generic #63~16.04.1-Ubuntu SMP Mon Jun 26 18:08:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.47                 Driver Version: 384.47                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
| 28%   31C    P8     8W / 151W |    530MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      3424    G   /usr/lib/xorg/Xorg                             354MiB |
|    0      3987    G   compiz                                          93MiB |
|    0      4777    G   ...el-token=1C9EB7F783F4F988F1752CC22A98C44A    79MiB |
+-----------------------------------------------------------------------------+

$ deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
>>> caffe.set_device(0)
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0719 11:13:07.302003  5523 common.cpp:151] Check failed: error == cudaSuccess (30 vs. 0)  unknown error
*** Check failure stack trace: ***
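
The failing call shown by deviceQuery and Caffe can also be probed directly, which may be handy for checking the GPU state right after a wake. The following is a minimal sketch in Python 3, assuming the CUDA runtime is installed as a shared library that the loader can find under the name `libcudart.so` (the library name and the helper `probe_cuda` are assumptions, not something from the posts above). It calls `cudaGetDeviceCount` through `ctypes` and reports the returned error code and message.

```python
import ctypes


def probe_cuda(library="libcudart.so"):
    """Call cudaGetDeviceCount and return (error_code, message, device_count).

    Returns None if the runtime library cannot be loaded. The default
    library name is an assumption; adjust it to match your installation.
    """
    try:
        cudart = ctypes.CDLL(library)
    except OSError:
        return None

    # const char* cudaGetErrorString(cudaError_t error)
    cudart.cudaGetErrorString.restype = ctypes.c_char_p
    cudart.cudaGetErrorString.argtypes = [ctypes.c_int]

    # cudaError_t cudaGetDeviceCount(int* count)
    count = ctypes.c_int(0)
    code = cudart.cudaGetDeviceCount(ctypes.byref(count))
    message = cudart.cudaGetErrorString(code).decode()
    return code, message, count.value


result = probe_cuda()
if result is None:
    print("CUDA runtime library not found")
else:
    code, message, count = result
    print("cudaGetDeviceCount returned %d (%s), devices: %d" % (code, message, count))
```

A return code of 0 means the runtime initialized normally; in the broken state described above, the same call would be expected to report the error instead.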

AliAliev · July 19, 2017, 9:11am

I also want to mention that the problem is not reproducible after every suspend/wake, but if you keep trying you eventually “succeed” in triggering it.

I noticed suspicious lines in dmesg right after the wake when Cuda starts to fail:

AliAliev · August 4, 2017, 9:05am

This is still an issue.

Robert_Crovella · August 4, 2017, 1:52pm

The latest linux driver:

Linux x64 (AMD64/EM64T) Display Driver | 384.59 | Linux 64-bit | NVIDIA

mentions a fix for a suspend issue. It may be worth a try.

AliAliev · August 26, 2017, 3:46pm

Hi txbob,

updating to this driver version didn’t help.

tera · August 26, 2017, 3:57pm

Do you really need to reboot or is it enough to reload the driver? The latter is the case for me with a GTX1050 under Ubuntu 16.04.

AliAliev · August 26, 2017, 3:59pm

Hi tera,

could you explain how you reload the driver?

tera · August 26, 2017, 5:04pm

1. Stop all programs using the driver (particularly X11).
2. Run “lsmod | grep nvidia” and rmmod the modules with zero use count.
3. Repeat step 2 until no NVIDIA kernel modules are loaded.
4. Run “modprobe nvidia” to reload the driver.
5. Restart X11 or whatever was using the GPU.

Once you know the order in which to unload modules, you can also package the whole process into a script.
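
As a starting point for such a script, here is a hedged sketch in Python 3 that only works out the unload order; it does not touch the system. It parses `lsmod`-style text, repeatedly picks NVIDIA modules whose use count is zero, and prints the `rmmod` and `modprobe nvidia` commands you would then run as root. The helper names and the sample module listing are illustrative assumptions, not output from any machine in this thread.

```python
# Illustrative lsmod output; module sizes and counts are made up for the example.
SAMPLE_LSMOD = """\
Module                  Size  Used by
nvidia_uvm            684032  0
nvidia_drm             45056  0
nvidia_modeset        860160  1 nvidia_drm
nvidia              13139968  2 nvidia_modeset,nvidia_uvm
drm_kms_helper        155648  1 nvidia_drm
"""


def parse_lsmod(text):
    """Return {module: (use_count, dependent_modules)} for nvidia* modules."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not an nvidia module row.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        if not fields[0].startswith("nvidia"):
            continue
        users = set(fields[3].split(",")) if len(fields) > 3 else set()
        users.discard("")
        modules[fields[0]] = (int(fields[2]), users)
    return modules


def plan_unload(modules):
    """Return (order, stuck): the removable sequence and modules still in use."""
    counts = {name: count for name, (count, _) in modules.items()}
    users = {name: set(deps) for name, (_, deps) in modules.items()}
    order = []
    progress = True
    while progress:
        progress = False
        for name in sorted(counts):
            if counts[name] == 0:
                order.append(name)
                del counts[name]
                # Removing a module releases its hold on the modules it used.
                for other in counts:
                    if name in users[other]:
                        users[other].discard(name)
                        counts[other] -= 1
                progress = True
                break
    return order, sorted(counts)


def reload_commands(order):
    """Commands to run as root: unload in order, then load the driver again."""
    return ["rmmod " + name for name in order] + ["modprobe nvidia"]


order, stuck = plan_unload(parse_lsmod(SAMPLE_LSMOD))
if stuck:
    print("still in use, stop their users first: " + ", ".join(stuck))
else:
    for command in reload_commands(order):
        print(command)
```

On a real machine the text could come from `subprocess.run(["lsmod"], capture_output=True, text=True).stdout` instead of the sample. If anything ends up in `stuck`, some program (for example X11) is still holding the driver, and the modules cannot be unloaded until it is stopped.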

AliAliev · August 26, 2017, 6:43pm

I would like to avoid restarting X11, because it would close all my windowed applications, including terminals…

tera · August 26, 2017, 7:50pm

I am running X on the integrated Intel GPU, so it doesn’t need to be restarted. I am reserving the discrete GPU entirely for CUDA. Your priorities may be different, of course.

nocnokneo · June 10, 2019, 12:36pm

Still an issue with Driver Version: 418.67. Any status updates on this bug? Restarting X11 is just as painful as rebooting the whole machine.

sin2lee · June 19, 2019, 3:33am

Hi nocnokneo

If you have an integrated Intel GPU and X11 runs on it, you may try the solution that tera recommended.
