no CUDA-capable device is detected

Hi,

I’m having a problem getting Linux to correctly detect my 1060 6GB card.

I keep getting the “no CUDA-capable device is detected” error. The strange thing is that another 1060 I borrowed works fine; it’s only mine that has the problem. Here are some details:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.22                 Driver Version: 387.22                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  ERR!                Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   56C    P0    25W / 120W |      0MiB /  6072MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
-rwxr-xr-x 1 root root      26804 Nov 13 00:28 nvidia-bug-report.sh
-rwxr-xr-x 1 root root      68952 Nov 13 00:28 nvidia-cuda-mps-control
-rwxr-xr-x 1 root root      47832 Nov 13 00:28 nvidia-cuda-mps-server
-rwxr-xr-x 1 root root     225776 Nov 13 00:28 nvidia-debugdump
-rwxr-xr-x 1 root root     335720 Nov 13 00:28 nvidia-installer
-rwsr-xr-x 1 root root      29488 Nov 13 00:28 nvidia-modprobe
-rwxr-xr-x 1 root root      46608 Nov 13 00:28 nvidia-persistenced
-rwxr-xr-x 1 root root     294472 Nov 13 00:28 nvidia-settings
-rwxr-xr-x 1 root root     520320 Nov 13 00:28 nvidia-smi
lrwxrwxrwx 1 root root         16 Nov 13 00:29 nvidia-uninstall -> nvidia-installer
-rwxr-xr-x 1 root root     188000 Nov 13 00:28 nvidia-xconfig
crw-rw-rw-  1 root  root    195,   0 Jan 30 20:26 nvidia0
crw-rw-rw-  1 root  root    195, 255 Jan 30 20:26 nvidiactl
crw-rw-rw-  1 root  root    195, 254 Jan 30 20:27 nvidia-modeset
==============NVSMI LOG==============

Timestamp                           : Tue Jan 30 20:47:03 2018
Driver Version                      : 387.22

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Product Name                    : Unknown Error
    Product Brand                   : GeForce
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 1920
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-99b4fa80-2c34-2037-aa10-9d8499fea01a
    Minor Number                    : 0
    VBIOS Version                   : 86.06.66.00.3F
    MultiGPU Board                  : No
    Board ID                        : 0x100
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.01.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    PCI
        Bus                         : 0x01
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1C0610DE
        Bus Id                      : 00000000:01:00.0
        Sub System Id               : 0x862C1043
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 2
            Link Width
                Max                 : 16x
                Current             : 1x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 0
        Tx Throughput               : 2000 KB/s
        Rx Throughput               : 1000 KB/s
    Fan Speed                       : 0 %
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
    FB Memory Usage
        Total                       : 6072 MiB
        Used                        : 0 MiB
        Free                        : 6072 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 2 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : 55 C
        GPU Shutdown Temp           : 102 C
        GPU Slowdown Temp           : 99 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 25.53 W
        Power Limit                 : 120.00 W
        Default Power Limit         : 120.00 W
        Enforced Power Limit        : 120.00 W
        Min Power Limit             : 60.00 W
        Max Power Limit             : 140.00 W
    Clocks
        Graphics                    : 367 MHz
        SM                          : 367 MHz
        Memory                      : 4006 MHz
        Video                       : 708 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 1974 MHz
        SM                          : 1974 MHz
        Memory                      : 4004 MHz
        Video                       : 1708 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None

Anyone have any ideas on a fix?

The output suggests this may not be a genuine GTX 1060:

Product Name                    : Unknown Error

        GPU Link Info
            PCIe Generation
                Max                 : 2

Note that the product name cannot be retrieved, which is very odd. Also, a GTX 1060 should have a PCIe gen3 interface, yet the maximum generation reported here is 2. On the other hand, this list from NVIDIA confirms that device ID 0x1C06 belongs to a GTX 1060:

https://download.nvidia.com/XFree86/Linux-x86_64/390.25/README/supportedchips.html

GeForce GTX 1060 3GB 	1C02 	H
GeForce GTX 1060 6GB 	1C03 	H
GeForce GTX 1060 5GB 	1C04 	H
GeForce GTX 1060 6GB 	1C06 	H

So I am not sure what’s going on. Did you acquire this card new in the original box, or did you buy it second-hand / used? I wonder whether someone flashed some sort of modified VBIOS on this GPU, or whether it might be defective.

Why is this GPU plugged into a PCIe x1 slot? Try it in a x16 slot and see whether that helps.
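For anyone who wants to check the link width without reading the whole report by eye, here is a minimal sketch. It assumes the text layout of the NVSMI log posted above (the kind of report `nvidia-smi -q` prints); the function name `link_width` and the `SAMPLE` string are just illustrative, and the layout may differ between driver versions.

```python
import re

# Excerpt copied from the report earlier in this thread.
SAMPLE = """\
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 2
            Link Width
                Max                 : 16x
                Current             : 1x
"""

def link_width(report):
    """Return (max_lanes, current_lanes) from the report text, or None if not found."""
    match = re.search(
        r"Link Width\s*\n\s*Max\s*:\s*(\d+)x\s*\n\s*Current\s*:\s*(\d+)x",
        report,
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

widths = link_width(SAMPLE)
if widths is not None and widths[1] < widths[0]:
    # The card negotiated fewer lanes than it supports.
    print(f"link running at x{widths[1]} of x{widths[0]}")
```

A narrower-than-maximum link does not by itself prove a fault (the card may simply sit in a x1 slot or on a riser), but it is worth ruling out.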

Thanks for the info. The card is brand new from an official supplier: ASUS 1060 6GB Dual OC.

I wondered if it’s a driver issue, but it’s odd that another 1060 card works OK on the same machine.

Given the anomalies in the nvidia-smi output, together with the fact that a different GTX 1060 works in the same system without any other changes (a controlled experiment), it seems quite clear that there is a problem with this particular GPU. Your driver seems fairly recent to me based on its version number, but it shouldn’t hurt to install the latest (non-beta!) driver for a final check. You are running an officially supported version of Linux, I assume.

If this were my purchase, I would return it to the vendor, either in exchange for a different GPU or for a refund.

I think it’s because your driver is 387.22.
Your card’s device ID 0x1C60 is supported by 390.25, so how about updating your driver to 390.25?

Trying the latest driver can’t hurt, but driver 387.22 shipped only three months ago while the GTX 1060 with 6 GB was introduced in the summer of 2016 (that is, 1.5 years ago). I could not find any release notes online that indicate that there are some models of GTX 1060 that require very recent drivers, nor recent reports detailing trouble getting GTX 1060 to work. What am I missing?

The device ID is shown by the nvidia-smi output above as 0x1C0610DE, where 0x10DE stands for NVIDIA and 0x1C06 for the GPU type. I guess the reference to 0x1C60 in #6 is a typo?
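To make the split explicit, here is a small sketch that separates the combined 32-bit value into its two 16-bit halves and looks the device half up in the four GTX 1060 entries quoted from NVIDIA's list earlier in the thread. The names `split_pci_id` and `GTX_1060_IDS` are made up for illustration, and the table holds only those four quoted entries, not the full supported-chips list.

```python
from typing import Tuple

# Device ID -> product name, copied from the supportedchips excerpt quoted above.
GTX_1060_IDS = {
    0x1C02: "GeForce GTX 1060 3GB",
    0x1C03: "GeForce GTX 1060 6GB",
    0x1C04: "GeForce GTX 1060 5GB",
    0x1C06: "GeForce GTX 1060 6GB",
}

def split_pci_id(combined: int) -> Tuple[int, int]:
    """Split a combined ID as printed by nvidia-smi into (device_id, vendor_id)."""
    device_id = (combined >> 16) & 0xFFFF  # upper 16 bits
    vendor_id = combined & 0xFFFF          # lower 16 bits
    return device_id, vendor_id

device, vendor = split_pci_id(0x1C0610DE)
name = GTX_1060_IDS.get(device, "not in the quoted list")
print(f"vendor=0x{vendor:04X} device=0x{device:04X} -> {name}")
# prints: vendor=0x10DE device=0x1C06 -> GeForce GTX 1060 6GB
```

The same split applied to the Sub System Id 0x862C1043 gives a vendor half of 0x1043, which as far as I know is the ASUS vendor ID and matches the board described.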

Thanks very much for the assistance guys, I’ll try updating the drivers tonight.

Curious whether the driver update fixed your issue. We have several new GTX 1060s with the 0x1C06 GPU type and the exact same problem you reported.

We’re working in Ubuntu and haven’t found a fix.

No, I didn’t manage to install the updated driver on the distro I am using. Just going to have to wait until the creator of it updates them in the future.