Hello,
My OS is Ubuntu 14.04 (64-bit) and my GPU is a GTX 1080. When I use the GPU to run a CUDA program, the GPU gets lost. I collected the information below. What is the problem, and how can I fix it?
~$ nvidia-smi
Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU
~$ sudo dmesg |tail -n 100
[ 13.886088] audit: type=1400 audit(1500106410.753:6): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=554 comm="apparmor_parser"
[ 13.886090] audit: type=1400 audit(1500106410.753:7): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=554 comm="apparmor_parser"
[ 13.886309] audit: type=1400 audit(1500106410.753:8): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=555 comm="apparmor_parser"
[ 13.886312] audit: type=1400 audit(1500106410.753:9): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=555 comm="apparmor_parser"
[ 13.886321] audit: type=1400 audit(1500106410.753:10): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=554 comm="apparmor_parser"
[ 13.886324] audit: type=1400 audit(1500106410.753:11): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=554 comm="apparmor_parser"
[ 14.154335] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input10
[ 14.154895] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input11
[ 14.155470] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input12
[ 14.155513] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input13
[ 14.160505] init: Failed to obtain startpar-bridge instance: Unknown parameter: INSTANCE
[ 14.427308] media: Linux media interface: v0.10
[ 14.476952] FS-Cache: Loaded
[ 14.480916] RPC: Registered named UNIX socket transport module.
[ 14.480917] RPC: Registered udp transport module.
[ 14.480918] RPC: Registered tcp transport module.
[ 14.480919] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 14.485782] FS-Cache: Netfs 'nfs' registered for caching
[ 14.491110] Linux video capture interface: v2.00
[ 14.491504] usbcore: registered new interface driver snd-usb-audio
[ 14.491717] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
[ 14.497181] init: failsafe main process (706) killed by TERM signal
[ 14.861866] uvcvideo: Found UVC 1.00 device USB 2.0 PC Camera (058f:0362)
[ 14.864888] input: USB 2.0 PC Camera as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1:1.0/input/input14
[ 14.864980] usbcore: registered new interface driver uvcvideo
[ 14.864981] USB Video Class driver (1.1.1)
[ 14.873898] Bluetooth: Core ver 2.19
[ 14.873909] NET: Registered protocol family 31
[ 14.873910] Bluetooth: HCI device and connection manager initialized
[ 14.873915] Bluetooth: HCI socket layer initialized
[ 14.873916] Bluetooth: L2CAP socket layer initialized
[ 14.873921] Bluetooth: SCO socket layer initialized
[ 14.875564] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 14.875565] Bluetooth: BNEP filters: protocol multicast
[ 14.875569] Bluetooth: BNEP socket layer initialized
[ 14.877592] Bluetooth: RFCOMM TTY layer initialized
[ 14.877596] Bluetooth: RFCOMM socket layer initialized
[ 14.877599] Bluetooth: RFCOMM ver 1.11
[ 15.045405] init: idmapd main process (791) terminated with status 1
[ 15.045412] init: idmapd main process ended, respawning
[ 15.314934] init: cups main process (848) killed by HUP signal
[ 15.314940] init: cups main process ended, respawning
[ 15.784897] nvidia: module license 'NVIDIA' taints kernel.
[ 15.784899] Disabling lock debugging due to kernel taint
[ 15.787643] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 15.790887] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=io+mem,decodes=none:owns=none
[ 15.790947] nvidia-nvlink: Nvlink Core is being initialized, major device number 249
[ 15.790958] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
[ 15.795639] [drm] Initialized drm 1.1.0 20060810
[ 15.799973] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 367.57 Mon Oct 3 20:32:57 PDT 2016
[ 15.804224] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 15.816677] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 248
[ 16.509959] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[ 16.545768] NFSD: starting 90-second grace period (net ffffffff81cd3700)
[ 16.705591] nvidia 0000:01:00.0: irq 44 for MSI/MSI-X
[ 16.706266] systemd-udevd[1183]: failed to execute '/bin/systemctl' '/bin/systemctl start --no-block nvidia-persistenced.service': No such file or directory
[ 17.171930] r8169 0000:03:00.0 eth0: link down
[ 17.171967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 17.238625] r8169 0000:03:00.0 eth0: link down
[ 17.407393] NVRM: Your system is not currently configured to drive a VGA console
[ 17.407395] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
[ 17.407396] NVRM: requires the use of a text-mode VGA console. Use of other console
[ 17.407397] NVRM: drivers including, but not limited to, vesafb, may result in
[ 17.407398] NVRM: corruption and stability problems, and is not supported.
[ 17.911045] systemd-udevd[1261]: failed to execute '/bin/systemctl' '/bin/systemctl stop --no-block nvidia-persistenced': No such file or directory
[ 18.065184] init: samba-ad-dc main process (1223) terminated with status 1
[ 18.298906] init: plymouth-splash main process (1304) terminated with status 1
[ 18.324146] init: nvidia-prime main process (1313) terminated with status 127
[ 19.278272] r8169 0000:03:00.0 eth0: link up
[ 19.278279] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 39.636192] nvidia 0000:01:00.0: irq 44 for MSI/MSI-X
[ 40.190429] nvidia-modeset: Allocated GPU:0 (GPU-1ff6e59a-3b28-4d4f-63ee-eaea8ed6a819) @ PCI:0000:01:00.0
[ 43.774744] cgroup: systemd-logind (905) created nested cgroup for controller "memory" which has incomplete hierarchy support. Nested cgroups may change behavior in the future.
[ 43.774746] cgroup: "memory" requires setting use_hierarchy to 1 on the root
[ 45.106907] audit_printk_skb: 171 callbacks suppressed
[ 45.106909] audit: type=1400 audit(1500106441.973:69): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=1866 comm="apparmor_parser"
[ 45.106914] audit: type=1400 audit(1500106441.973:70): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=1866 comm="apparmor_parser"
[ 45.107141] audit: type=1400 audit(1500106441.973:71): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=1866 comm="apparmor_parser"
[ 45.563825] aufs 3.x-rcN-20140707
[ 45.771065] Bridge firewalling registered
[ 45.772958] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[ 45.774476] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
[ 45.876721] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 48.115249] audit: type=1400 audit(1500106444.979:72): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="docker-default" pid=1990 comm="apparmor_parser"
[ 157.750181] uvcvideo: Failed to query (GET_DEF) UVC control 13 on unit 1: -32 (exp. 8).
[ 172.758761] systemd-hostnamed[3074]: Warning: nss-myhostname is not installed. Changing the local hostname might make it unresolveable. Please install nss-myhostname!
[ 371.842601] nvidia-modeset: Freed GPU:0 (GPU-1ff6e59a-3b28-4d4f-63ee-eaea8ed6a819) @ PCI:0000:01:00.0
[ 371.908003] nvidia 0000:01:00.0: irq 44 for MSI/MSI-X
[ 372.456433] nvidia-modeset: Allocated GPU:0 (GPU-1ff6e59a-3b28-4d4f-63ee-eaea8ed6a819) @ PCI:0000:01:00.0
[ 372.560799] sound hdaudioC1D0: HDMI: invalid ELD data byte 3
[ 388.402764] sogou-qimpanel[3887]: segfault at 0 ip 00007f38018fed3c sp 00007ffd18f95728 error 4 in libc-2.19.so[7f380187c000+1ba000]
[ 633.020031] NVRM: GPU at PCI:0000:01:00: GPU-1ff6e59a-3b28-4d4f-63ee-eaea8ed6a819
[ 633.020035] NVRM: GPU Board Serial Number:
[ 633.020037] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
[ 633.020037]
[ 633.020039] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[ 633.020040] NVRM: GPU is on Board .
[ 633.020046] NVRM: A GPU crash dump has been created. If possible, please run
[ 633.020046] NVRM: nvidia-bug-report.sh as root to collect this data before
[ 633.020046] NVRM: the NVIDIA kernel module is unloaded.
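
Most of the dmesg output above is unrelated boot noise; the lines that matter are the NVRM Xid report near the end. As a rough aid for isolating them, here is a minimal Python sketch that pulls Xid events out of saved dmesg text. The function name `find_xid_events` and the regular expression are my own illustration, not part of any NVIDIA tool, and the pattern is only assumed to match the line format seen in this log.

```python
import re

# Assumed line format, taken from the log above:
#   [ 633.020037] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \((?P<dev>[^)]+)\): "
    r"(?P<code>\d+),\s*(?P<msg>.*)"
)


def find_xid_events(log_text):
    """Return (timestamp, device, xid_code, message) for each Xid line found."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (
                    float(match.group("ts")),
                    match.group("dev"),
                    int(match.group("code")),
                    match.group("msg").strip(),
                )
            )
    return events
```

One way to use it would be to save the log with `sudo dmesg > dmesg.txt` and then call `find_xid_events(open("dmesg.txt").read())`. On the log above it should report a single event with Xid code 79.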