GPU Error

Hi NVIDIA team, we are getting the error shown below when we run the nvidia-smi command:

nvidia-smi
Unable to determine the device handle for GPU 0000:B6:00.0: GPU is lost. Reboot the system to recover this GPU

We have rebooted twice, but we are still facing the same issue.
nvidia-bug-report.log.gz (4.85 MB)
nvidia-bug-report.log (62.8 MB)

Please help us fix this issue.

Perhaps your GPU is overheating, or has a hardware incompatibility with the machine it's plugged into, or the power delivery to the GPU is insufficient.

I tried downloading the log file you attached but was unable to extract it. The file seems to be corrupted.
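For anyone hitting the same problem with an attachment, a quick integrity check before uploading can save a round trip. The sketch below is only an illustration: `check_gz` is a hypothetical helper name, and it relies on nothing more than the standard `gzip -t` test mode.

```shell
# Hypothetical helper: report whether a gzip archive passes its integrity check.
check_gz() {
    # gzip -t decompresses in memory and verifies the CRC without writing output.
    if gzip -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "corrupt"
    fi
}

# Usage:
#   check_gz nvidia-bug-report.log.gz
```

If this prints "corrupt", regenerating the report and uploading it again is probably the simplest fix.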

Hi, I have attached the bug report once again. Please have a look at it and let us know what can be done about this issue.

I’m able to read the log file now.

It appears you have not properly installed the GPU driver. You apparently used a runfile install method, but failed to remove the nouveau driver.

[    31.500] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[    31.500] (II) Module nvidia: vendor="NVIDIA Corporation"
[    31.500] 	compiled for 4.0.2, module version = 1.0.0
[    31.500] 	Module class: X.Org Video Driver
[    31.500] (II) LoadModule: "nouveau"
[    31.501] (II) Loading /usr/lib/xorg/modules/drivers/nouveau_drv.so
[    31.521] (II) Module nouveau: vendor="X.Org Foundation"
[    31.521] 	compiled for 1.19.5, module version = 1.0.15
[    31.521] 	Module class: X.Org Video Driver
[    31.521] 	ABI class: X.Org Video Driver, version 23.0
[    31.521] (II) LoadModule: "modesetting"
[    31.521] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[    31.521] (II) Module modesetting: vendor="X.Org Foundation"
[    31.521] 	compiled for 1.19.6, module version = 1.19.6
[    31.521] 	Module class: X.Org Video Driver
[    31.521] 	ABI class: X.Org Video Driver, version 23.0
[    31.521] (II) LoadModule: "fbdev"
[    31.522] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[    31.522] (II) Module fbdev: vendor="X.Org Foundation"
[    31.522] 	compiled for 1.19.3, module version = 0.4.4
[    31.522] 	Module class: X.Org Video Driver
[    31.522] 	ABI class: X.Org Video Driver, version 23.0
[    31.522] (II) LoadModule: "vesa"
[    31.522] (II) Loading /usr/lib/xorg/modules/drivers/vesa_drv.so
[    31.522] (II) Module vesa: vendor="X.Org Foundation"
[    31.522] 	compiled for 1.19.3, module version = 2.3.4
[    31.522] 	Module class: X.Org Video Driver
[    31.522] 	ABI class: X.Org Video Driver, version 23.0
[    31.522] (II) NVIDIA dlloader X Driver  418.43  Tue Feb 19 01:07:27 CST 2019
[    31.522] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[    31.522] (II) NOUVEAU driver Date:   Fri Apr 21 14:41:17 2017 -0400
[    31.522] (II) NOUVEAU driver for NVIDIA chipset families :
[    31.522] 	RIVA TNT        (NV04)
[    31.522] 	RIVA TNT2       (NV05)
[    31.522] 	GeForce 256     (NV10)
[    31.522] 	GeForce 2       (NV11, NV15)
[    31.522] 	GeForce 4MX     (NV17, NV18)
[    31.522] 	GeForce 3       (NV20)
[    31.522] 	GeForce 4Ti     (NV25, NV28)
[    31.522] 	GeForce FX      (NV3x)
[    31.522] 	GeForce 6       (NV4x)
[    31.522] 	GeForce 7       (G7x)
[    31.522] 	GeForce 8       (G8x)
[    31.522] 	GeForce GTX 200 (NVA0)
[    31.522] 	GeForce GTX 400 (NVC0)
[    31.522] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[    31.522] (II) FBDEV: driver for framebuffer: fbdev
[    31.522] (II) VESA: driver for VESA chipsets: vesa
[    31.522] (II) systemd-logind: releasing fd for 226:0
[    31.522] (II) Loading sub module "fb"
[    31.522] (II) LoadModule: "fb"
[    31.522] (II) Loading /usr/lib/xorg/modules/libfb.so
[    31.522] (II) Module fb: vendor="X.Org Foundation"
[    31.522] 	compiled for 1.19.6, module version = 1.0.0
[    31.523] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    31.523] (II) Loading sub module "wfb"
[    31.523] (II) LoadModule: "wfb"
[    31.523] (II) Loading /usr/lib/xorg/modules/libwfb.so
[    31.523] (II) Module wfb: vendor="X.Org Foundation"
[    31.523] 	compiled for 1.19.6, module version = 1.0.0
[    31.523] 	ABI class: X.Org ANSI C Emulation, version 0.4
[    31.523] (II) Loading sub module "ramdac"
[    31.523] (II) LoadModule: "ramdac"
[    31.523] (II) Module "ramdac" already built-in
[    31.523] (II) systemd-logind: releasing fd for 226:1
[    31.523] (EE) [drm] Failed to open DRM device for (null): -2
[    31.523] (WW) Falling back to old probe method for modesetting

Please read the entire linux install guide, and then follow the instructions in it carefully:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
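Before changing anything, it may help to confirm which kernel modules are actually loaded, since the X log only shows which X driver modules were loaded. The sketch below assumes a Linux system with `/proc/modules` available (the same data `lsmod` prints); `module_loaded` is a hypothetical helper name, not something from the NVIDIA tooling.

```shell
# Hypothetical helper: succeed if the named kernel module appears in a
# modules listing (defaults to /proc/modules, the same data lsmod reads).
module_loaded() {
    name=$1
    listing=${2:-/proc/modules}
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

# Usage:
#   if module_loaded nouveau; then echo "nouveau is loaded"; fi
#   if module_loaded nvidia;  then echo "nvidia is loaded";  fi
```

If nouveau shows up as loaded alongside a runfile install of the NVIDIA driver, that would be consistent with the diagnosis above.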

Do you mean that the nouveau driver is causing this failure, and that disabling or blacklisting the nouveau driver will solve this issue?

From the document Robert Crovella pointed at:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-nouveau

The document then proceeds to describe how to eliminate the nouveau driver for each of the Linux distros supported by CUDA.
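As a rough illustration of what those instructions amount to, the sketch below writes the two-line blacklist file. It assumes the usual `/etc/modprobe.d/blacklist-nouveau.conf` location; `write_nouveau_blacklist` is a hypothetical helper name, and the initramfs command differs by distro, so check the guide's section for yours rather than taking this as the definitive procedure.

```shell
# Hypothetical helper: write a nouveau blacklist file for modprobe.
# The default path is an assumption; pass another path as the first argument.
write_nouveau_blacklist() {
    target=${1:-/etc/modprobe.d/blacklist-nouveau.conf}
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Usage (as root), followed by regenerating the initramfs and rebooting:
#   write_nouveau_blacklist
#   update-initramfs -u     # Debian/Ubuntu; on RHEL/Fedora use: dracut --force
#   reboot
```

On a live server the reboot should be scheduled, since the blacklist only takes effect once the new initramfs is booted.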

Are you sure that disabling or removing the nouveau driver will solve my problem? This is not a test server; it is a live server that we are running.
So what should I do now, given that we have already installed the latest version of the NVIDIA driver?
Please tell me the procedure: do we need to uninstall the NVIDIA driver first, or can we directly disable or blacklist the nouveau driver?

Will that work?