NVRM: Failed to copy vbios to system memory

Ok, I seem to have solved it, but I’m confused by my solution, or rather by why this happened in the first place.

I wanted to take a look at the ROM and used Kepler BIOS Tweaker v1.27 by TechPowerUp. It reported that the checksum was incorrect. I assumed that the vBIOS had simply become corrupted (which is a weird thing to happen while the card is running).
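For context, one checksum that a ROM tool can verify is the 8-bit checksum of a legacy PCI expansion ROM image: the image starts with the bytes 0x55 0xAA, the byte at offset 2 gives its length in 512-byte blocks, and all bytes of the image sum to 0 modulo 256. Whether Kepler BIOS Tweaker checks exactly this value is an assumption, and the function name and the `offset` parameter below are only illustrative. A minimal Python sketch:

```python
def legacy_image_checksum_ok(rom, offset=0):
    """Check the 8-bit checksum of a legacy PCI expansion ROM image.

    `offset` is where the image starts inside `rom`; a dump that carries
    extra data before the image needs a non-zero offset (assumption: the
    caller knows or has located that offset).
    """
    if rom[offset:offset + 2] != b"\x55\xAA":
        raise ValueError("no 0x55 0xAA signature at the given offset")
    length = rom[offset + 2] * 512  # size field counts 512-byte blocks
    if length == 0 or offset + length > len(rom):
        raise ValueError("image length field is implausible")
    # A valid image sums to zero modulo 256.
    return sum(rom[offset:offset + length]) % 256 == 0
```

It could be applied to a saved dump by reading the file with `open(path, "rb").read()` and passing the bytes in. A failing result would only say that the image bytes changed, not why.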

With NVFlash, also by TechPowerUp (download page: NVIDIA NVFlash 5.735.0), I saved the ROM from another workstation that had the same TITAN Black GPU and was working:

# ./nvflash_linux --save GK110_TITAN_Black_Working.rom
NVIDIA Firmware Update Utility (Version 5.414.0)
Simplified Version For OEM Only
IFR Data Size         : 1284 bytes
IFR CRC32             : B7F5E08B
IFR Image Size        : 1536 bytes
IFR Image CRC32       : 25838C8E
IFR Subsystem ID      : 3842-3790
Image Size            : 236544 bytes
Version               : 80.80.4E.00.90
~CRC32                : 1DE27A5C
Image Hash            : A80A196C59A8850E3154C89DC320A23C
OEM String            : NVIDIA
Vendor Name           : NVIDIA Corporation
Product Name          : GK110B Board - 20830031
Product Revision      : Chip Rev
Device Name(s)        : GeForce GTX TITAN Black
Board ID              : E618
PCI ID                : 10DE-100C
Subsystem ID          : 3842-3790
Hierarchy ID          : Normal Board
Chip SKU              : 430-0
Project               : 2083-0031
CDP                   : N/A
Build Date            : 02/07/14
Modification Date     : 02/13/14
UEFI Support          : Yes
UEFI Version          : 0x1002A (Jan 20 2014 @ 17684658 )
UEFI Variant Id       : 0x0000000000000004 ( GK1xx )
UEFI Signer(s)        : Microsoft Corporation UEFI CA 2011
InfoROM Version       : 2083.0031.00.03
InfoROM Backup Exist  : NO
License Placeholder   : Absent
GPU Mode              : N/A
Sign-On Message       : GK110B P2083 SKU 31 VGA BIOS

Then I saved the ROM from the defective GPU and flashed it with the working ROM (my warranty has expired, so I had nothing to lose anyway):

# ./nvflash_linux --save GK110_TITAN_Black_Corrupted.rom 
NVIDIA Firmware Update Utility (Version 5.414.0)
Simplified Version For OEM Only
IFR Data Size         : 1284 bytes
IFR CRC32             : B7F5E08B
IFR Image Size        : 1536 bytes
IFR Image CRC32       : 25838C8E
IFR Subsystem ID      : 3842-3790
Image Size            : 236544 bytes
Version               : 80.80.4E.00.90
~CRC32                : E708258A
Image Hash            : A80A196C59A8850E3154C89DC320A23C
OEM String            : NVIDIA
Vendor Name           : NVIDIA Corporation
Product Name          : GK110B Board - 20830031
Product Revision      : Chip Rev
Device Name(s)        : GeForce GTX TITAN Black
Board ID              : E618
PCI ID                : 10DE-100C
Subsystem ID          : 3842-3790
Hierarchy ID          : Normal Board
Chip SKU              : 430-0
Project               : 2083-0031
CDP                   : N/A
Build Date            : 02/07/14
Modification Date     : 02/13/14
UEFI Support          : Yes
UEFI Version          : 0x1002A (Jan 20 2014 @ 17684658 )
UEFI Variant Id       : 0x0000000000000004 ( GK1xx )
UEFI Signer(s)        : Microsoft Corporation UEFI CA 2011
InfoROM Version       : 2083.0031.00.03
InfoROM Backup Exist  : NO
License Placeholder   : Absent
GPU Mode              : N/A
Sign-On Message       : GK110B P2083 SKU 31 VGA BIOS

# ./nvflash_linux GK110_TITAN_Black_Working.rom

Then I saved the vBIOS from that GPU again to verify that the flash had worked:

# ./nvflash --save GK110_TITAN_Black_Test.rom
NVIDIA Firmware Update Utility (Version 5.414.0)
Simplified Version For OEM Only
IFR Data Size         : 1284 bytes
IFR CRC32             : B7F5E08B
IFR Image Size        : 1536 bytes
IFR Image CRC32       : 25838C8E
IFR Subsystem ID      : 3842-3790
Image Size            : 236544 bytes
Version               : 80.80.4E.00.90
~CRC32                : 1BC7DEB8
Image Hash            : A80A196C59A8850E3154C89DC320A23C
OEM String            : NVIDIA
Vendor Name           : NVIDIA Corporation
Product Name          : GK110B Board - 20830031
Product Revision      : Chip Rev
Device Name(s)        : GeForce GTX TITAN Black
Board ID              : E618
PCI ID                : 10DE-100C
Subsystem ID          : 3842-3790
Hierarchy ID          : Normal Board
Chip SKU              : 430-0
Project               : 2083-0031
CDP                   : N/A
Build Date            : 02/07/14
Modification Date     : 02/13/14
UEFI Support          : Yes
UEFI Version          : 0x1002A (Jan 20 2014 @ 17684658 )
UEFI Variant Id       : 0x0000000000000004 ( GK1xx )
UEFI Signer(s)        : Microsoft Corporation UEFI CA 2011
InfoROM Version       : 2083.0031.00.03
InfoROM Backup Exist  : NO
License Placeholder   : Absent
GPU Mode              : N/A
Sign-On Message       : GK110B P2083 SKU 31 VGA BIOS
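The three reports above show the same Image Hash but three different ~CRC32 values (1DE27A5C, E708258A, 1BC7DEB8). A byte-level comparison of the saved files could show where the dumps actually differ. The following is a minimal Python sketch of such a comparison; the function names are hypothetical and nothing here is part of nvflash:

```python
from pathlib import Path


def diff_ranges(a, b):
    """Return (start, end) offset pairs, end exclusive, where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # The tail that exists in only one file counts as a difference.
        ranges.append((common, max(len(a), len(b))))
    return ranges


def compare_roms(path_a, path_b):
    """Read two ROM dumps from disk and return their differing ranges."""
    return diff_ranges(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

For example, `compare_roms("GK110_TITAN_Black_Working.rom", "GK110_TITAN_Black_Test.rom")` would return an empty list if the flashed card reads back exactly the file that was written, and a list of offset ranges otherwise.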

After a reboot and installation of the current NVIDIA drivers (they had to be purged before using nvflash), the GPU reported no errors and was listed by nvidia-smi again. A stress test with gpu_burn (GitHub - Microway/gpu-burn: Microway's improved version of GPU Burn) ran without any errors for an hour and showed the same GFLOP/s as the working TITAN Black. I’m a bit sceptical, but I might have fixed it. I’ll run some more hands-on tests to confirm. ‘nvidia-debugdump’ is working again (I can upload the results if desired), and ‘dmesg | grep -i NVRM’ now prints:

$ dmesg | grep -i NVRM
[  156.930465] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.26  Mon Apr 30 18:01:39 PDT 2018 (using threaded interrupts)

One thing that is still not reported correctly:

$ cat /proc/driver/nvidia/gpus/0000\:01\:00.0/information
Model:           GeForce GTX TITAN Black
IRQ:             50
GPU UUID:        GPU-????????-????-????-????-????????????
Video BIOS:      ??.??.??.??.??
Bus Type:        PCIe
DMA Size:        40 bits
DMA Mask:        0xffffffffff
Bus Location:    0000:01:00.0
Device Minor:    0
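To keep an eye on this across reboots or driver reinstalls, the placeholder fields in that file can be detected automatically. A minimal Python sketch, assuming the `Key: value` layout shown above; the function name is hypothetical:

```python
def unknown_fields(info_text):
    """Return the names of fields whose value consists only of '?' placeholders."""
    flagged = []
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or "?" not in value:
            continue
        # Drop the literal "GPU-" prefix of the UUID line, then check that
        # nothing but "?" and separator characters is left.
        rest = value[4:] if value.startswith("GPU-") else value
        if all(ch in "?-." for ch in rest):
            flagged.append(key.strip())
    return flagged
```

Fed the contents of `/proc/driver/nvidia/gpus/0000:01:00.0/information` as shown above, it would flag the GPU UUID and Video BIOS fields.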

I’m really confused.