I was planning to perform a clean install of DGX OS 6.3.2 on our V100 system. After fully wiping the four SSDs in the JBOD, I tried to enter the BIOS to select my bootable USB drive for installation. For some reason, I can no longer get into the BIOS: the system (ASUS X99-E-10G WS motherboard) gets stuck at POST code A2, and neither the F2 nor the DEL key seems to register.
The only solution was to flash the official ASUS BIOS via USB BIOS Flashback:
Download the latest BIOS (1201) from the ASUS website. Note: You cannot reflash the official NVIDIA BIOS via the motherboard’s USB Flashback functionality. The NVIDIA BIOS can only be installed through the BIOS’ built-in update UI.
Rename the downloaded BIOS capsule file to X99E10G.CAP and copy it to a FAT32-formatted USB 2.0 flash drive.
Take out the bottom two V100 GPUs to reach the “Clear CMOS” (red) and “BIOS Flashback” buttons (below RESET on the right-hand side).
Press the “Clear CMOS” button while the system is off but connected to AC power. Make sure the system remains turned off for the next step.
Insert your USB 2.0 flash drive with the BIOS capsule file into the green USB port at the back.
Press and hold the “BIOS Flashback” button on the motherboard until the LED next to it starts blinking. The ASUS BIOS is now being flashed. Wait until the LED goes out. If the LED stays lit indefinitely, the flashback is not working (check the file, the file name, the USB stick, and the port).
Remove the flash drive and power on the system. Be patient: the first POST takes quite a while after a flashback.
If you do not need support for the NVLink PCIe bridge and are fine with video output coming from the first instead of the last GPU, just keep the official ASUS BIOS. After POST, enter Setup (F1), load optimized defaults, and make sure that “Above 4G Decoding” is enabled and “Fast Boot” is disabled (both settings avoid hardware detection glitches during boot). Save the changes and exit.
You can now (optionally) reflash the NVIDIA BIOS by using the BIOS’ built-in update mechanism. For details, follow NVIDIA’s official documentation for the DGX Station V100 system.
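The file preparation for the flashback (rename and copy) can be scripted so that a typo in the file name does not cost a failed flash attempt. The following is only a sketch: the function name `stage_bios_capsule`, the source file name, and the mount point are placeholders, and placing the file in the root directory of the drive is an assumption based on how USB BIOS Flashback normally works. The script does not check that the drive is actually formatted as FAT32.

```python
import shutil
import sys
from pathlib import Path

# File name taken from the steps above; the flashback expects exactly this name.
FLASHBACK_NAME = "X99E10G.CAP"


def stage_bios_capsule(downloaded_cap: Path, usb_root: Path) -> Path:
    """Copy the downloaded capsule to the USB drive under the flashback name.

    Both arguments are placeholders for your own paths. The file is written
    to the top-level directory of the drive (an assumption, see above).
    """
    if not downloaded_cap.is_file():
        raise FileNotFoundError(f"BIOS capsule not found: {downloaded_cap}")
    if not usb_root.is_dir():
        raise NotADirectoryError(f"USB mount point not found: {usb_root}")

    target = usb_root / FLASHBACK_NAME
    shutil.copyfile(downloaded_cap, target)

    # Cheap sanity check: a truncated copy would make the flashback fail.
    if target.stat().st_size != downloaded_cap.stat().st_size:
        raise OSError("Copied file size differs from the source; copy it again")
    return target


if __name__ == "__main__":
    if len(sys.argv) == 3:
        staged = stage_bios_capsule(Path(sys.argv[1]), Path(sys.argv[2]))
        print(f"Staged {staged}")
    else:
        print("usage: stage_bios.py <downloaded.CAP> <usb-mount-point>")
```

Unmount or safely eject the drive after copying so the file is fully written before you plug it into the green USB port.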
The procedure described above prevents the boot sequence from getting stuck at the splash screen with POST code A2. After OS installation, you might notice that one of your four Tesla V100 GPUs induces a different problem, a boot loop, this time with motherboard POST code 6A (System Agent DXE SMM initialization is started). In our case, it was the third GPU in slot #5. To fix this issue, try the following:
Install the NVIDIA BIOS via the Update BIOS functionality of your currently running ASUS BIOS. Wait for the installation to finish and the motherboard to initialize.
Disconnect everything from the rear IO ports except a mouse and keyboard.
Disconnect all PCIe power cables from the GPUs and remove the GPUs from their PCIe slots.
Press the “Clear CMOS” button on the motherboard. The motherboard will restart automatically.
Wait until the motherboard has completed initialization.
Turn off the system.
Install the third GPU into PCIe slot #5 and connect the DisplayPort cable to the right-hand port (as seen from the rear).
During POST, a long CSM message should appear: “The VGA card is not supported by UEFI driver. CSM (Compatibility Support Module) settings have been changed for better compatibility.”
Boot into the BIOS once with F1.
Do not change any BIOS settings, and do not perform “Load Optimized Defaults”.
Turn off the system but leave the power supply connected and its main switch on.
Install the second GPU, boot, turn off the system.
Install the fourth (last) GPU, boot, turn off the system.
Install the first GPU, boot, turn off the system.
Turn on the system, boot into the BIOS, and change the following settings if needed:
Intel Virtualization Technology: Enabled
Intel VT for Directed I/O (VT-d): Enabled
Restore AC Power Loss: Power On
Save the BIOS changes, restart the system, and boot into the OS.
Verify that all four GPUs are recognized by the OS.
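One way to do the final check is to count the lines printed by `nvidia-smi -L`, which lists one GPU per line once the NVIDIA driver is loaded. The sketch below assumes the usual line format `GPU <index>: <name> (UUID: <uuid>)`; the function names and the `EXPECTED_GPUS` constant are made up for this example.

```python
import re
import shutil
import subprocess

EXPECTED_GPUS = 4  # the system described above has four Tesla V100 GPUs

# Assumed line format of `nvidia-smi -L`: "GPU 0: <name> (UUID: GPU-...)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (.+)\)$")


def parse_gpu_list(listing: str) -> list[tuple[int, str]]:
    """Return (index, name) for every GPU line found in the listing text."""
    gpus = []
    for line in listing.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def list_gpus() -> list[tuple[int, str]]:
    """Run `nvidia-smi -L` and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        try:
            gpus = list_gpus()
        except subprocess.CalledProcessError as err:
            print(f"nvidia-smi failed with exit code {err.returncode}")
        else:
            for index, name in gpus:
                print(f"GPU {index}: {name}")
            status = "OK" if len(gpus) == EXPECTED_GPUS else "MISMATCH"
            print(f"{len(gpus)} of {EXPECTED_GPUS} GPUs detected: {status}")
```

If the count is lower than four, `lspci | grep -i nvidia` shows whether the missing card is at least visible on the PCIe bus, which helps to tell a driver problem from the hardware detection issue described above.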