Alright, so here’s a quick tutorial on how I modified my GTX480 firmware (PCI Expansion ROM). Please understand that this is UNSUPPORTED, UNTESTED and MAY VOID YOUR WARRANTY, so proceed AT YOUR OWN RISK. Changing your firmware (and especially the softstraps) can render your card unusable, in which case you may have to resort to hardware modifications to recover it.
Note that this tutorial assumes you have a dual-card setup like mine. That way you keep your graphics output when the secondary card runs in TCC mode (which doesn’t drive displays), and you can easily recover from a broken firmware by using the primary card.
A short rundown of my own workstation:
Operating System: Windows 7 Professional, 64-bit
Driver version: 270.32
CPU: Intel i7 920 @ 2.67GHz
Bus: PCI Express x16 Gen2
Primary card: Club3D GeForce GTX 480 1536MB GDDR5 PCI E 2.0
Secondary card: Club3D GeForce GTX 480 1536MB GDDR5 PCI E 2.0
Recommended firmware modification tools (you’re advised to check out the documentation of each of these):
In short, the goal of this firmware modification is to change the PCI Device ID of the card so that the NVIDIA driver detects it as a Tesla series card, enabling additional functionality that’s otherwise disabled. Specifically, I want to change my Device ID from 06C0 (GeForce GTX 480) to 06D1 (Tesla C2050). Coincidentally, I have an HP C2050 firmware image (version 70.00.2B.00.0E) lying around for comparison.
Let’s query the devices:
> nvflash -a
NVIDIA Firmware Update Utility (Version 5.95)
NVIDIA display adapters present in system:
<0> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:02,PCI,D:00,F:00
<1> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00
The card I’m interested in is the one on bus 3 (index 1), so let’s save its firmware to a file called firmware.rom:
> nvflash --index=1 -b firmware.rom
NVIDIA Firmware Update Utility (Version 5.95)
Adapter: GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00
The display may go *BLANK* on and off for up to 10 seconds during access to the
EEPROM depending on your display adapter and output device.
Identifying EEPROM...
EEPROM ID (C2,2011) : MX MX25L1005 2.7-3.6V 1024Kx1S, page
Reading adapter firmware image...
Image Size : 62464 bytes
Version : 70.00.21.00.02
~CRC32 : D040F75A
Subsystem ID : 10DE-075F
Hierarchy ID : Normal Board
Chip SKU : 375-0
Project : 1022-0000
CDP : N/A
Build Date : 04/14/10
Modification Date : 04/14/10
Saving of image completed.
Open up the firmware with your hex editor, as we will be changing the following:
[*]Softstraps. The firmware contains a mechanism called softstraps (explained below) that allows it to override certain chip settings (hardstraps), including the PCI Device ID. Softstraps are manipulated through two pairs of 32-bit AND/OR masks. Nvflash has an option to change the straps, which we will be using, but first we need to read the original straps from the firmware:
AND mask 0 location: 00000058, little endian
OR mask 0 location: 0000005C, little endian
AND mask 1 location: 00000060, little endian
OR mask 1 location: 00000064, little endian
(Additionally, 00000068 and 0000006C contain the checksums for the softstraps.)
[*]Regular PCI Device ID, location: 0000018E, little endian
[*]For the sake of authenticity, the board ID/boot strings and the firmware version will also be modified.
Board boot string location: 00000086
Board ID string location: 00000122
Firmware version location: 00000238, little endian
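If you’d rather not read these values off the hex dump by hand, here’s a minimal Python sketch that prints the four strap masks and the PCI Device ID from a saved firmware file. It assumes the offsets listed above, which come from my GTX 480 firmware (70.00.21.00.02). Other firmware versions may lay things out differently, so double-check against your own dump. The script name in the usage comment is just a placeholder.

```python
import struct
import sys

# Offsets from the list above (GTX 480 firmware 70.00.21.00.02).
# ASSUMPTION: other firmware versions may use different offsets.
STRAP_OFFSETS = {
    "AND mask 0": 0x58,
    "OR mask 0": 0x5C,
    "AND mask 1": 0x60,
    "OR mask 1": 0x64,
}
DEVICE_ID_OFFSET = 0x18E


def read_u32_le(rom: bytes, offset: int) -> int:
    """Read a little-endian 32-bit value at the given offset."""
    return struct.unpack_from("<I", rom, offset)[0]


def read_u16_le(rom: bytes, offset: int) -> int:
    """Read a little-endian 16-bit value at the given offset."""
    return struct.unpack_from("<H", rom, offset)[0]


def summarize(rom: bytes) -> dict:
    """Collect the softstrap masks and the PCI Device ID from a firmware dump."""
    info = {name: read_u32_le(rom, off) for name, off in STRAP_OFFSETS.items()}
    info["PCI Device ID"] = read_u16_le(rom, DEVICE_ID_OFFSET)
    return info


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (placeholder name): python read_straps.py firmware.rom
    with open(sys.argv[1], "rb") as f:
        for name, value in summarize(f.read()).items():
            print(f"{name:<14} 0x{value:08X}")
```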
I’ll start by modifying the firmware’s PCI Device ID at 0000018E, which sits right after NVIDIA’s PCI Vendor ID (0x10DE) in the PCI Data Structure (the block that starts with the “PCIR” signature):
Little endian!
00000180: 91 DF AA 8C 9A F2 F5 FF 50 43 49 52 DE 10 C0 06
(NEW)
00000180: 91 DF AA 8C 9A F2 F5 FF 50 43 49 52 DE 10 D1 06
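The same edit can be scripted. This sketch first checks that the “PCIR” signature and NVIDIA’s Vendor ID are where the dump above shows them (00000188 and 0000018C) and only then overwrites the Device ID at 0000018E, so a wrong offset fails loudly instead of silently corrupting the image. The function name is hypothetical, and the offset comes from my firmware only.

```python
import struct

PCIR_OFFSET = 0x188  # "PCIR" signature in my dump (00000188-0000018B)


def patch_device_id(rom: bytearray, new_id: int, pcir_offset: int = PCIR_OFFSET) -> int:
    """Overwrite the PCI Device ID in the PCI Data Structure; return the old ID."""
    if rom[pcir_offset:pcir_offset + 4] != b"PCIR":
        raise ValueError("PCIR signature not found at the expected offset")
    # The Vendor ID and Device ID follow the signature as two little-endian 16-bit values.
    vendor, old_id = struct.unpack_from("<HH", rom, pcir_offset + 4)
    if vendor != 0x10DE:
        raise ValueError(f"unexpected vendor ID 0x{vendor:04X}")
    struct.pack_into("<H", rom, pcir_offset + 6, new_id)
    return old_id
```

Calling `patch_device_id(rom, 0x06D1)` on a `bytearray` holding the dump turns `C0 06` into `D1 06` and returns 0x06C0.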
Next up, I’ll change the board boot string at 00000086 and board ID string at 00000122 into their C2050 counterparts:
00000086: GF100 P1022 SKU 0000 VGA BIOS
00000122: GF100 Board - 10220000
(NEW)
00000086: GF100 P1030 SKU 0200 VGA BIOS
00000122: GF100 Board - 10300200
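Because these strings live at fixed offsets, each replacement has to be exactly as long as the original (both pairs here are). A longer string would overwrite whatever follows it. Here’s a minimal sketch that enforces this and also checks that the old text is really there before writing. The helper name is hypothetical.

```python
def patch_string(rom: bytearray, offset: int, old: str, new: str) -> None:
    """Overwrite an ASCII string in place, keeping the image layout intact."""
    old_b, new_b = old.encode("ascii"), new.encode("ascii")
    if len(old_b) != len(new_b):
        raise ValueError("the new string must be exactly as long as the old one")
    if rom[offset:offset + len(old_b)] != old_b:
        raise ValueError(f"{old!r} not found at 0x{offset:08X}")
    rom[offset:offset + len(new_b)] = new_b


# Offsets and strings from this step (GTX 480 -> C2050).
EDITS = [
    (0x86, "GF100 P1022 SKU 0000 VGA BIOS", "GF100 P1030 SKU 0200 VGA BIOS"),
    (0x122, "GF100 Board - 10220000", "GF100 Board - 10300200"),
]
```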
Then, the firmware version:
Little endian!
00000230: 00 00 00 00 00 00 00 00 00 21 00 70 02 00 00 00
(NEW)
00000230: 00 00 00 00 00 00 00 00 00 2B 00 70 0E 00 00 00
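The byte order is easy to get wrong here. Judging from the dump above, the first four fields are stored as one little-endian 32-bit value (70.00.21.00 becomes 00 21 00 70) and the last field is a single byte. This small Python sketch converts between the version string and those five bytes under that assumption:

```python
import struct


def encode_version(version: str) -> bytes:
    """'70.00.2B.00.0E' -> the five bytes stored at 00000238."""
    parts = [int(p, 16) for p in version.split(".")]
    if len(parts) != 5 or not all(0 <= p <= 0xFF for p in parts):
        raise ValueError("expected five hex bytes, e.g. 70.00.2B.00.0E")
    a, b, c, d, e = parts
    # The first four fields form a little-endian 32-bit value; the fifth is a single byte.
    return struct.pack("<IB", (a << 24) | (b << 16) | (c << 8) | d, e)


def decode_version(raw: bytes) -> str:
    """Inverse of encode_version."""
    value, last = struct.unpack("<IB", raw[:5])
    fields = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)] + [last]
    return ".".join(f"{f:02X}" for f in fields)
```

For example, `encode_version("70.00.2B.00.0E")` yields the bytes `00 2B 00 70 0E` from the (NEW) line.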
That’s it for the firmware image modifications, so be sure to save your modified firmware. However, the checksum of the modified image still needs to be recalculated. You can do this by opening your modified firmware in NiBiTor (ignore the warnings about unknown device IDs) and saving it to another file. The “Integrity” icon should be green for your final firmware file. (The checksum can be found in the Adv. Info tab, if you’re interested.)
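NiBiTor is the easy route. For the curious, the underlying rule for a legacy PCI option ROM is that all bytes of the image add up to zero modulo 256. The image length is stored in header byte 2 in 512-byte units, and my 62464-byte image is 122 * 512. The sketch below assumes the correction byte is the last byte of the image. That placement is common, but I haven’t verified it against NiBiTor’s behavior on this firmware, so compare the result with what NiBiTor reports before trusting it.

```python
from typing import Optional


def image_length(rom: bytes) -> int:
    """Image size from the option ROM header (byte 2, in 512-byte units)."""
    if rom[0:2] != b"\x55\xAA":
        raise ValueError("missing 55 AA option ROM signature")
    return rom[2] * 512


def fix_checksum(rom: bytearray, checksum_offset: Optional[int] = None) -> int:
    """Rewrite one byte so the image's bytes sum to 0 mod 256; return the new byte.

    ASSUMPTION: the correction byte is the last byte of the image unless
    checksum_offset says otherwise.
    """
    length = image_length(rom)
    if checksum_offset is None:
        checksum_offset = length - 1
    rom[checksum_offset] = 0
    rom[checksum_offset] = -sum(rom[:length]) % 256
    return rom[checksum_offset]
```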
We’re now ready to flash the modified firmware onto the secondary card. I’ll be changing the softstraps later on by using nvflash separately, which makes it a lot easier. Let’s use nvflash with a couple of options to make it clear that we totally want it to override all kinds of settings we really shouldn’t be overriding and flash the modified firmware to the card. We’ll also perform a full erase of the EEPROM first just to be sure:
> nvflash --index=1 --eraseeeprom
> nvflash --index=1 --overridesub --overrideboard --auto --noconfirm -5 -6 firmware-new.rom
NVIDIA Firmware Update Utility (Version 5.95)
Checking for matches between display adapter(s) and image(s)...
Adapter: GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00
WARNING: None of the firmware image compatible PCI Device ID's
match the PCI Device ID of the adapter.
Adapter PCI Device ID: 06C0
Firmware image PCI Device ID: 06D1
PCI Device ID override confirmation skipped.
Overriding GPU mismatch
Current - Version:70.00.21.00.02 ID:10DE:06C0:10DE:075F
GF100 Board - 10220000 (Normal Board)
Replace with - Version:70.00.2B.00.0E ID:10DE:06D1:10DE:075F
GF100 Board - 10300200 (Normal Board)
The display may go *BLANK* on and off for up to 10 seconds or more during the up
date process depending on your display adapter and output device.
Identifying EEPROM...
EEPROM ID (C2,2011) : MX MX25L1005 2.7-3.6V 1024Kx1S, page
NOTE: Preserving board settings in preservation slot 6
NOTE: Preserving board settings in preservation slot 7
Clearing original firmware image...
.
Storing updated firmware image...
..
Verifying update...
Update successful.
Note that we’re using some undocumented features here (-5 and -6 disable a couple of security checks for the overrides). If this doesn’t work for you, you may want to check out -h or try these undocumented options:
--eraseeeprom erases all data from the EEPROM
--debug gives a load of debug information during the upgrade
--refreshstraps refreshes the softstraps on the card
Now it’s time to change the softstraps. Let’s first do a binary comparison of the two PCI Device IDs:
GTX480 06C0 0000011011000000
C2050 06D1 0000011011010001
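The comparison can also be done with a quick XOR. This sketch shows which bits differ and in which direction they change:

```python
gtx480 = 0x06C0  # GeForce GTX 480
c2050 = 0x06D1   # Tesla C2050

diff = gtx480 ^ c2050        # bits that differ between the two IDs
to_set = c2050 & ~gtx480     # bits that must go from 0 to 1
to_clear = gtx480 & ~c2050   # bits that must go from 1 to 0
changed_bits = [bit for bit in range(16) if (diff >> bit) & 1]

print(f"GTX480 {gtx480:016b}")
print(f"C2050  {c2050:016b}")
print(f"diff   {diff:016b} -> bits {changed_bits}")
print(f"set: 0x{to_set:04X}, clear: 0x{to_clear:04X}")
```

Only bits 0 and 4 differ, and both go from 0 to 1 (`to_clear` is 0). In strap terms, that means bits only need to be added, never removed.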
I read out my AND/OR masks and compared them with those of the C2050 firmware. It turns out that AND/OR mask 0 differ, while AND/OR mask 1 are identical:
Hex:     AND mask 0   OR mask 0    AND mask 1   OR mask 1
C2050    0x6FFC03FF   0x10000400   0x7FF1FFFF   0x80020000
GTX480   0x7FFC3FFF   0x00004000   0x7FF1FFFF   0x80020000
(Ignore the first (most significant) bit! It is always treated as zero, so OR mask 1 is effectively 0x00020000.)
Softstraps in the firmware are applied on top of the card’s hardstraps, with the AND and OR masks controlling how the hardstraps are modified. Masks should always be less than or equal to 0x7FFFFFFF. The mechanism works as follows:
( ( [hardstraps] & [AND mask] ) | [OR mask] ) = final straps
In practice, the AND mask allows you to disable certain hardstraps while the OR mask allows you to enable specific straps.
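A minimal Python sketch of this formula, plus a helper showing what each bit ends up as regardless of the hardstraps. The helper names are mine, and the two hardstrap values in the usage note are made-up extremes, not readings from a real card.

```python
MASK_LIMIT = 0x7FFFFFFF  # masks must not exceed this value (top bit unused)


def apply_straps(hardstraps: int, and_mask: int, or_mask: int) -> int:
    """Final straps = (hardstraps & AND mask) | OR mask."""
    if and_mask > MASK_LIMIT or or_mask > MASK_LIMIT:
        raise ValueError("masks must be <= 0x7FFFFFFF")
    return (hardstraps & and_mask) | or_mask


def classify(and_mask: int, or_mask: int, width: int = 32) -> dict:
    """Split the strap bits by effect: forced to 1, forced to 0, or hardstrap kept."""
    all_bits = (1 << width) - 1
    return {
        "forced to 1": or_mask,
        "forced to 0": ~and_mask & ~or_mask & all_bits,
        "hardstrap kept": and_mask & ~or_mask,
    }
```

With the C2050’s strap 0 masks, `apply_straps(0, 0x6FFC03FF, 0x10000400)` gives 0x10000400 and `apply_straps(0x7FFFFFFF, 0x6FFC03FF, 0x10000400)` gives 0x7FFC07FF. Either way, bits 28 and 10 end up set, because the OR is applied last.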
So far, I’ve managed to figure out the functionality of a few of the strap bits (any additional information is welcome!).
straps 0:
-xx+xxxx xxxxxxxx xx++++xx xxxxxxxx
   ^                ^^^^
   |                ||||-pci dev id[0]
   |                |||--pci dev id[1]
   |                ||---pci dev id[2]
   |                |----pci dev id[3]
   |---------------------pci dev id[4]
- cannot be set, always 0
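In other words, Device ID bits 0 to 3 sit at strap bits 10 to 13, and Device ID bit 4 sits at strap bit 28. This sketch encodes that mapping and converts in both directions. Only these five bits are known, so everything else is left alone. The function names are hypothetical.

```python
# Strap-0 bit positions of PCI Device ID bits 0-4, read off the diagram above.
# Only these five bits are known; the meaning of the other strap bits is unknown.
DEVID_TO_STRAP0_BIT = {0: 10, 1: 11, 2: 12, 3: 13, 4: 28}


def devid_bits_to_strap0(devid_bits: int) -> int:
    """Turn PCI Device ID bits 0-4 into the matching strap-0 bit mask."""
    mask = 0
    for dev_bit, strap_bit in DEVID_TO_STRAP0_BIT.items():
        if (devid_bits >> dev_bit) & 1:
            mask |= 1 << strap_bit
    return mask


def strap0_to_devid_bits(strap0: int) -> int:
    """Extract PCI Device ID bits 0-4 from a strap-0 value or mask."""
    bits = 0
    for dev_bit, strap_bit in DEVID_TO_STRAP0_BIT.items():
        if (strap0 >> strap_bit) & 1:
            bits |= 1 << dev_bit
    return bits


# The C2050's OR mask 0 sets exactly Device ID bits 0 and 4 (0x11).
print(hex(strap0_to_devid_bits(0x10000400)))
```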
So in my case, it’s just a matter of making sure that PCI Device ID bits 0 and 4 in straps 0 are set to 1, which I can do by adding the corresponding bits to OR mask 0. I’ll take the original OR mask of my GTX 480 (which has one other bit set to 1 for who knows what; I’ll keep it just to be sure) and enable the appropriate bits for the PCI Device ID:
OR mask 0:
GTX480 -0000000 00000000 01000000 00000000
NEW -0010000 00000000 01000100 00000000
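The mask arithmetic in this step can be double-checked with a short sketch. It starts from the GTX 480 masks, ORs in strap bits 28 and 10 (Device ID bits 4 and 0), clears the same two bits in AND mask 0, and drops the top bit of every mask so they all stay at or below 0x7FFFFFFF. The function name is mine. The printed line should match the nvflash command in this step.

```python
STRAP0_DEVID_BITS = 0x10000400  # strap-0 bits 28 and 10 = PCI Device ID bits 4 and 0
TOP_BIT_CLEAR = 0x7FFFFFFF      # masks must be <= 0x7FFFFFFF


def new_masks(and0: int, or0: int, and1: int, or1: int, set_bits: int) -> tuple:
    """Add set_bits to OR mask 0, clear them in AND mask 0, and drop every top bit."""
    and0_new = and0 & ~set_bits
    or0_new = or0 | set_bits
    return tuple(m & TOP_BIT_CLEAR for m in (and0_new, or0_new, and1, or1))


# Original GTX 480 masks, as read from the firmware.
masks = new_masks(0x7FFC3FFF, 0x00004000, 0x7FF1FFFF, 0x80020000, STRAP0_DEVID_BITS)
print("nvflash --index=1 --straps " + " ".join(f"0x{m:08X}" for m in masks))
```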
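You may notice that AND mask 0 in the command below also differs from the original (0x7FFC3FFF becomes 0x6FFC3BFF). I cleared the same two bits there, similar to how the C2050 masks clear the Device ID bits. Since the OR mask sets these bits anyway, this doesn’t change the final straps. Be sure to work all of this out for your own card. Once you have, you can use nvflash to apply the new masks: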
(--straps [AND mask 0] [OR mask 0] [AND mask 1] [OR mask 1])
> nvflash --index=1 --straps 0x6FFC3BFF 0x10004400 0x7FF1FFFF 0x00020000
nvflash changes the softstraps in the card’s firmware directly. This means that if you save the firmware back to a file, you should see that the softstraps at the appropriate locations have changed (including the checksums at 00000068 and 0000006C).
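To check this without eyeballing two hex dumps, you can compare the firmware saved before running --straps with the firmware saved after. Here’s a minimal Python sketch that lists the offset ranges where two dumps differ. If things went as described, the softstrap masks and checksums (00000058 to 0000006F) should be among them. The file names in the usage comment are placeholders.

```python
import sys
from typing import List, Tuple


def changed_runs(old: bytes, new: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) offsets (end exclusive) of byte runs that differ."""
    if len(old) != len(new):
        raise ValueError("the two dumps have different sizes")
    runs: List[Tuple[int, int]] = []
    start = None
    for i, (x, y) in enumerate(zip(old, new)):
        if x != y and start is None:
            start = i  # a differing run begins here
        elif x == y and start is not None:
            runs.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        runs.append((start, len(old)))  # the run extends to the end of the dump
    return runs


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (placeholder names): python rom_diff.py before.rom after.rom
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        for start, end in changed_runs(f1.read(), f2.read()):
            print(f"{start:08X}-{end - 1:08X}")
```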
The secondary card should now contain the modified firmware and softstraps, but you’ll have to reboot your computer to see the changes. On the next boot, Windows will likely detect the secondary card as new hardware and attempt to install the “appropriate” drivers (this never works for me). Make sure you immediately reinstall the latest NVIDIA drivers (you don’t need the special Tesla drivers) and reboot again once the installation is complete.
Run nvflash again to verify that the PCI Device ID has indeed changed:
> nvflash -a
NVIDIA Firmware Update Utility (Version 5.95)
NVIDIA display adapters present in system:
<0> GeForce GTX 480 (10DE,06C0,10DE,075F) H:--:NRM B:02,PCI,D:00,F:00
<1> Tesla C2050 (10DE,06D1,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00
If all went well, you should now be able to enable TCC using the nvidia-smi tool located at C:\Program Files\NVIDIA Corporation\NVSMI. Note that with the 270.32 (CUDA 4.0rc) drivers, nvidia-smi is broken for me (known issue), so I had to grab the nvidia-smi tool from the 263.06 (Tesla) drivers to get things running:
> nvidia-smi -g 0
==============NVSMI LOG==============
Timestamp : 03/20/2011 12:53:07 AM
Driver Version : 270.32
GPU 0:
Product Name : Tesla C2050
PCI Device/Vendor ID : 6d110de
PCI Location ID : 0:3:0
Board Serial : 6182738065
Display : Not connected
Temperature : 46 C
Utilization
GPU : 0%
Memory : 0%
...
> nvidia-smi -g 0 -dm 1
TCC enabled for device 0
Keep in mind that you may have to reboot for TCC to be enabled. If everything goes smoothly, your Tesla card should no longer show up in the NVIDIA Control Panel, and the deviceQuery SDK sample should output the following information:
Device 1: "Tesla C2050"
CUDA Driver Version: 4.0
CUDA Runtime Version: 4.0
...
Device is using TCC driver mode: Yes
Congratulations, you’re now running your secondary card in TCC! Make sure that both of your cards are still functioning correctly, though. And don’t forget to post a response!
P.S. If you’re interested in helping to figure out the meaning of the strap bits, one way of doing this is by trial and error. First, clear all strap bits using the AND mask (0x00000000), then add the appropriate PCI Device ID bits through the OR mask to make sure your card is still detected properly. Next, set a single additional bit to 1, reboot, run a test/query (CUDA-Z or deviceQuery) and see what changes. Then start over and continue with the next bit.
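If you try this, a small script can at least keep the bookkeeping straight. This sketch prints one --straps command per candidate bit for straps 0. It uses AND mask 0 = 0x00000000, keeps the Device ID bits for 06D1 (0x10000400) in OR mask 0, and adds one extra bit at a time. It assumes straps 1 stay as in my own command, which is my choice and not part of the procedure above. Each line means one flash, one reboot and one test, and each is as risky as everything else in this tutorial. Don’t run them as a batch.

```python
BASE_OR0 = 0x10000400               # Device ID bits 4 and 0 (strap-0 bits 28 and 10) for 06D1
AND0 = 0x00000000                   # clear all strap-0 hardstraps, per the procedure above
MASKS_1 = (0x7FF1FFFF, 0x00020000)  # ASSUMPTION: straps 1 left as in my setup


def trial_or_masks(base_or: int):
    """Yield (bit, OR mask 0) pairs, each adding one extra strap bit to base_or."""
    for bit in range(31):  # bit 31 cannot be set
        candidate = base_or | (1 << bit)
        if candidate != base_or:  # skip bits that are already set
            yield bit, candidate


for bit, or0 in trial_or_masks(BASE_OR0):
    print(f"bit {bit:2d}: nvflash --index=1 --straps 0x{AND0:08X} 0x{or0:08X} "
          f"0x{MASKS_1[0]:08X} 0x{MASKS_1[1]:08X}")
```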
EDIT: Added instructions for working softstraps modifications. Added softstraps documentation.