GTX480 to C2050 hack or unlocking TCC-mode on GeForce

ijsfz · March 18, 2011, 8:30pm

EDIT: Check out the post below for instructions!

This is a follow-up on the somewhat frustrating problems I’ve been experiencing lately with CUDA, in particular for our pretty complex CUDA + DirectX interop + Multi-GPU setup, more info on that here: http://forums.nvidia.com/index.php?showtopic=193894

To recap a little from that topic… I have a workstation with dual identical NVIDIA GeForce GTX480 graphics cards, Windows 7 x64. Our software consists of a graphics engine (DirectX 10) and a physics engine (CUDA) and some other misc. subsystems. Physics runs in its own CUDA context assigned to the secondary GTX480 card (separate CPU thread), generally at a rate of 1000Hz and higher. The primary card is used for DirectX 10 rendering, but also runs a CUDA context for DirectX<->CUDA interop in order to “bridge” vertex data from DirectX/primary onto the secondary card (and back).

Whenever the load on DirectX (primary card) gets sufficiently high and the rendering rate drops below roughly 10Hz, I notice that some CUDA runtime calls on the secondary card (physics) slow to a crawl, even though the secondary card is basically running standalone from the rest of the application. So I disable the interop calls on the primary card and make sure there are no stray CPU<->GPU copies happening on the secondary card, but without any results. I then verify the problem by running some SDK samples on the secondary card in parallel with our software, with the physics completely disabled, and notice that their execution times get worse. I narrow the problem down to cudaMemcpy* calls taking up excessive CPU time for some reason. I figure this shouldn’t be happening, but it may be down to the way DirectX works somewhere deep in kernel land.

Following the CUDA 4.0rc release, I read about the Tesla TCC performance driver and various “exclusive” compute modes and I figured this may be the solution to my problem, as TCC bypasses WDDM entirely. Unfortunately… I don’t have a Tesla card at my disposal.

When you compare the Fermi cards in the consumer (GeForce) and HPC (Tesla) markets, and specifically the features that were introduced into CUDA 4.0 exclusively for the Tesla cards, it looks like NVIDIA may be pulling some kind of premium Tesla lock-in for whatever reason. This wouldn’t be the first time that cards are deliberately crippled. My GeForce GTX480 runs on the same GF100 architecture as a Tesla C2050, despite lacking ECC-memory and probably some other high grade components, but this really shouldn’t prevent me from using these features, right? Right.

So I dig up some low-level skills and naively figure that I could probably modify the firmware/softstraps to get the driver to detect my secondary card as a Tesla C2050 instead of a GTX480, use TCC-mode with high performance and live happily ever after. Turns out this assumption was true.

I got the secondary card to a state where it is now detected as a Tesla C2050, and I can use nvidia-smi to trigger TCC-mode (after which the card is no longer accessible through normal programs). I ran bandwidthTest to verify that the card was still working correctly, and noticed an immediate increase in performance:

Primary card, regular GTX480:

Device 0: GeForce GTX 480

 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory

   Transfer Size (Bytes)        Bandwidth(MB/s)

   33554432                     3377.4

Device to Host Bandwidth, 1 Device(s), Paged memory

   Transfer Size (Bytes)        Bandwidth(MB/s)

   33554432                     3534.5

Device to Device Bandwidth, 1 Device(s)

   Transfer Size (Bytes)        Bandwidth(MB/s)

   33554432                     119200.9

Secondary card, GTX480 rigged with C2050 firmware:

(Note that the performance of the secondary card used to be identical to the first.)

Device 1: Tesla C2050

 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory

   Transfer Size (Bytes)        Bandwidth(MB/s)

   33554432                     5081.0

Device to Host Bandwidth, 1 Device(s), Paged memory

   Transfer Size (Bytes)        Bandwidth(MB/s)

   33554432                     5037.8

Device to Device Bandwidth, 1 Device(s)

   Transfer Size (Bytes)        Bandwidth(MB/s)

   33554432                     119437.2
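To put a number on the difference, the pageable-memory figures from the two bandwidthTest runs above can be compared directly. This is plain arithmetic on the values already posted; the dictionary and function names are made up for illustration.

```python
# Pageable-memory bandwidth (MB/s) copied from the two bandwidthTest runs above.
stock = {"host_to_device": 3377.4, "device_to_host": 3534.5}
rigged = {"host_to_device": 5081.0, "device_to_host": 5037.8}

def speedup(before, after):
    """Return the ratio after/before for every transfer direction."""
    return {key: after[key] / before[key] for key in before}

for direction, ratio in speedup(stock, rigged).items():
    print(f"{direction}: {ratio:.2f}x")
```

Device-to-device bandwidth is left out because it is essentially unchanged between the two runs.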

I then tried our own software and noticed there were no longer any performance issues going on with the cudaMemcpy* calls, and the physics engine ran as expected this time around! Mind you, no stability problems, BSODs or strange issues so far.

What irritates me most is that all these features are exclusively for Tesla while they obviously work on their GeForce counterparts as well. This leads me to believe that there is some deliberate crippling going on here, perhaps for commercial reasons or whatever.

I understand that these modifications are risky, highly experimental, unsupported and will probably result in a halt-and-catch-fire, but for the sake of documentation for those who are interested in the firmware modifications I’ll probably post a follow-up on this post explaining more details.

hocheung20 · March 18, 2011, 9:44pm

I would be quite interested in this as well.

Did your double precision FP performance go up as well? (You can use CUDA-Z to test)

ijsfz · March 19, 2011, 12:46am

Thanks. I’ll do a quick CUDA-Z benchmark tomorrow using the primary card as reference.

P.S. I think I also remember nvidia-smi showing 2 DMA copy engines for the C2050, not sure about that though. Will check.

seibert · March 19, 2011, 4:38pm

I seriously doubt the higher double precision rate will be available just because you are using the Windows TCC driver. The faster host<->device memory bandwidth you get with a GTX 480 using the TCC driver can be achieved in Linux without any driver hacking at all. The bottleneck is the overhead of WDDM for compute tasks, which Linux naturally avoids for all cards, both GeForce and Tesla.

The other limitations on the GTX 480 relative to the Tesla cards, like fast double precision and bidirectional DMA, are almost certainly imposed by on-card firmware and not OS drivers. (However, if you discover I’m wrong, then there will be a lot of happy GTX 400 and 500 series users…)

ijsfz · March 19, 2011, 8:09pm

Here is some additional benchmark/query information for the two cards.

CUDA-Z 0.5.95 - Primary GTX480 stock:

Core Information

----------------

	Name: GeForce GTX 480

	Compute Capability: 2.0

	Clock Rate: 1401 MHz

	Multiprocessors: 15

	Warp Size: 32

	Regs Per Block: 32768

	Threads Per Block: 1024

	Watchdog Enabled: No

	Threads Dimentions: 1024 x 1024 x 64

	Grid Dimentions: 65535 x 65535 x 65535

Memory Information

------------------

	Total Global: 1471.56 MB

	Shared Per Block: 48 KB

	Pitch: 2.09715e+06 KB

	Total Constant: 64 KB

	Texture Alignment: 512

	GPU Overlap: Yes

Performance Information

-----------------------

Memory Copy

	Host Pinned to Device: 5683.38 MB/s

	Host Pageable to Device: 3002.56 MB/s

	Device to Host Pinned: 5688.28 MB/s

	Device to Host Pageable: 3356.25 MB/s

	Device to Device: 58350.7 MB/s

GPU Core Performance

	Single-precision Float: 1.26361e+06 Mflop/s

	Double-precision Float: 168172 Mflop/s

	32-bit Integer: 671633 Miop/s

	24-bit Integer: 670834 Miop/s

CUDA-Z 0.5.95 - Secondary GTX480 rigged:

Core Information

----------------

	Name: Tesla C2050

	Compute Capability: 2.0

	Clock Rate: 1401 MHz

	Multiprocessors: 15

	Warp Size: 32

	Regs Per Block: 32768

	Threads Per Block: 1024

	Watchdog Enabled: No

	Threads Dimentions: 1024 x 1024 x 64

	Grid Dimentions: 65535 x 65535 x 65535

Memory Information

------------------

	Total Global: 1535.69 MB

	Shared Per Block: 48 KB

	Pitch: 2.09715e+06 KB

	Total Constant: 64 KB

	Texture Alignment: 512

	GPU Overlap: Yes

Performance Information

-----------------------

Memory Copy

	Host Pinned to Device: 5736.88 MB/s

	Host Pageable to Device: 4452.88 MB/s

	Device to Host Pinned: 5737.86 MB/s

	Device to Host Pageable: 5012.45 MB/s

	Device to Device: 57671.8 MB/s

GPU Core Performance

	Single-precision Float: 1.25623e+06 Mflop/s

	Double-precision Float: 168116 Mflop/s

	32-bit Integer: 670805 Miop/s

	24-bit Integer: 670030 Miop/s

deviceQuery 4.0 - Primary GTX480 stock:

Device 0: "GeForce GTX 480"

  CUDA Driver Version:                           4.0

  CUDA Runtime Version:                          4.0

  CUDA Capability Major/Minor version number:    2.0

  Total amount of global memory:                 1543045120 bytes

  (15) Multiprocessors x (32) CUDA Cores/MP:     480 CUDA Cores

  Total amount of constant memory:               65536 bytes

  Total amount of shared memory per block:       49152 bytes

  Total number of registers available per block: 32768

  Warp size:                                     32

  Maximum number of threads per block:           1024

  Maximum sizes of each dimension of a block:    1024 x 1024 x 64

  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535

  Maximum memory pitch:                          2147483647 bytes

  Texture alignment:                             512 bytes

  Clock rate:                                    1.40 GHz

  Concurrent copy and execution:                 Yes

  # of Asynchronous Copy Engines:                1

  Run time limit on kernels:                     No

  Integrated:                                    No

  Support host page-locked memory mapping:       Yes

  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

  Concurrent kernel execution:                   Yes

  Device has ECC support enabled:                No

  Device is using TCC driver mode:               No

deviceQuery 4.0 - Secondary GTX480 rigged:

Device 1: "Tesla C2050"

  CUDA Driver Version:                           4.0

  CUDA Runtime Version:                          4.0

  CUDA Capability Major/Minor version number:    2.0

  Total amount of global memory:                 1610285056 bytes

  (15) Multiprocessors x (32) CUDA Cores/MP:     480 CUDA Cores

  Total amount of constant memory:               65536 bytes

  Total amount of shared memory per block:       49152 bytes

  Total number of registers available per block: 32768

  Warp size:                                     32

  Maximum number of threads per block:           1024

  Maximum sizes of each dimension of a block:    1024 x 1024 x 64

  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535

  Maximum memory pitch:                          2147483647 bytes

  Texture alignment:                             512 bytes

  Clock rate:                                    1.40 GHz

  Concurrent copy and execution:                 Yes

  # of Asynchronous Copy Engines:                2

  Run time limit on kernels:                     No

  Integrated:                                    No

  Support host page-locked memory mapping:       Yes

  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

  Concurrent kernel execution:                   Yes

  Device has ECC support enabled:                No

  Device is using TCC driver mode:               Yes

The immediate performance boost I was talking about earlier obviously only affects pageable memory. Double precision rates are identical. Though, check out the additional asynchronous copy engine (bidirectional DMA?).

I may do some additional forensic research on the firmwares and see what happens.

hocheung20 · March 19, 2011, 9:49pm

As I understand it, he is modifying the firmware so that the card is detected as C2050. Since the driver doesn’t seem to unlock faster double-FP, I guess I would have to agree with you that it is not in the driver :(

But I’m still interested in his technique to modify the card identification, since it may give hints on how to uncripple the rest of the features.

ijsfz · March 20, 2011, 12:10am

Alright, so here’s a quick tutorial on how I modified my GTX480 firmware (PCI Expansion ROM). Please understand that this is UNSUPPORTED, UNTESTED and MAY VOID YOUR WARRANTY, so proceed AT YOUR OWN RISK. Changing your firmware (and especially softstraps) can potentially render your card useless, in which case you may have to resort to hardware modifications.

Note that this tutorial assumes that you have a dual card setup, like me, so that you don’t lose your graphics output (a card in TCC mode cannot drive a display) and you can easily recover from a broken firmware by using the primary card.

A short rundown of my own workstation:

Operating System:	Windows 7 Professional, 64-bit

Driver version:		270.32

CPU:			Intel i7 920 @ 2.67GHz

Bus:			PCI Express x16 Gen2

Primary card:		Club3D GeForce GTX 480 1536MB GDDR5 PCI E 2.0

Secondary card:		Club3D GeForce GTX 480 1536MB GDDR5 PCI E 2.0

Recommended firmware modification tools (you’re advised to check out the documentation of each of these):

* NVIDIA Firmware Update Utility v5.95: http://downloads.guru3d.com/NVFlash-5.95.0.1-download-2590.html

This (official) nvflash tool works under Windows and allows you to do firmware manipulation. (Has a couple of interesting undocumented features as well.)

* NVIDIA BIOS Editor v6.01: http://www.mvktech.net/content/view/4875/143/

This (unofficial) tool is called NiBiTor and has some basic editing functionality for NVIDIA firmwares.

* Your favourite hex editor (I prefer HxD)

In short, the goal of this firmware modification is to change the PCI Device ID of the card so it is detected as a Tesla series by the NVIDIA driver, enabling additional functionality that’s otherwise disabled. Specifically, I want to change my Device ID from 06C0 (GeForce GTX 480) into 06D1 (Tesla C2050). Coincidentally, I have a HP C2050 firmware (version 70.00.2B.00.0E) lying around to do some comparisons.

Let’s query the devices:

> nvflash -a

NVIDIA Firmware Update Utility (Version 5.95)

NVIDIA display adapters present in system:

<0> GeForce GTX 480      (10DE,06C0,10DE,075F) H:--:NRM B:02,PCI,D:00,F:00

<1> GeForce GTX 480      (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00

The card I’m interested in is the one hanging on bus id 3 or index 1, so let’s save the firmware to a file called firmware.rom:

> nvflash --index=1 -b firmware.rom

NVIDIA Firmware Update Utility (Version 5.95)

Adapter: GeForce GTX 480      (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00

The display may go *BLANK* on and off for up to 10 seconds during access to the

EEPROM depending on your display adapter and output device.

Identifying EEPROM...

EEPROM ID (C2,2011) : MX MX25L1005 2.7-3.6V 1024Kx1S, page

Reading adapter firmware image...

Image Size            : 62464 bytes

Version               : 70.00.21.00.02

~CRC32                : D040F75A

Subsystem ID          : 10DE-075F

Hierarchy ID          : Normal Board

Chip SKU              : 375-0

Project               : 1022-0000

CDP                   : N/A

Build Date            : 04/14/10

Modification Date     : 04/14/10

Saving of image completed.

Open up the firmware with your hex editor, as we will be changing the following:

* Softstraps. The firmware contains a mechanism called softstraps, explained below, which allows the firmware to override certain chip settings (hardstraps), including the PCI Device ID. Softstraps are manipulated through two sets of 32-bit AND + OR masks. nvflash has an option to change the straps, which we will be using, though we first need to read out the original straps from the firmware:

AND mask 0 location: 00000058, little endian

OR mask 0 location: 0000005C, little endian

AND mask 1 location: 00000060, little endian

OR mask 1 location: 00000064, little endian

(Additionally, 00000068 and 0000006C contain the checksums for the softstraps.)

* Regular PCI Device ID, location: 0000018E, little endian

* For the sake of authenticity, the board ID/boot strings and firmware versions will also be modified.

Board boot string location: 00000086

Board ID string location: 00000122

Firmware version location: 00000238, little endian
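If you would rather not squint at a hex editor, the fields listed above can be read out with a few lines of Python. This is a minimal sketch: the offsets are the ones from this post, but the function names are hypothetical and the demo runs on a synthetic buffer filled with the GTX 480 values quoted in this thread, not on a real dump.

```python
import struct

# Offsets taken from the list above; everything else here is illustrative.
STRAP_OFFSET = 0x58       # AND0, OR0, AND1, OR1 as four little-endian uint32
DEVICE_ID_OFFSET = 0x18E  # little-endian uint16

def read_straps(rom):
    """Return (and0, or0, and1, or1) from a firmware image."""
    return struct.unpack_from("<4I", rom, STRAP_OFFSET)

def read_device_id(rom):
    """Return the PCI Device ID stored in the PCI block."""
    return struct.unpack_from("<H", rom, DEVICE_ID_OFFSET)[0]

# Synthetic image holding the GTX 480 values quoted in this thread.
rom = bytearray(0x200)
struct.pack_into("<4I", rom, STRAP_OFFSET,
                 0x7FFC3FFF, 0x00004000, 0x7FF1FFFF, 0x80020000)
struct.pack_into("<H", rom, DEVICE_ID_OFFSET, 0x06C0)

print([f"0x{mask:08X}" for mask in read_straps(rom)])
print(f"0x{read_device_id(rom):04X}")
```

For a real image you would pass the contents of the file saved by nvflash -b instead of the synthetic buffer.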

I’ll start by modifying the firmware’s PCI Device ID value at 0000018E, located right after the PCI Vendor ID for NVIDIA (0x10DE) within the PCI block:

Little endian!

00000180:   91 DF AA 8C 9A F2 F5 FF 50 43 49 52 DE 10 C0 06

(NEW)

00000180:   91 DF AA 8C 9A F2 F5 FF 50 43 49 52 DE 10 D1 06

Next up, I’ll change the board boot string at 00000086 and board ID string at 00000122 into their C2050 counterparts:

00000086:   GF100 P1022 SKU 0000 VGA BIOS

00000122:   GF100 Board - 10220000

(NEW)

00000086:   GF100 P1030 SKU 0200 VGA BIOS

00000122:   GF100 Board - 10300200

Then, the firmware version:

Little endian!

00000230:   00 00 00 00 00 00 00 00 00 21 00 70 02 00 00 00

(NEW)

00000230:   00 00 00 00 00 00 00 00 00 2B 00 70 0E 00 00 00
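The byte order of the version field is easy to get wrong, so here is a small decoder for it. The layout (a little-endian 32-bit word at 00000238 followed by one extra byte) is inferred from the two hex rows above and is an assumption, not something taken from documentation; decode_version is a made-up name.

```python
def decode_version(rom, offset=0x238):
    """Decode a version such as 70.00.21.00.02 from five bytes at `offset`.

    Assumed layout: a little-endian 32-bit word followed by one extra byte.
    """
    word = rom[offset:offset + 4][::-1]   # reverse to most-significant first
    extra = rom[offset + 4]
    return ".".join(f"{b:02X}" for b in (*word, extra))

# The two 16-byte rows shown above, placed at 0x230 in an otherwise empty image.
old_row = bytes.fromhex("00 00 00 00 00 00 00 00 00 21 00 70 02 00 00 00")
new_row = bytes.fromhex("00 00 00 00 00 00 00 00 00 2B 00 70 0E 00 00 00")
old_rom = bytes(0x230) + old_row
new_rom = bytes(0x230) + new_row

print(decode_version(old_rom))  # 70.00.21.00.02
print(decode_version(new_rom))  # 70.00.2B.00.0E
```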

That’s it for the firmware image modifications, so be sure to save your modified firmware. The checksum of the modified image still needs to be recalculated. You can do this by opening your modified firmware in NiBiTor (ignore the warnings about unknown device IDs) and saving it to another file. The “Integrity” icon should be green for your final firmware file. (The checksum can be found in the Adv. Info tab, if you’re interested.)
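What NiBiTor does internally is not covered here. The conventional rule for legacy PCI option ROMs is that all bytes of the image sum to zero modulo 256, and the sketch below assumes that rule applies. Which byte is used as the fix-up byte is also an assumption (the last byte of a toy buffer here), so treat this as an illustration of the idea rather than a replacement for NiBiTor.

```python
def checksum8(image):
    """Sum of all bytes modulo 256; 0 means the image passes the 8-bit check."""
    return sum(image) % 256

def fix_checksum(image, fix_offset):
    """Return a copy whose byte at `fix_offset` is adjusted so the sum is 0 mod 256."""
    patched = bytearray(image)
    patched[fix_offset] = (patched[fix_offset] - checksum8(patched)) % 256
    return bytes(patched)

# Toy 8-byte "image"; a real ROM is tens of kilobytes.
toy = bytes([0x55, 0xAA, 0x7A, 0x10, 0xDE, 0xD1, 0x06, 0x00])
fixed = fix_checksum(toy, fix_offset=len(toy) - 1)
print(checksum8(toy), checksum8(fixed))
```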

We’re now ready to flash the modified firmware onto the secondary card. I’ll be changing the softstraps later on by using nvflash separately, which makes it a lot easier. Let’s use nvflash with a couple of options to make it clear that we totally want it to override all kinds of settings we really shouldn’t be overriding and flash the modified firmware to the card. We’ll also perform a full erase of the EEPROM first just to be sure:

> nvflash --index=1 --eraseeeprom

> nvflash --index=1 --overridesub --overrideboard --auto --noconfirm -5 -6 firmware-new.rom

NVIDIA Firmware Update Utility (Version 5.95)

Checking for matches between display adapter(s) and image(s)...

Adapter: GeForce GTX 480      (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00

WARNING: None of the firmware image compatible PCI Device ID's

match the PCI Device ID of the adapter.

Adapter PCI Device ID:        06C0

Firmware image PCI Device ID: 06D1

PCI Device ID override confirmation skipped.

Overriding GPU mismatch

Current      - Version:70.00.21.00.02 ID:10DE:06C0:10DE:075F

               GF100 Board - 10220000 (Normal Board)

Replace with - Version:70.00.2B.00.0E ID:10DE:06D1:10DE:075F

               GF100 Board - 10300200 (Normal Board)

The display may go *BLANK* on and off for up to 10 seconds or more during the up

date process depending on your display adapter and output device.

Identifying EEPROM...

EEPROM ID (C2,2011) : MX MX25L1005 2.7-3.6V 1024Kx1S, page

NOTE: Preserving board settings in preservation slot 6

NOTE: Preserving board settings in preservation slot 7

Clearing original firmware image...

.

Storing updated firmware image...

..

Verifying update...

Update successful.

Note that we’re using some undocumented features here (-5 -6 disables a couple of security checks for the overrides). If this doesn’t work for you, you may want to check out -h or try these undocumented options:

--eraseeeprom   erases all data from the EEPROM

--debug         gives a load of debug information during the upgrade

--refreshstraps refreshes the softstraps on the card

Now it’s time to change the softstraps. Let’s first do a binary comparison of the two PCI Device IDs:

GTX480	06C0	0000011011000000

C2050	06D1	0000011011010001
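The same comparison can be done with an XOR, which leaves a 1 in every position where the two IDs differ. A quick sketch (differing_bits is just an illustrative helper):

```python
GTX480 = 0x06C0
C2050 = 0x06D1

def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]

print(f"{GTX480:016b}")
print(f"{C2050:016b}")
print(differing_bits(GTX480, C2050))  # [0, 4]
```

So only bits 0 and 4 of the Device ID need to change.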

I read out my AND/OR masks and compare them to the masks of the C2050 firmware. Turns out AND/OR 0 are different, but AND/OR 1 are identical:

Hex:      AND mask 0  OR mask 0     AND mask 1  OR mask 1     CHECKSUM!

C2050     0x6FFC03FF  0x10000400    0x7FF1FFFF  0x80020000

GTX480    0x7FFC3FFF  0x00004000    0x7FF1FFFF  0x80020000

                                                (ignore first bit! always set to zero)

Softstraps in the firmware are applied over hardstraps on the card, where the AND and OR masks control how the hardstraps are modified. Masks should always be below or equal to 0x7FFFFFFF. The mechanism works as follows:

( ( [hardstraps] & [AND mask] ) | [OR mask] ) = final straps

In practice, the AND mask allows you to disable certain hardstraps while the OR mask allows you to enable specific straps.
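Written as code, the formula looks like this. The masks are the stock GTX 480 values from the table above; the hardstrap inputs are hypothetical extremes (all ones, all zeros), since the real hardstrap word of the card is not known here.

```python
def apply_softstraps(hardstraps, and_mask, or_mask):
    """final = (hardstraps & AND) | OR, kept to 32 bits."""
    return ((hardstraps & and_mask) | or_mask) & 0xFFFFFFFF

AND0 = 0x7FFC3FFF  # stock GTX 480 AND mask 0
OR0 = 0x00004000   # stock GTX 480 OR mask 0

# Hypothetical hardstrap words, chosen only to show what each mask does.
print(f"0x{apply_softstraps(0xFFFFFFFF, AND0, OR0):08X}")  # 0x7FFC7FFF
print(f"0x{apply_softstraps(0x00000000, AND0, OR0):08X}")  # 0x00004000
```

A 0 in the AND mask forces that strap to 0 unless the OR mask sets it again; a 1 in the OR mask forces the strap to 1 regardless of the hardstrap.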

So far, I’ve managed to figure out the functionality of a few of the strap bits (any additional information is welcome!).

straps 0:

          -xx+xxxx xxxxxxxx xx++++xx xxxxxxxx    

             ^                ^^^^

             |                ||||-pci dev id[0]

             |                |||--pci dev id[1]

             |                ||---pci dev id[2]

             |                |----pci dev id[3]

             |---------------------pci dev id[4]

- cannot be set, always 0

So in my case, it’s just a matter of ensuring that bits 0 and 4 of the PCI Device ID in straps 0 are set to 1. I can do this by adding the appropriate bits to OR mask 0. I’ll take the original OR mask for my GTX 480 - which has a few other bits set to 1 as well for who knows what, I’ll keep these just to be sure - and enable the appropriate bits for the PCI Device ID:

OR mask 0:

GTX480	  -0000000 00000000 01000000 00000000

NEW       -0010000 00000000 01000100 00000000
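The mask arithmetic can be double-checked in a few lines. This sketch assumes the strap-bit positions from the diagram above (Device ID bits 0 to 3 on strap bits 10 to 13, Device ID bit 4 on strap bit 28) and clears each forced bit in the AND mask as well as setting it in the OR mask; force_devid_bits is a hypothetical helper name.

```python
# Strap-0 bit positions for PCI Device ID bits 0..4, read off the diagram above.
DEVID_STRAP_BITS = [10, 11, 12, 13, 28]

def force_devid_bits(and_mask, or_mask, id_bits):
    """Force the given Device ID bits to 1 via AND/OR mask 0."""
    for id_bit in id_bits:
        strap_bit = DEVID_STRAP_BITS[id_bit]
        and_mask &= ~(1 << strap_bit)   # drop the hardstrap value for this bit
        or_mask |= 1 << strap_bit       # and force it to 1
    # Bit 31 can never be set, so keep both masks within 0x7FFFFFFF.
    return and_mask & 0x7FFFFFFF, or_mask & 0x7FFFFFFF

# Stock GTX 480 masks; Device ID bits 0 and 4 turn 06C0 into 06D1.
new_and, new_or = force_devid_bits(0x7FFC3FFF, 0x00004000, [0, 4])
print(f"0x{new_and:08X} 0x{new_or:08X}")  # 0x6FFC3BFF 0x10004400
```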

Be sure to work this out for your own card. Once you have, you can use nvflash to apply the new masks:

(--straps [AND mask 0] [OR mask 0] [AND mask 1] [OR mask 1])

>nvflash --index=1 --straps 0x6FFC3BFF 0x10004400 0x7FF1FFFF 0x00020000

nvflash will directly change the softstraps on the card’s firmware. This means that if you save the firmware back to a file, you should be able to see that the softstraps at the appropriate locations in the firmware have changed (including the checksums at 00000068).

The secondary card should now contain the modified firmware + modified softstraps, but in order to see the changes, you’ll now have to reboot your computer. The next boot in Windows will likely result in the secondary card being detected as new hardware, after which Windows will attempt to install the “appropriate” drivers (never works for me). Make sure that you immediately re-install the latest NVIDIA drivers (you don’t need the special Tesla drivers) and do another reboot after the installation is complete.

Run nvflash again to verify that the PCI Device ID has indeed changed:

> nvflash -a

NVIDIA Firmware Update Utility (Version 5.95)

NVIDIA display adapters present in system:

<0> GeForce GTX 480      (10DE,06C0,10DE,075F) H:--:NRM B:02,PCI,D:00,F:00

<1> Tesla C2050          (10DE,06D1,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00

If all went well, you should now be able to enable TCC by using the nvidia-smi tool located at C:\Program Files\NVIDIA Corporation\NVSMI. Note that with the 270.32 (CUDA 4.0rc) drivers, nvidia-smi is broken for me (known issue) and I had to grab the nvidia-smi tool from 263.06 (Tesla) in order to get things running:

> nvidia-smi -g 0

==============NVSMI LOG==============

Timestamp                       :  03/20/2011  12:53:07 AM

Driver Version                  : 270.32

GPU 0:

        Product Name            : Tesla C2050

        PCI Device/Vendor ID    : 6d110de

        PCI Location ID         : 0:3:0

        Board Serial            : 6182738065

        Display                 : Not connected

        Temperature             : 46 C

        Utilization

            GPU                 : 0%

            Memory              : 0%

...

> nvidia-smi -g 0 -dm 1

TCC enabled for device 0

Keep in mind that you may have to reboot in order for TCC to be enabled. If everything goes smoothly, your Tesla card should not be showing up in the NVIDIA Control Panel, and the deviceQuery SDK sample should be outputting the following information:

Device 1: "Tesla C2050"

  CUDA Driver Version:                           4.0

  CUDA Runtime Version:                          4.0

...

  Device is using TCC driver mode:               Yes

Congratulations, you’re now running your secondary card in TCC! Don’t forget to make sure that both your cards are still functioning correctly though. Don’t forget to post a response!

P.S. If you’re interested in helping figure out the meaning of the strap bits… one way of doing this is by trial and error: set all strap bits to 0 by using the AND mask (0x00000000), then add the appropriate PCI Device ID bits through the OR mask to make sure your card is still detected properly, and try setting a single bit to 1. Reboot, perform a test/query (CUDA-Z or deviceQuery) and see what changes, then start over and continue with the next bit.
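A small generator can list the mask pairs for such a bit-by-bit sweep. It only prints a plan and flashes nothing. The default OR value assumes the 06D1 case from this guide (strap bits 28 and 10 set), and probe_masks is a made-up name.

```python
DEVID_OR_BITS = 0x10000400  # strap bits 28 and 10, i.e. Device ID bits 4 and 0

def probe_masks(keep_or=DEVID_OR_BITS):
    """Yield (bit, and_mask, or_mask) triples, one candidate strap bit at a time."""
    for bit in range(31):                 # bit 31 cannot be set
        if (keep_or >> bit) & 1:
            continue                      # already fixed by the Device ID bits
        yield bit, 0x00000000, keep_or | (1 << bit)

for bit, and_mask, or_mask in probe_masks():
    print(f"bit {bit:2d}: AND 0x{and_mask:08X}  OR 0x{or_mask:08X}")
```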

EDIT: Added instructions for working softstraps modifications. Added softstraps documentation.

cbuchner1 · March 20, 2011, 1:16am

Thanks! I am archiving your post because I think that the Ministry of Truth will have this redacted ASAP.

hocheung20 · March 20, 2011, 3:34am

Thank you for a well written guide!

Just wanted to point out one thing: the memory is little-endian, so your byte/word orders are a little bit confusing to figure out at first. (I’m a long time electrical engineer, so I’m comfortable with hex, although I haven’t really used any real hex editors, so my apologies if I was too presumptuous about how these things should be displayed)

Anyways, my hex below is in actual word order with the MSB on the left.

Just for your reference

GTX580 (in my sig):

AND MASK 0: 0xFFFFFFFF
OR MASK 0: 0x00000000

AND MASK 1: 0x7FFFFFFF
OR MASK 1: 0x80000000

Quadro 6000:

AND MASK 0: 0x7FFC3FFF
OR MASK 0: 0x00004000

AND MASK 1: 0x7FF0FFFF
OR MASK 1: 0x80030000
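For anyone else tripping over the byte order: a hex editor shows the bytes in file order, least significant byte first, while the masks above are written as 32-bit words. A sketch of the conversion in both directions (the helper names are illustrative):

```python
def file_bytes_to_word(hex_bytes):
    """Interpret bytes in file order (little endian) as a 32-bit word."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")

def word_to_file_bytes(word):
    """Inverse: format a 32-bit word the way a hex editor shows it."""
    return " ".join(f"{b:02X}" for b in word.to_bytes(4, "little"))

# Quadro 6000 AND mask 0 from the list above.
print(f"0x{file_bytes_to_word('FF 3F FC 7F'):08X}")  # 0x7FFC3FFF
print(word_to_file_bytes(0x7FFC3FFF))                # FF 3F FC 7F
```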

I’ll be working on converting my GTX580 to a Quadro 6000 tomorrow.

Edit: Fix endianness

hocheung20 · March 20, 2011, 3:58am

OFFSET      HEX           BINARY

00000058:   FF 3F FC 7F   11111111 00111111 11111100 01111111   <-- GTX480 (ID 06C0) AND mask 0

00000058:   FF 03 FC 6F   11111111 00000011 11111100 01101111   <-- C2050 (ID 06D1) AND mask 0
                                     ||||               |
(NEW)          3B    6F            00111011          01101111
                                    ^   ^               ^

I’m confused by this step. You seem to be ANDing the two AND masks together, but then the middle 3 digits should turn out 0, no?

ijsfz · March 20, 2011, 12:41pm

No, in my case I just took the GTX 480 AND mask and put the bits to 0 that would later be OR’ed to 1 (through the OR mask). As far as I can tell, the firmware takes whatever the strap values are, applies the AND mask, and then applies the OR mask, so for case 0:

(((hard straps 0) & AND mask 0) | OR mask 0)
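If the formula really is applied exactly as written (an assumption on my part), then clearing those bits in the AND mask does not change the outcome, because the OR mask sets them to 1 either way. The sketch below checks this over random hardstrap values; final_straps and the constant names are made up for illustration.

```python
import random

def final_straps(hard, and_mask, or_mask):
    """(hardstraps & AND) | OR, as in the formula above."""
    return (hard & and_mask) | or_mask

OR0 = 0x10004400
AND0_STOCK = 0x7FFC3FFF            # GTX 480 AND mask 0, unchanged
AND0_CLEARED = AND0_STOCK & ~OR0   # same mask with the OR'ed bits cleared

random.seed(0)
samples = [random.getrandbits(32) for _ in range(1000)]
same = all(
    final_straps(h, AND0_STOCK, OR0) == final_straps(h, AND0_CLEARED, OR0)
    for h in samples
)
print(f"0x{AND0_CLEARED:08X}", same)  # 0x6FFC3BFF True
```

Clearing the bits is harmless and makes the intent explicit, which is why I did it.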

You seem to want to go from a GTX 580 (1080) to a Quadro 6000 (06D8 or 06DC). I’m not sure how far the soft straps will allow you to go here.

ijsfz · March 21, 2011, 2:11pm

Unfortunately I’m having a bit of trouble finding an explanation as to why the values I used before are working. I’ve tried a couple of combinations of masks, but apart from the original one they all result in the device being detected as 06C0 (GTX 480).

The combinations I’ve tried for AND mask 0:

11111111 00111011 11111100 01101111 works

11111111 11111111 11111100 01101111 fails

11111111 00110011 11111100 01101111 fails

11111111 00100011 11111100 01101111 fails

11111111 00011011 11111100 01101111 fails

11111111 00000011 11111100 01101111 fails

11111111 00000000 11111100 01101111 fails

I’m suspecting that there are two 32-bit values located at 0x68 and 0x6C that serve as some kind of checksum for the softstraps. These seem to change proportionally with the softstraps in different firmwares.

EDIT: You might as well save yourself the trouble and just use nvflash’s straps option to change the straps properly, instead of editing the straps in the firmware. You can still read out the strap values from your firmware though and probably use that as a guideline to do the modifications. As pointed out before, they’re all little endian.

Note: --straps (AND mask 0) (OR mask 0) (AND mask 1) (OR mask 1), and no mask may exceed 0x7FFFFFFF

>nvflash --index=1 --straps 0x6FFC3BFF 0x10004400 0x7FF1FFFF 0x00020000

NVIDIA Firmware Update Utility (Version 5.95)

Adapter: GeForce GTX 480      (10DE,06C0,10DE,075F) H:--:NRM B:03,PCI,D:00,F:00

The display may go *BLANK* on and off for up to 10 seconds during access to the

EEPROM depending on your display adapter and output device.

Identifying EEPROM...

EEPROM ID (C2,2011) : MX MX25L1005 2.7-3.6V 1024Kx1S, page

Reading adapter firmware image...

Erasing EEPROM...

.

Storing updated firmware image...

Verifying update...

Update successful.

To verify, you can then download the firmware with nvflash (-b) and the straps should’ve been changed properly.
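Since a wrong strap value can cost you the card, it may be worth range-checking the masks before typing the command. A minimal sketch that builds the same command line as above; straps_command is a hypothetical helper and only formats a string, it does not call nvflash.

```python
MAX_MASK = 0x7FFFFFFF

def straps_command(index, and0, or0, and1, or1):
    """Build the nvflash --straps command line after range-checking the masks."""
    masks = (and0, or0, and1, or1)
    for mask in masks:
        if not 0 <= mask <= MAX_MASK:
            raise ValueError(f"mask 0x{mask:08X} has the top bit set")
    args = " ".join(f"0x{mask:08X}" for mask in masks)
    return f"nvflash --index={index} --straps {args}"

# OR mask 1 is 0x80020000 in the firmware; clear the top bit before passing it.
print(straps_command(1, 0x6FFC3BFF, 0x10004400, 0x7FF1FFFF, 0x80020000 & MAX_MASK))
```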

ijsfz · March 22, 2011, 7:20pm

Alright, I managed to find a proper way to flash the softstraps by using nvflash and found out where the relevant PCI Device ID bits are encoded in the softstraps. Be sure to take a look at the updated guide!

hocheung20 · March 22, 2011, 9:42pm

I wasn’t having much luck changing my device ID manually through the firmware. Maybe it was the checksum issue you were talking about. I also noticed the strap flashing option on nvflash. I will give it a go tonight.

Any ideas on what strap 1 does?

ijsfz · March 22, 2011, 11:40pm

I’ll try and find out sometime tomorrow. If you do try, don’t forget to clear the MSB of the masks so they don’t exceed 0x7FFFFFFF.

hamster143 · March 23, 2011, 12:18am

You have to be careful with --straps. If you set the wrong value, you can be badly screwed. I tried to change straps to ‘fix’ my faulty 470 that was self-identifying as Quadro 4000 for some reason (PCI ID 0x06DD instead of 0x06CD), and now the card does not even show up in the list of PCI devices. According to the docs, it seems that the only option left for me (other than RMA’ing the card) is to find and ground the STRAP_SUB_VENDOR pin. Except it’s not clearly marked on the card and I have no idea where it is.

hocheung20 · March 23, 2011, 1:30am

Interestingly enough, my default GTX580 AND MASK 0 is 0xFFFFFFFF and the OR MASK is 0x00000000

ijsfz · March 23, 2011, 12:19pm

I assume the card doesn’t show up in nvflash… that’s pretty bad. I never managed to get my card to that state, but modifying the straps can be very risky indeed. I’ve seen datasheets with straps that allowed toggling of the AGP/PCI bus, so stuff like that could get in the way.

hamster143 · March 23, 2011, 9:02pm

Not just in nvflash, but also in the device manager and in the list of devices during boot-up.

ijsfz · March 24, 2011, 1:38pm

Check out this link: Just another SysAdmin Blog! ¯\_(ツ)_/¯: ○ Dead Nvidia 8600M GT GPU VBIOS Flash Needed?
( If you have access to an EEPROM programmer: Just another SysAdmin Blog! ¯\_(ツ)_/¯: ○ Programming a MXM graphics module bios (by performing a hard flash). )

You don’t necessarily need the STRAP_SUB_VENDOR pin. You won’t be able to find this anyway since the datasheets for the NV chips are confidential.

However, I think you should theoretically be able to bypass the EEPROM chip (and thus the corrupted straps) e.g. by tweaking the Chip Enable signal on boot, depending on the model of the chip. The datasheet should be freely available. Mind you that these are small form factor ICs, so you’ll have to be very cautious when doing this. You’ll first have to identify the EEPROM chip on the board (or check your nvflash logs, if you have any). Might want to post that here so we can help you out.

(Or take the easy way out and just RMA the card.)
