GPU crunching for science: can nVidia please iron out the drivers?

From what I’ve learned from the Folding@Home site, they are trying to get the 8800 to run the protein folding code, but a bug in the nVidia drivers prevents this. nVidia is apparently aware of the problem. It would be great if nVidia fixed this, because there are many 8800s that could contribute to protein folding science, which could produce breakthroughs applicable to diseases ranging from Alzheimer’s to HIV.

If nVidia were to fix this bug and/or aid protein research scientists in developing their crunching code, it would only help demand for nVidia graphics cards, and it would be a public relations boon. It would also be a boon for nVidia as businesses would be more tempted to create farms of 8800s to crunch data. There are people who have bought ATI x1k cards for the sole purpose of crunching scientific data.

The GPU client at Folding@Home is able to get 90 gigaflops of processing power from a single top-of-the-line ATI card. This compares to about 3.3 to 4 gigaflops for a Core 2 Duo CPU. I’m fairly certain that an 8800 could achieve similar performance. The 8800’s shader units are not as complex as ATI’s, but it has more of them. If the 8800 DOES get similar performance, I would definitely buy one to double as a GPU cruncher and game player, because the 8800 is superior in games.

(This is directed more at a bug in the drivers than CUDA, but it applies to GPU crunching)

I agree completely with paydirt. I have an ATI x1900XT “folding,” as we call it, and that is all I use it for. Many people, myself included, would also buy an nVidia card if we were able to use it for Folding@Home.

For the record, NVIDIA thinks it is a bug in brook (the “language” used to write the Folding@Home client), but Mike Houston thinks he has found something related to dot3 inside a loop that isn’t being handled correctly on the NVIDIA card for some reason…

In any case, the code will run for a little while before NaNs show up, and the performance is 65 gigaflops compared to 95 on ATI (for code very similar to F@H). Given that correct numbers are not produced, no attempt has been made to tune the code for better performance on the NVIDIA hardware. It is also unlikely that an attempt to port the code to CUDA will be made in the near future. (Note that this probably wouldn’t improve the performance much: the code is not memory bound, so the shared memory wouldn’t add much value.)

The total number of active GPU clients in the F@H program is probably not enough (<1000) for NVIDIA to drop everything and make sure it works.

I should point out that the 8800 has higher memory bandwidth than the 1900XTX which means it will outperform it in the vast majority of applications (which tend to be bandwidth bound.) And CUDA is a great new tool for harnessing the GPU, especially for people without graphics experience.

Thanks for the info. I’m not suggesting that nVidia drop everything or tie up major resources. If just a single person at nVidia could help Mike get this resolved, that would be great. I have no clue how complex the code for either Folding@Home or running code on a graphics card is, so I’m not saying it would be easy. Maybe there is another way the loop structure could be written?

I think this type of research (protein structure analysis) is critical, and I am dedicating a significant portion of my income toward it. As the “kinks” and bugs are worked out, I will be encouraging anyone with computing know-how to get involved.

Even if F@H isn’t able to get the 8800 code up to 90 gigaflops, anything above 30 or so gigaflops is nothing to sneeze at, really. There might be a low number of clients currently, but if nVidia were to help out and throw their hat into the ring, I’m certain that the snowball would start to build and pick up mass. I think people are waiting to see which graphics cards will be utilized by various science projects and whether the PS3 will be a good cruncher.

Folding@Home has about 635 GPUs producing 37 teraflops of computing power, or roughly 58 gigaflops per client. F@H also has 186,000 CPU clients producing 190 teraflops, or roughly 1 gigaflop per client. It is my opinion that it won’t take long for people to take notice of this difference between CPU and GPU crunching.

I have good reasons not to buy ATI cards (based on where I work), and good reasons to only purchase Nvidia cards (I know people who work there). I would never think of buying an ATI card. And on top of this, the 8800 GTS looks like a relative bargain compared to the ATI cards. But I just upgraded to a faster ATI card. And I’m seriously considering buying another one so I can run two of them. All because the ATI cards will fold.

I’m sure people like me are a tiny segment of the market, but we’re fairly visible and we influence a lot of early adopters and gamers via other forums. Go take a look at the most popular OC forums - every one of them has a folding team.

I can’t say that supporting F@H is going to directly impact the bottom line for Nvidia. But in the eyes of the super geeks who help set the trends, it’s going to help. And if it’s positioned correctly (are you marketing guys reading this?) and works well enough you could have us moving over within a quarter or two. I know things are tough right now, but I’d try to find a way to free up some resources to nail this one.

I don’t know what the issue really is that’s keeping Nvidia from working on F@H, but I’m sure it’s a tough nut to crack and/or you guys are heavily resource constrained; otherwise it would be fixed. So here’s my thinking: what’s good for ATI is good for AMD. What’s good for AMD is bad for Intel. Therefore, what’s good for Nvidia is good for Intel. So why don’t you guys go down the street and ask your buddies at Intel to throw some resources at this? They obviously have hardware guys. They also have a lot of software guys (graphics, compilers, drivers, and other stuff). I suspect Intel would love to help you guys solve this thing and get Nvidia on top and more desirable for every segment of their market.

For the time being, my 7800GTX is working just fine for the games I play, so I don’t plan to upgrade cards for that reason any time soon. That being said, if I were to find out that a version of the 88xx cards was able to run the F@H GPU client, I expect I would order a card within a week. I enjoy distributed computing, and the ability to crunch F@H workunits more quickly would be a strong enough reason for me to upgrade.

It would be great if some support could be provided regarding this issue.

Like other folders, I have been waiting to purchase an 8800 from NVIDIA, if the drivers and client can make it an efficient folder.

A majority of folders I know prefer NVIDIA over ATI. Some of them bought ATI X19xx cards and gave up on the GPU client because of the many problems getting the right drivers to start crunching GPU work units.

Actually, we never thought it was a bug in brook – the question was briefly raised, but it didn’t take long to verify that this was not the case. We have been cooperating with Mike Houston in tracking down the bug; the application is complex and required someone familiar with the application code to narrow the problem down to something that was easier to isolate. Mike has done that and I have filed a bug. Unfortunately, our driver engineers are busy with many other application issues to solve and features to implement. They will get to this issue in due course.

I don’t think it’s valid to report performance for code that isn’t running correctly.

I too would like to see F@H on NVIDIA GPUs, but our engineering managers have to set priorities appropriately based on many factors competing for their team members’ time.

We’re currently focused on CUDA for GPGPU applications, because the architecture and programming model (especially the on-chip shared memory and thread synchronization) have clear and proven benefits to performance compared to pure “streaming” approaches (aka GPGPU via OpenGL or Direct3D) for many parallel computations.
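As a hedged illustration of why those two features matter (this is a generic sum reduction of my own, not NVIDIA's or F@H's code), a CUDA thread block can stage data in on-chip shared memory, synchronize, and cooperate on it without the round trips through video memory that a multi-pass streaming shader would need:

```cuda
// Illustrative sketch: a block-wide sum reduction using on-chip shared
// memory and thread synchronization, the two features unavailable to
// pure streaming (DX9/OpenGL) GPGPU. Launch with 256 threads per block.
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float tile[256];          // on-chip, shared by the block

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;  // stage one element per thread
    __syncthreads();                     // wait for the whole block

    // Tree reduction entirely in shared memory; a streaming shader
    // would need a readback or extra render pass between these steps.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = tile[0];       // one partial sum per block
}
```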


Thanks for clarifying that. I should say that I’m no longer involved directly with the F@H project, so my information about that issue was second hand. I didn’t intend to give incorrect information.

Mark is right in that it is impossible to say how much the driver bug is affecting the performance. I’m pretty confident that once it does produce correct results and effort is made to tune it for the 8800, it will be at least as fast as the ATI card.

And I should emphasize that those numbers were NOT for the F@H code. They are for something related to the force calculation portion of the application.

That’s what I was trying to say at the end of my previous post, but Mark said it much better.

It seems that those managers have given all the priority to Vista drivers. I am disappointed that we still don’t have stable drivers for 8800 under Windows XP. They haven’t been updated since January 10th.

If I understand this Dwight Diercks interview correctly, there should be one new driver out each month, hopefully not just for Vista?

Sorry for being a bit off topic, but I really need to know whether NVIDIA is aware of the following problems:

  1. Windows XP drivers for CUDA are kind of borked.

You get the old control panel after installation and you can only use it once before it disappears. After that you can’t access either old or new control panel interface. I am sure it is something trivial messed up in the setup but it is annoying since you cannot set any options so you can use those drivers only for CUDA and nothing else.

  2. Latest 97.92 drivers for XP produce some weird results with CUDA.

So in essence you can choose CUDA or gaming; you can’t have both with the same drivers, which is frustrating.

In regard to your points:

  1. I’ve been running CUDA with the 97.73 drivers for a while now and can still successfully access the control panel. If you have observed specific bugs, you should file detailed descriptions of them with NVIDIA.

  2. CUDA notes explicitly state that the 0.8 release has not been tested and should not be used with drivers other than 97.73.

I am glad that it is working for you. However, that doesn’t solve my issue.

The specific bug is that after the install I have the classic control panel, and after the first reboot I have the new one, which is inaccessible – clicking on “Start the NVIDIA Control Panel” does absolutely nothing.

I know, but it is not very convenient to have to switch drivers back and forth.

I’m really happy that nVidia and Stanford are working together on this issue. While I bought my 8800 primarily for video transcoding/editing/rendering and some minor gaming, the fact of the matter is that that accounts for something less than two hours a day on average. Like most people, I have a day job and other things to do when not working, so my GPU is idle almost all of the time (I think we can all accept that web and email use negligible GPU cycles).

While we all appreciate the extensive work that both major GPU companies do to enhance our immersive gaming experiences and the fact that current GPUs are now able to take the load off of the CPU for video editing/rendering and HD playback, I would really like to see some effort to use my GPU for something productive during the 22-23 hours/day when I don’t really need it.


If you really want to help humanity, maybe you should just switch off your computer when you’re not using it?


My carbon footprint is low here. About 75% of Japan’s electrical energy is produced by nuclear power plants. I often walk to and from work. Besides, I really dig the shapes those little proteins make when they fold. B)

We’re doing number crunching for science just fine here, I have no idea what this bug is that folding@home encounters but can’t the programmers work around that in some way?

Because we need to be able to use vendor neutral code. I’m sure if we were using CUDA we might have an easier time, but at the moment at least it would also lock users into driver revisions that don’t play nice with some DX/GL games, as well as create a vendor specific solution. Now, if the lower levels of CUDA were more open, we could in theory create a CUDA backend for Brook and not have to rewrite the mounds of code already there…

Our code is written using Brook, and we currently use the DX9 backend. The code runs correctly on ATI hardware, as well as on the CPU backend for Brook, but does not on Nvidia hardware. It’s unclear what the bug is; a fair amount of time has been spent trying to track it down on our side, and the code is too complex for Nvidia to give us much assistance. Once we get the new science cores up and running on ATI hardware, we can take another look at what is going on with G80. Our shaders are fairly complex, as is the interaction with the rest of the system, i.e. surfaces/textures/readback/download/queries, so debugging is difficult. We are extremely heavy on register usage, and this seems to be a problem at times with the Nvidia DX compilers.
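For readers unfamiliar with Brook, it exposes a pure streaming model: a kernel consumes input streams and produces one output element per invocation, with the backend (DX9 here) handling textures and render targets. A hypothetical map-style Brook kernel would translate to CUDA roughly as follows; this is illustrative only, not the real F@H force kernels, which are far more complex and register-heavy.

```cuda
// Hypothetical illustration of the "CUDA backend for Brook" idea:
// a Brook streaming kernel such as
//   kernel void scale(float a<>, out float b<>) { b = 2.0f * a; }
// maps naturally onto a CUDA kernel where each thread plays the
// role of one stream element.
__global__ void scaleStream(const float *a, float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        b[i] = 2.0f * a[i];   // one output element per input element
}
```

The simplicity of this mapping is why such a backend is plausible in theory; the hard part is everything around it (readback, queries, resource management) that the DX9 backend already handles.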

“Mike,” is it?

I would just like to say thank you for your efforts so far put into this project.

It is obvious from what you are saying that Nvidia needs to help you understand the nature of the bug before it can be fixed.

I hope Nvidia soon realizes that a “simple” thing like folding will be a benchmark of their aspirations in GPGPU computing. Currently ATI does not have such a platform, but once they do, who will the scientific community go to: those that have proved they will work with the community and solve issues, or those that don’t?

Nvidia must learn that, while it may be a small thing to them, it’s much bigger outside and only time will tell who will win out. Currently ATI are winning in this field, full stop.

If they can’t get it working on the G80 graphics platform, is there hope for it working on Tesla? If Tesla doesn’t fold, is there really any hope for Tesla?

Everyone knows Tesla is due soon; has anyone at Nvidia considered that ATI may just burst their bubble and announce they own GPGPU, based on the 2900, at the same time?

I can just imagine the “DAMMIT” (AMD-ATI) marketing slogan:

GPGPUs get them here - we have proven we work, we fold! Our competition can’t even do that…after a year of trying. Who you gonna trust, “DAMMIT”…

This will come down to a simple case of scientific capability over economics, and the former will win if the simplest thing can’t be proven. Darwin’s law of natural selection.

Nvidia, we all love you, but is anyone really going to purchase something that hasn’t been proven to work?

Once again Mike thanks for the efforts we all hope to see Nvidia folding systems soon.

regards to all

I would like to see nVidia giving in a bit and helping Stanford get these bugs worked out, so all the people with G80s can contribute in a big way to Folding@Home and not have to switch to ATI. I recently started with F@H and would love to be able to put out much bigger numbers than with just my Core 2 Duo. I know I would be much more keen on going SLI with my 8800GTS and helping out even more.

ATI has worked with Stanford and produced excellent results, and the fact that nVidia can’t do the same is disheartening, to say the least.

Please nVidia do something to make this all work.

Maybe some folding people can download wumpus’s decuda and figure out whose bug it actually is?
It’s already possible to write a custom CUDA backend now.