Hardware for CUDA development

Hello,

My name is Roberto Dohnert. Before I start I want to say this is NOT spam, nor is it advertising. I work with PC/OpenSystems LLC; we create a wide range of hardware for different uses, powered by our Linux distribution, Black Lab Linux. We were wondering: what do you look for when you acquire hardware for developing with CUDA? What specs matter to you? Is there a certain price point, etc.?

Roberto J. Dohnert
PC/OpenSystems LLC

  1. Number of CUDA cores; the higher the better.
  2. Number of streaming multiprocessors (SMs, or “SMX” on Kepler).
  3. Memory frequency.
  4. Memory bandwidth.
  5. Power draw (TDP). Could be expressed as CUDA cores per watt, or watts per CUDA core.
  6. How many slots it takes. Preferably just one, to leave room for venting/airflow.
  7. Passive cooling versus active cooling. No fan means less noise and fewer dust issues.
  8. And lately, most importantly: 64-bit floating-point operations. Any card that does not support 64-bit floating point is basically worthless for general-purpose processing.

Concerning 8… I don’t think NVIDIA has any consumer graphics cards yet that support 64-bit floating point. There is always some catch/snag.

Only the workstation/high-end stuff seems to have 64-bit floating point… from hearsay ;)

Every NVIDIA GPU of compute capability 1.3 and later supports double-precision floating point, so anything from the GTX 260 onward.

The difference is that the GeForce cards (except the Titan) have their double-precision throughput limited to somewhere between 1/4 and 1/16 the throughput of the Tesla cards. That’s a big restriction, but very different from “no support”.

As for an answer to the question posed:

When I was assembling CUDA workstations for our research group, I was most interested in systems that had a single GPU but room to add newer cards as they came out: a decent-sized case plus a power supply that could run 2 or 3 high-end cards. The motherboard should also have PCI-Express switches to support the extra x16 slots at full bandwidth, rather than cutting all the slots to x8 when extra cards are installed. Beyond that, I considered an SSD essential, but that’s not really CUDA specific.

Most pre-built systems (Dell, etc) do not have room for additional cards, so we always built our own systems from parts.

In theory, and according to the specification, perhaps. In practice it’s a different matter.

There are compiler issues and development-environment issues.

Without proper software support, even a Commodore C64 might offer better 64-bit floating-point support :)

Do you have any idea what you are talking about? Double precision works fine on GeForce cards. I have used it for accumulators as part of a larger calculation. The throughput handicap means that you are not setting any double precision LINPACK speed records with your GeForce card, but if double precision is a small part of your calculation, you should use it.

64-bit floating point did not work for my kernel on a GT 520.

The exact same kernel worked fine with 32-bit floating point.

I have also seen at least one website stating that certain Maxwell variants do not support 64-bit floating point, so beware.

You need to compile for at least compute capability 1.3.
The GT 520 supports double-precision computation, as does every card released in the past 4-5 years.

Please do not give wrong information.
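For what it’s worth, the architecture flag is what controls this: building for the old default target (sm_10) silently demotes `double` to `float`, with only a compiler warning. A minimal sketch to check whether doubles actually survived compilation (assumes nvcc is on the path and a CUDA-capable GPU is present):

```cuda
// double_check.cu -- verify double precision is actually compiled in.
// Build for a DP-capable architecture (compute capability >= 1.3):
//   nvcc -arch=sm_13 double_check.cu -o double_check
// Building for sm_10 instead demotes 'double' to 'float' and nvcc
// warns "Double is not supported. Demoting to float".
#include <cstdio>

__global__ void dp_kernel(double *out)
{
    // 1.0 + 1e-10 is representable in double but rounds to 1.0 in float,
    // so the printed value reveals whether doubles were demoted.
    *out = 1.0 + 1e-10;
}

int main()
{
    double *d_out, h_out = 0.0;
    cudaMalloc(&d_out, sizeof(double));
    dp_kernel<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("%.12f\n", h_out);  // 1.000000000100 if DP is real, 1.0 if demoted
    cudaFree(d_out);
    return 0;
}
```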

The kernel was compiled for compute capability 2.0.

It works for 32-bit floating point, not for 64-bit floating point.

Perhaps the array of 64-bit floats was too big to fit on the GPU for some reason, though that seems weird with 1 GB of RAM on the GPU. It was just a picture of reasonable dimensions.

The other problem I can think of is that the parameters passed to the kernel were wrong, but I don’t think so.

Most likely it’s an issue with the CUDA compiler itself or the hardware.

Perhaps the GT 520 has issues with compute 2.0 kernels.

Have you considered the possibility that there is a bug in the code? In my experience that is a much more likely scenario than a compiler bug.

Not knowing the code, the standard recommendations apply: make sure the status returns of all API calls and kernel launches are checked, and use cuda-memcheck to find out-of-bounds memory accesses and race conditions. Make sure the host code works correctly by using valgrind or a similar tool.
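As a minimal sketch of the kind of status checking meant here (runtime API shown; a driver-API program would check `CUresult` values the same way):

```cuda
// check.cu -- minimal error-checking pattern for CUDA runtime API calls
// and kernel launches. Build with: nvcc check.cu -o check
#include <cstdio>
#include <cstdlib>

#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err__ = (call);                               \
        if (err__ != cudaSuccess) {                               \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err__));                   \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

__global__ void scale(float *x) { x[threadIdx.x] *= 2.0f; }

int main()
{
    float *d_x;
    CUDA_CHECK(cudaMalloc(&d_x, 32 * sizeof(float)));
    scale<<<1, 32>>>(d_x);
    CUDA_CHECK(cudaGetLastError());      // catches launch configuration errors
    CUDA_CHECK(cudaDeviceSynchronize()); // catches errors during execution
    CUDA_CHECK(cudaFree(d_x));
    return 0;
}
```

Running the resulting binary under `cuda-memcheck ./check` then reports any out-of-bounds or misaligned accesses with the offending kernel and thread.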

Why would there be a bug? It works fine in 32-bit floating point. That makes no sense.

My kernel is very simple, while the compiler is super complex.

Let’s assume for a moment that compiler was buggy.

Two situations can now exist:

  1. The bug has since been fixed.
  2. It’s still in the compiler.

To re-create this problem, try the following:

Create a large 1D array of particles with all kinds of properties/fields. (It represents an image’s pixels, which can then all move individually.)

Make most fields 32-bit floating points.

Test that the code works.

Then simply flip a type and make it a 64-bit floating point (double).

If that works, we’ll talk again.
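A minimal sketch of that repro recipe; the original code was not posted, so the struct layout and field names here are made up for illustration:

```cuda
// repro.cu -- particle array where one field is flipped from float to
// double, per the steps above. Build with: nvcc -arch=sm_20 repro.cu
#include <cstdio>

struct Particle {
    float  x, y;    // position (one particle per image pixel)
    float  vx, vy;  // velocity
    double weight;  // <- the field flipped from float to double
};

__global__ void move(Particle *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        p[i].x += p[i].vx;
        p[i].y += p[i].vy;
        p[i].weight *= 0.999;  // double arithmetic on the flipped field
    }
}

int main()
{
    const int n = 1024 * 768;  // an image of reasonable dimensions
    Particle *d_p;
    cudaMalloc(&d_p, n * sizeof(Particle));
    move<<<(n + 255) / 256, 256>>>(d_p, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));
    cudaFree(d_p);
    return 0;
}
```

If this builds with `-arch=sm_20` and runs cleanly on a GT 520 (a compute capability 2.1 part), the double arithmetic itself is working there.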

I can’t remember a single instance from the forums (other than yours) where there was even a remote doubt on the double precision arithmetic capabilities of any CUDA device.

I do use beta versions and I have seen plenty of people report bugs in the compiler.

Plus I have found bugs in other compilers as well. No compiler is without its bugs.

Today I have some time.

So I will download the latest CUDA 6 release.

I’ll try to install it and then I will do a compile.

If it’s still not working I will upload the kernel to my webdrive.

And then you guys and girlies can try for yourself.

As far as I am concerned CUDA is total crap now.

My application uses the driver API version of CUDA.

And the application/the cuda driver won’t even load the module.

It complains of some kind of floating point error.

I will upload the video so you can see the crap in action for yourself.

And I will make my app distributable so you guys at NVIDIA can test and debug it for yourselves.

Perhaps my processor is not supported anymore by the driver API… perhaps it’s using some new floating-point operations inside Intel processors.

And I will upload my app to my web folder in a moment… I’ll just change some folders and so forth.

This problem was also present in cuda 5.5.

Links will follow in a moment, and then NVIDIA will look like shit… and so will I, but I don’t care about that last part. Too bad that it came to this.

Video created. I am now looking into this problem further.

It seems the problem is in the just-in-time compiler inside the driver. My app uses the driver API, which is of course much better than the runtime crap, because the driver API allows multithreading and multiple languages.

I will now make one last video comparing cuda toolkit compiler versions.

To see if older does or does not work.

OK, I will spare you the second video. The 64-bit floating-point version did work, but only with the CUDA Toolkit 4.2 compiler and only for debug code, and even that ran buggy… sometimes it wouldn’t run at all, which is new behaviour.

All other compiler versions and settings failed.

It’s completely obvious that CUDA 5 and 6 have turned into MAJOR CRAP.

Get kernel ptx and source and app here to try for yourself:

http://www.skybuck.org/CUDA/Cuda5And6HasTurnedIntoCrap.rar

Made this into a separate topic…

https://devtalk.nvidia.com/default/topic/734827/cuda-programming-and-performance/cuda-has-turned-into-crap/