Do the 9400M and 9600M GT support double precision?

Do the graphics cards in the new line of MacBook Pros support double precision?

No

Is there any hope of getting it done in software? I don’t know how flexible those ALUs are, but no double-precision CUDA means sending my MacBook Pro back to Apple.

I understand your concern; there are other features I would like to have on my MacBook Pro as well, such as double the register count, atomic shared-memory operations, …

Anyway, you have to know that no NVIDIA mobile GPU actually supports double precision. Even the GTX 260M is NOT a GT200 GPU, but a renamed GeForce 9800M (which was itself a die shrink of the GeForce 8800!!!), so there is no possibility of double-precision computing on mobile GPUs at this time :-(

You could write a library that "simulates" double precision on sm_12 (and earlier) devices and uses native double arithmetic when compiled with --arch=sm_13, but the emulated path will be painfully slow to execute!!!

You can emulate higher floating point precision in software, though it will not be IEEE double precision. The traditional approach is to use functions from dsfun90, a FORTRAN library that performs "pseudo-double" precision arithmetic with pairs of single precision variables. The dsfun90 functions effectively glue together two single precision floats into a quantity with 48 bits of mantissa (rather than the 53 of true IEEE double precision). The performance is 10 to 20x slower than single precision (probably even worse for transcendental functions), so you might find that a mobile Core 2 Duo CPU is easily faster than a 9600M for double precision.

If you want double precision to avoid round-off error in an accumulator (for example, a sum in an integral), I’ve had great success with Kahan summation. This trick is only about 4 times slower than normal single precision, but avoids much of the explosion of error that happens when you add floating point numbers of very different magnitudes. It works well when you only need a single precision value at the end but need extra precision in the intermediate sum.

In the code example from NVIDIA for MATLAB with CUDA, the question of doubles is solved like this:

/* Check if the input array is single or double precision */
category = mxGetClassID(prhs[i]);

if (category == mxSINGLE_CLASS)
{
    /* The input array is single precision; it can be sent
       directly to the card */
    cudaMemcpy(data1f_gpu, data1, sizeof(float)*m*n,
               cudaMemcpyHostToDevice);
}

if (category == mxDOUBLE_CLASS)
{
    /* The input array is double precision; it needs to be
       converted to floats before being sent to the card */
    data1f = (float *) mxMalloc(sizeof(float)*m*n);
    for (j = 0; j < m*n; j++)
    {
        data1f[j] = (float) data1[j];
    }
    cudaMemcpy(data1f_gpu, data1f, sizeof(float)*m*n,
               cudaMemcpyHostToDevice);
}

(http://developer.download.nvidia.com/compute/cuda/1_0/Accelerating%20Matlab%20with%20CUDA.pdf)

Doh!

I’m just porting an FFT application which uses doubles… I guess I’ll go for floats…

Does this mean that the cufftDoubleComplex and cufftDoubleReal (defined in cufft.h, 2.3) are there for… courtesy?


To use the double precision transforms, you need a GPU with compute capability 1.3.

The original question was whether the GPUs in the MacBook Pro are capable of running double precision code, and the answer to that question is no.