Double precision floating point (DPFP) operations are critical for scientific programming.
The NVIDIA GT 750M card in the 15" MacBook Pro Retina has poor support for DPFP operations. It is difficult to run even a basic CUDA test case involving DPFP operations on the GT 750M.
It would be of great help if some forum members could suggest ways to boost its DPFP performance just for development purposes (i.e. check speed-up, debug a test case, etc.). Also, are there any recommendations for a mobile CUDA development unit for scientific programming?
Note that you only get your CPU’s 120 GFlops of double precision when fully exploiting SSE2 or AVX(2) on the CPU and doing multithreading properly.
A lot of scientific problems can also be adequately solved in single precision, sometimes requiring a bit of adaptation at the places where rounding errors have the most impact (Kahan summation, etc.).
Also consider using special math libraries (sometimes called double-single, or DSMath) that combine two single precision floats to get precision just a bit shy of a 64-bit (double) float. These functions have been posted on the forums previously, and I believe NVIDIA is now also providing double-single precision code through their developer download channels.
One forum member once noted he was able to get a very decent speed-up by first getting a coarse solution in single precision, and then doing a few more Newton-Raphson iterations in the (much slower) double precision on consumer cards to reach the exact solution.