Huge Linux vs XP performance boost with beta 2.0

I installed CUDA 1.1 under Fedora Core 8 x86_64, and have been benchmarking my program (along with some of the other programs in the SDK) vs benchmarks on Windows with the exact same hardware.

My program:
Windows XP (32-bit): 1450.8ms
Linux (FC8, x86_64): 2554.2ms

So, how about the stock BlackScholes SDK example?

Windows XP (32-bit): 3.5ms
Linux (FC8, x86_64): 5.3ms

Linux is running at 55%-67% of Windows with the 1.1 SDK!

So, I installed 2.0 beta under Linux.

My Linux results now exactly match my Windows 1.1 results, which is obviously a big performance jump.

So, I installed 2.0 under Windows.

My program went from 1450.8ms to 1170.4ms under Windows. So, my Linux 2.0 benchmarks now match my Windows 1.1 benchmark. But my Windows 2.0 benchmark is now faster than my Linux 2.0 benchmark by quite a bit.

Bottom line, the 2.0 beta is definitely worth installing immediately, but the performance mismatch between Linux and Windows is puzzling.

Firstly, it's a beta, so it's definitely worth flagging this difference with NVIDIA whatever the cause is. I was going to suggest the overhead of 64-bit processing, but even so, that's way too big a gap.

Might be worth detailing which compiler you're using under Linux - presumably .NET's cl for Windows?

Our application only shows a +/- 2% performance delta between Windows XP 32-bit and Linux 64-bit, and that is due to MSVC poorly optimizing portions of the CPU code. I've never noticed a difference in any of the microbenchmarks I've written, either.

Does your application read pointers out of global memory or otherwise perform a lot of operations with pointers? In 64-bit OSes, pointers are 64-bit in CUDA to match the host pointer size. This can potentially lead to more registers used (check the cubin), or more memory transfers, etc…
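To make that concrete (a minimal sketch, not from the application being discussed): a kernel like the one below, which loads pointers from global memory, moves 8-byte pointer values and holds each of them in two 32-bit registers when built on a 64-bit OS, versus 4 bytes and one register on 32-bit. Comparing the register count in the cubin (or the output of --ptxas-options=-v) between the two builds would show whether this is what's happening.

```
// Hypothetical pointer-chasing kernel: on a 64-bit OS each entry of `table`
// is an 8-byte pointer, so the load is twice as wide and the pointer value
// occupies two 32-bit registers instead of one.
__global__ void gatherThroughPointers(float **table, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float *p = table[i];   // 64-bit load on x86_64 hosts, 32-bit on x86
        out[i] = *p;
    }
}
```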

One other area where performance often differs between Linux and Windows is host<->device transfers. On many systems, Linux is slower than Windows, usually to the tune of 2.5 GiB/s on Linux vs 3 GiB/s on Windows.
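If you want to check where your own system lands, timing a few pinned-memory copies yourself gives roughly the same number as the SDK's bandwidthTest sample. The sketch below is only illustrative; the buffer size and iteration count are arbitrary.

```
#include <cstdio>
#include <cuda_runtime.h>

// Rough host->device bandwidth check using pinned memory and CUDA events.
int main()
{
    const size_t bytes = 64 << 20;              // 64 MiB test buffer
    const int    reps  = 10;
    float *h_buf, *d_buf;
    cudaMallocHost((void**)&h_buf, bytes);      // pinned memory for peak rates
    cudaMalloc((void**)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gib_per_s = (double)reps * bytes / (ms / 1000.0)
                       / (1024.0 * 1024.0 * 1024.0);
    printf("host->device: %.2f GiB/s\n", gib_per_s);

    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}
```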

Have you tried the Black Scholes benchmark across XP-32 and Linux-x86_64? My application's performance boost mirrored the BlackScholes results.

Upgrading Linux to beta 2.0 of CUDA instantly gave me the performance that existed on Windows XP 1.1, so my guess is something significant has changed.

Going from CUDA 1.1 to CUDA 2.0 on 64-bit Linux had no significant performance delta for me. Unfortunately I don’t have a Black-Scholes benchmark from before the upgrade to compare to.

I'll try it when I get back to the office.

Hardware: 8800 GTS 512MB
Linux benchmarks are performed in text console mode with nothing but sshd running in the background. Windows Vista benchmarks are performed with Aero disabled and all default background services running, except SuperFetch.

Black Scholes times
CUDA 1.1 Linux 64-bit: 2.0534 +/- 0.0006 ms
CUDA 2.0 beta Linux 64-bit: 1.6797 +/- 0.002 ms
CUDA 2.0 beta Vista32: 1.6538 +/- 0.0009 ms

So, Black Scholes does seem a little slower with CUDA 1.1, but both Linux 64-bit and Vista32 with CUDA 2.0 beta perform nearly identically. Apparently, there was some compiler improvement between 1.1 and 2.0 here.

However, Mersenne Twister offers a counterexample: Windows is slower.

Mersenne Twister BoxMullerGPU() samples per second
CUDA 1.1 Linux 64-bit: 5.57 +/- 0.2 billion
CUDA 2.0 beta Linux 64-bit: 5.63 +/- 0.1 billion
CUDA 2.0 beta Vista32: 4.57 +/- 0.1 billion

I think this goes to show that compiler/performance differences between architectures need to be evaluated on a case-by-case basis. I would suggest starting by examining the register counts and occupancy numbers for cases where the performance differs from one OS to another. This is the most likely cause of differences in compute-intensive kernels.

For instance, the BlackScholes SDK example compiles to 16 regs in 64-bit Linux with the CUDA 2.0 beta, and 16 with CUDA 1.1. OK, so my idea didn't work here… I can't explain the performance difference in this case; it will probably take wumpus and decuda to completely unravel the compiler differences in BlackScholes. But in my own kernels I have seen register count differences between 64-bit and 32-bit compiles cause performance differences due to the changed occupancy.
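To make the occupancy effect concrete, here is a rough back-of-the-envelope estimate. The 128-thread block size is an assumption; the compute-1.1 limits of 8192 registers, 768 threads, and 8 blocks per multiprocessor are from the programming guide, and the real allocator also rounds register usage to an allocation granularity, so treat this as an approximation.

```
#include <cstdio>

// Back-of-the-envelope occupancy estimate for a compute-1.1 part (G80/G92):
// 8192 registers, 768 resident threads, and 8 resident blocks per
// multiprocessor. Block size and register count below are illustrative;
// substitute the numbers ptxas reports for your own kernel.
int main()
{
    const int regs_per_sm     = 8192;
    const int max_threads_sm  = 768;
    const int max_blocks_sm   = 8;
    const int block_size      = 128;  // assumed launch configuration
    const int regs_per_thread = 16;   // e.g. the 16 regs reported above

    int by_regs    = regs_per_sm / (regs_per_thread * block_size);
    int by_threads = max_threads_sm / block_size;
    int blocks     = by_regs < by_threads ? by_regs : by_threads;
    if (blocks > max_blocks_sm) blocks = max_blocks_sm;

    printf("resident blocks per SM: %d, occupancy: %d%%\n",
           blocks, 100 * blocks * block_size / max_threads_sm);
    return 0;
}
```

With these numbers the estimate is 4 resident blocks (66% occupancy); if a 64-bit build pushed the kernel to, say, 20 registers per thread, only 3 blocks would fit, which is exactly the kind of occupancy shift that can show up as an OS-to-OS performance difference.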