CUDA performance on Linux: sample programs show it's slower?

zaobz · May 22, 2007, 12:45am

I compared the results of the sample programs in the SDK, and it seems like CUDA is slower on Linux.

The only thing that was faster was simpleGL, where the wave moves much faster than on Windows (if (faster == performance)).

Is there any specific cause for this?
I’m currently using Debian. I know it’s not supported yet; will that have any effect on performance?

netllama · May 22, 2007, 3:20am

I’d say it is more likely that the motherboard you’re using is affecting performance. How are you measuring performance?

prkipfer · May 22, 2007, 10:53am

I do not see any runtime differences between XP and Linux for the kernel time (using the CUDA profiler). Kernel startup and memory operations (pageable) tend to be a bit faster on Linux. The emulator can be dramatically faster on Linux with some kernels because of the better thread scheduling.

I am using WinXP SP2 and openSuSE 10.2 (2.6.18 kernel). CK804 board, 3GHz P4 HT.

Peter

zaobz · May 24, 2007, 6:10am

My machine spec is:
Intel 5000X
2x Xeon 5160 3.0GHz
3GB RAM
8800GTX

OSes are Windows XP SP2 and Debian.

Notable differences are:

Bandwidth Test - Device to Device
Linux: 3331 MB/s
Win: 9504 MB/s

Binomial
Linux: 218.4 ms
Win: 162.6 ms

matrixMul
Linux: 162.8 ms
Win: 16 ms

MultiGPU
Linux: 797.8 ms
Win: 576 ms

Scan
Linux: 0.477 ms, 0.771 ms, 0.306 ms
Win: 0.29 ms, 0.38 ms, 0.167 ms

Vectorload
Linux: 160 ms
Win: 24 ms

Any ideas what is causing this?
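For readers comparing the figures above, a small Python sketch of the Linux-to-Windows slowdown factors may help. It only uses the numbers quoted in this post; the function and variable names are made up for illustration, and Scan is left out because it reports three separate timings.

```python
# Timings in ms, copied from the post above as (Linux, Windows).
timings_ms = {
    "Binomial": (218.4, 162.6),
    "matrixMul": (162.8, 16.0),
    "MultiGPU": (797.8, 576.0),
    "Vectorload": (160.0, 24.0),
}

# Device-to-device bandwidth in MB/s, as (Linux, Windows).
d2d_mb_per_s = (3331.0, 9504.0)


def slowdown_from_times(linux_ms, win_ms):
    """Higher time is worse, so the factor is Linux time / Windows time."""
    return linux_ms / win_ms


def slowdown_from_bandwidth(linux_bw, win_bw):
    """Higher bandwidth is better, so the factor is Windows / Linux."""
    return win_bw / linux_bw


def summarize():
    """Return a dict mapping each test name to its Linux slowdown factor."""
    rows = {name: slowdown_from_times(*pair) for name, pair in timings_ms.items()}
    rows["Bandwidth Test (D2D)"] = slowdown_from_bandwidth(*d2d_mb_per_s)
    return rows


if __name__ == "__main__":
    # Print the tests from the smallest gap to the largest.
    for name, factor in sorted(summarize().items(), key=lambda kv: kv[1]):
        print(f"{name:22s} Linux is {factor:.2f}x slower")
```

The gap is far from uniform: roughly 1.3x for Binomial and MultiGPU, just under 3x for device-to-device bandwidth, and between about 7x and 10x for Vectorload and matrixMul.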

cicicici · June 2, 2007, 6:20pm

I saw this on Fedora Core, Knoppix (Debian), and Ubuntu. All had the same performance problem; at least the D2D numbers were all quite close to 333x MB/s.

On Fedora, I remember seeing good D2D bandwidth at some point, but I never saw it again.

It would be worth trying SuSE.

cicicici · June 2, 2007, 6:24pm

With the same motherboard and everything else, Windows XP is faster than Linux.

I got 85xx MB/s D2D bandwidth on Windows, but 333x MB/s D2D on Linux.

jshall · June 2, 2007, 9:32pm

I got binomial 304 ms, matrixMul 44 ms, scan 0.5, 0.89, 0.24 ms, vectorload 44 ms (all on Linux).

But something’s funny – I get 3334.7 MB/s D2D – and this on an 8800GTS, not GTX.

Why should the number be virtually the same?

cicicici · June 4, 2007, 5:13am

I have been chasing this 333x MB/s D2D figure for quite some time.

This makes me hesitant to commit to development and measurement on Linux.

I expected Linux to perform better on compute.

I would very much appreciate it if someone could explain that D2D number and the other lower benchmark numbers.

mfatica · June 4, 2007, 6:02am

The new version is going to be way faster.
If you look at the FAQ, these are the new numbers:

                   Pageable      Page-locked
Host - Device      1.7 GB/sec    3.1 GB/sec
Device - Host      1.7 GB/sec    3.1 GB/sec
Device - Device    70.7 GB/sec   70.7 GB/sec
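To put these figures next to the ones reported earlier in the thread, recall that a bandwidth number is just bytes moved divided by elapsed time. Below is a minimal Python sketch of that conversion. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), which may not match how the SDK test counts them, and it uses a hypothetical 32 MB transfer size that does not come from the thread.

```python
MB = 1_000_000        # assumption: decimal megabytes
GB = 1_000_000_000    # assumption: decimal gigabytes

TRANSFER_BYTES = 32 * MB  # hypothetical transfer size, for illustration only


def copy_time_ms(num_bytes, bytes_per_second):
    """Time in ms to move num_bytes at a sustained rate of bytes_per_second."""
    return num_bytes / bytes_per_second * 1000.0


def bandwidth_mb_per_s(num_bytes, elapsed_ms):
    """Bandwidth in MB/s implied by moving num_bytes in elapsed_ms."""
    return num_bytes / (elapsed_ms / 1000.0) / MB


# Rates quoted in this thread, converted to bytes per second.
rates = {
    "D2D, FAQ (70.7 GB/sec)": 70.7 * GB,
    "D2D, Windows report (9504 MB/s)": 9504 * MB,
    "D2D, Linux report (3331 MB/s)": 3331 * MB,
    "Host-Device page-locked, FAQ (3.1 GB/sec)": 3.1 * GB,
    "Host-Device pageable, FAQ (1.7 GB/sec)": 1.7 * GB,
}

if __name__ == "__main__":
    for label, rate in rates.items():
        ms = copy_time_ms(TRANSFER_BYTES, rate)
        print(f"{label:45s} {ms:8.3f} ms for 32 MB")
```

Under these assumptions, the same 32 MB copy takes well under a millisecond at the FAQ's device-to-device rate and several milliseconds at the rates reported earlier in the thread, so short timed intervals are sensitive to how the measurement is taken.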

prkipfer · June 5, 2007, 3:24pm

I can confirm mfatica’s values on Linux (CUDA 0.9beta).

I don’t see this difference. And I actually don’t see why a device-to-device copy should depend on the host OS :blink:

Peter
