zaobz
May 22, 2007, 12:45am
1
I compared the results of the SDK sample programs, and it seems like CUDA is slower on Linux.
The only thing that was faster was simpleGL, where the wave moves much faster than on Windows (if faster == performance).
Is there any specific cause for this?
I’m currently using Debian. I know it’s not supported yet; will that have any effect on performance?
I’d say that what is potentially more likely is that the motherboard that you’re using is impacting performance. How are you measuring performance?
I do not see any runtime differences between XP and Linux for the kernel time (using the CUDA profiler). Kernel startup and memory operations (pageable) tend to be a bit faster on Linux. The emulator can be dramatically faster on Linux with some kernels because of the better thread scheduling.
I am using WinXP SP2 and openSuSE 10.2 (2.6.18 kernel) respectively. CK804 board, 3GHz P4 HT
Peter
zaobz
May 24, 2007, 6:10am
4
My machine spec is:
Intel 5000X
2x Xeon 5160 3.0GHz
3GB RAM
8800GTX
OSes are Windows XP/SP2 and Debian
Notable differences are
Bandwidth Test - Device to Device
Linux: 3331 MB/s
Win: 9504 MB/s
Binomial
Linux: 218.4 ms
Win: 162.6 ms
matrixMul
Linux: 162.8 ms
Win: 16 ms
MultiGPU
Linux: 797.8 ms
Win: 576 ms
Scan
Linux: 0.477 ms, 0.771 ms, 0.306 ms
Win: 0.29 ms, 0.38 ms, 0.167 ms
Vectorload
Linux: 160 ms
Win: 24 ms
Any ideas what is causing this?
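To make the size of the gap easier to see, here is a rough sketch that turns the figures quoted above into Linux-vs-Windows factors. The names `timings_ms`, `d2d_mb_per_s`, and `slowdown_factors` are made up for this illustration; the numbers are copied from the post, and nothing here explains the cause.

```python
# Timings in milliseconds as (Linux, Windows), copied from the post above.
# Lower is better.
timings_ms = {
    "Binomial": (218.4, 162.6),
    "matrixMul": (162.8, 16.0),
    "MultiGPU": (797.8, 576.0),
    "Vectorload": (160.0, 24.0),
}

# Device-to-device bandwidth in MB/s as (Linux, Windows). Higher is better.
d2d_mb_per_s = (3331.0, 9504.0)


def slowdown_factors():
    """Return how many times slower Linux is than Windows for each test."""
    factors = {name: linux / win for name, (linux, win) in timings_ms.items()}
    # For bandwidth the ratio is inverted, since a higher number is better.
    linux_bw, win_bw = d2d_mb_per_s
    factors["Bandwidth D2D"] = win_bw / linux_bw
    return factors


if __name__ == "__main__":
    for name, factor in slowdown_factors().items():
        print(f"{name:14s} Linux slower by a factor of {factor:.2f}")
```

By this count matrixMul and Vectorload stand out (roughly 10x and 7x), while Binomial and MultiGPU differ by well under 2x.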
I saw this on Fedora Core, Knoppix (Debian), and Ubuntu. All had the same performance problem; at least the D2D numbers were all close to 333x MB/s.
On Fedora, I remember seeing good D2D bandwidth at one point, but I never saw it again.
It would be worth trying SUSE.
With the same motherboard and everything else, WinXP is faster than Linux.
I got 85xx MB/s D2D bandwidth on Windows, but 333x MB/s D2D on Linux.
I got binomial 304 ms, matMul 44 ms, scan .5, .89, .24 ms, vectorload 44 ms (all linux).
But something’s funny – I get 3334.7 MB/s D2D – and this on an 8800GTS, not GTX.
Why should the number be virtually the same?
I have been chasing this 333x MB/s D2D speed for quite some time.
It makes me hesitant to commit to developing and benchmarking on Linux.
I would expect Linux to perform better on compute.
I would really appreciate it if someone could explain that D2D number and the other lower benchmark numbers.
The new version is going to be way faster.
If you look at the FAQ, these are the new numbers:
Transfer          Pageable      Page-locked
Host - Device     1.7 GB/sec    3.1 GB/sec
Device - Host     1.7 GB/sec    3.1 GB/sec
Device - Device   70.7 GB/sec   70.7 GB/sec
I can confirm the values of mfatica on Linux (CUDA 0.9beta).
I don’t see this difference. And I actually don’t see why a device2device should depend on the host OS :blink:
Peter