Does anyone have results from a shmoo test (pinned or pageable) that they could attach to a post, please? I’m seeing really slow write-back with cudaMemcpy, but I suspect this has more to do with not using pinned memory. I just want to check that my ULi PCI Express chipset + 8800GT are in the same ballpark as others.
Shmoo Mode
Host to Device Bandwidth for Pinned memory
.................................................................................
Transfer Size (Bytes) Bandwidth(MB/s)
67186688 5347.1
Shmoo Mode
Device to Host Bandwidth for Pinned memory
.................................................................................
Transfer Size (Bytes) Bandwidth(MB/s)
67186688 4590.3
Shmoo Mode
Device to Device Bandwidth
.................................................................................
Transfer Size (Bytes) Bandwidth(MB/s)
67186688 51735.3
&&&& Test PASSED
Here’s the important stuff from some results. I don’t want to spam the board with the whole thing unless it’s necessary. PM me if you’d like the rest, I guess. It’s an 8800 GTS in a Tyan server board. We have some standard desktop motherboards too, I’ll try to run tests on those as well. They’re slower though, I think.
Here are the pageable results, cropped:
Shmoo Mode
Host to Device Bandwidth for Pageable memory
.................................................................................
Transfer Size (Bytes) Bandwidth(MB/s)
67186688 1607.6
Shmoo Mode
Device to Host Bandwidth for Pageable memory
.................................................................................
Transfer Size (Bytes) Bandwidth(MB/s)
67186688 1464.6
Shmoo Mode
Device to Device Bandwidth
.................................................................................
Transfer Size (Bytes) Bandwidth(MB/s)
67186688 51714.5
&&&& Test PASSED
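For a quick sense of the gap, the figures quoted above can be compared directly. This is only a rough sketch: it assumes the pinned and pageable numbers come from the same machine, which the thread does not state explicitly, and the `speedup` helper is just an illustrative name.

```python
# Figures copied from the shmoo output quoted in this thread
# (MB/s at a transfer size of 67186688 bytes).
# Assumption: both sets were measured on the same machine.
pinned = {"host_to_device": 5347.1, "device_to_host": 4590.3}
pageable = {"host_to_device": 1607.6, "device_to_host": 1464.6}


def speedup(pinned_mbps, pageable_mbps):
    """Return how many times faster the pinned transfer is."""
    return pinned_mbps / pageable_mbps


ratios = {d: speedup(pinned[d], pageable[d]) for d in pinned}
for direction, ratio in ratios.items():
    print(f"{direction}: pinned is {ratio:.2f}x pageable")
```

On these numbers pinned memory comes out a little over 3x faster in both directions, so slow cudaMemcpy write-back from pageable memory would not be surprising even on a healthy chipset.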
Here are the results on my machine (780i MB, 8800 GTS 512MB GPU, DDR 8000 RAM). Host <–> device rates are reported to be slightly faster on Intel P35 chipsets. bandwidth_test_shmoo_780i.gz (1.74 KB)
I think I’ve identified my problem here… time to look at another chipset sometime, methinks.
For the record, it’s the ULi M1689 - the one with AGP and PCI-E v1 support. Shame, as this doesn’t show up in gaming, though when did a game’s graphics rendering feed back much to the system?
Sorry to dig up an old thread, but I was wondering if anybody knows which is the most significant limiting factor for the smaller transfers in the shmoo results posted here: the PCI bus, the host CPU, or the GPU memory interface.
Probably the CPU, which has to handle the set-up overhead. Hmm, I forgot to run shmoo when I did the downclocking of my CPU and FSB. (Btw, it’s also possible to change the PCI-E frequency on my board, to see what effect that has.)
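To illustrate why a fixed set-up cost would hurt small transfers most, here is a minimal sketch of a simple model: time per copy = fixed overhead + size / peak bandwidth. The 10 µs overhead is a made-up placeholder, not a measurement, and the peak is just the pinned host-to-device figure quoted earlier in the thread; the 1 MB = 2**20 bytes convention is also an assumption.

```python
def effective_bandwidth_mbps(size_bytes, peak_mbps, overhead_us):
    """Effective bandwidth under a fixed-overhead model.

    time per copy = overhead + size / peak
    """
    mb = 1 << 20  # assumption: 1 MB = 2**20 bytes
    transfer_s = size_bytes / (peak_mbps * mb)
    total_s = overhead_us * 1e-6 + transfer_s
    return size_bytes / total_s / mb


PEAK_MBPS = 5347.1   # pinned host-to-device figure quoted in the thread
OVERHEAD_US = 10.0   # hypothetical per-call set-up cost, not measured

for size in (1024, 16 * 1024, 256 * 1024, 4 * 1024 * 1024, 67186688):
    bw = effective_bandwidth_mbps(size, PEAK_MBPS, OVERHEAD_US)
    print(f"{size:>10d} bytes  {bw:8.1f} MB/s")
```

In this model the small transfers are dominated by the fixed term and only the largest sizes approach the peak. Fitting the overhead to real shmoo data, and seeing whether it moves when the CPU is downclocked, would be one way to test the set-up overhead guess.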