X58 Chipset PCIe Bandwidth: Any improvement?

Well I’m installing Linux now, but with only 2GB of RAM :( Still waiting on my giant stock of 4GB DDR3 DIMMs…

Seeing 4.5 HtoD/4.8 DtoH triple channel, 3.6/3.8 dual channel…

So yeah, it pretty much scales with memory speed. 4.5/4.5 at DDR3-1600 with 6.4 GT/s QPI, 4.1/3.3 at DDR3-1066 with 4.8 GT/s QPI. Faster QPI provides a mild boost, but nothing huge; the vast majority is due to memory speed. For reference, my Harpertown with 16GB of memory runs at 1.4/1.1. Thanks Intel :D

I know this is a bit off subject, but where did you find 4GB DDR3 DIMMs? The only “4GB modules” I’ve seen are 2x2GB dual channel kits.

Hmm, this is the first real application where I’ve seen memory speed really matter. I’m glad I got DDR3-1600 8-8-8-24 DIMMs.

http://www.crucial.com/store/partspecs.asp…3KIT51272BB1067

They are really expensive…

I had the same when I was looking for the pinned test and could only find pageable ;)

What memory is it specifically?

does that 4.5 mean 4.5GB/s?

I also use X58 with triple channel memory.

CUDA-Z told me

HtoD 57XX.X MB/s (pinned) and 47XX.X~50XX.X MB/s (pageable),

so I think 4.5 is pageable.

yeah, 4.5 GB/s pageable. I’m going to play with it further once I have the ridiculous amount of memory I ordered, plus I’m very curious how this will behave on a multi-Nehalem system in the future.

This is a dual channel kit for $148 from TigerDirect. So my 18 GB/s performance is running in dual channel mode, not triple, so the numbers only get better. You can put another matched pair in the Asus P6T and get triple channel performance for 8GB total for $300. The triple channel kits are the same sticks, just binned and packaged in tuples.

PDC34G1333LLK Patriot Extreme Performance DDR3 4GB (2 x 2GB) PC3-10666 Low Latency DIMM Kit

Link, http://www.patriotmem.com/products/detailp…=650&type=1

Here are my results with pinned memory:

device 0: GeForce GTX 280

Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth (MB/s)
33554432                5764.9

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes)   Bandwidth (MB/s)
33554432                5651.9
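
In case anyone wants to reproduce numbers like the ones above (they look like SDK bandwidthTest output), here is a minimal sketch of the same kind of pinned-memory measurement using the CUDA runtime API. The 32 MB transfer size matches the results above; the iteration count and buffer names are just made up for illustration.

// Minimal sketch of a pinned-memory host-to-device bandwidth measurement
// (not the actual bandwidthTest source): allocate page-locked host memory
// with cudaMallocHost and time cudaMemcpy with CUDA events.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 33554432;     // 32 MB, same as the results above
    const int    iters = 10;           // made-up iteration count

    void *h_pinned, *d_buf;
    cudaMallocHost(&h_pinned, bytes);  // page-locked host allocation
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)    // host to device: GPU can DMA straight from the pinned buffer
        cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("HtoD pinned: %.1f MB/s\n", (double)bytes * iters / (ms / 1000.0) / (1 << 20));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_pinned);
    return 0;
}

Swapping cudaMallocHost/cudaFreeHost for plain malloc/free gives you the pageable number to compare against.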

Are you sure those will work? They are registered DIMMs.

Just curious, what will be the specs on the X58 system you’re building?

Putting 8GB of memory on a Core i7 will give you dual channel at best. You have 3 channels, and you need the same amount of memory in each channel. I also wouldn’t use those sticks, since they are rated at 1.7V. Intel recommends a maximum of 1.65V, or your CPU may be permanently damaged. If those sticks work for you, that’s great, but be careful about the voltage.

Fairly sure they will (I’ve heard of some people using them with X58). The X58 machine is a 3.2 GHz processor with up to 24GB of DDR3 and GPUs depending on the hour of the day.

lol, I’ve never seen tmurray like this

5GB/s pageable is very surprising. It is also surprising that non-pageable doesn’t improve the numbers by much. I can’t figure out how it can just be a matter of DDR3 speed. How do non-pageable transfers work, btw? Does the driver lock the pages in the OS first, or does the GPU make some odd trip through the pagetables for every page transferred?

What I once read is that it goes something like this:

(some) pagelocked memory is allocated

memory transfer from the pageable mem to the pagelocked mem.

GPU does DMA transfer of the pagelocked mem.

(I think only a certain amount of pagelocked memory is allocated, so the above scenario may be repeated a few times)

So when you have 18 GB/s of memory bandwidth, the host-side copy in the above scenario is no longer a limiting factor; the limiting factor starts to be the PCI-E bandwidth, which you can see from the fact that pagelocked memory doesn’t gain you a lot anymore. (A rough sketch of the scheme is below.)
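
To make the steps above concrete, here is a rough sketch of that staging scheme (not the actual driver code; the 1 MB chunk size and the names are just assumptions for illustration):

#include <cstring>
#include <cuda_runtime.h>

// Copy a pageable host buffer to the device the way described above:
// memcpy chunks into a small pinned staging buffer, then let the GPU
// DMA each chunk from there.
void copyPageableToDevice(void *d_dst, const char *h_pageable, size_t bytes) {
    const size_t chunk = 1 << 20;              // hypothetical 1 MB staging buffer
    void *staging;
    cudaMallocHost(&staging, chunk);           // page-locked staging area

    for (size_t off = 0; off < bytes; off += chunk) {
        size_t n = (bytes - off < chunk) ? (bytes - off) : chunk;
        memcpy(staging, h_pageable + off, n);  // CPU copy: limited by host memory bandwidth
        cudaMemcpy((char *)d_dst + off, staging, n,
                   cudaMemcpyHostToDevice);    // DMA: limited by PCI-E bandwidth
    }
    cudaFreeHost(staging);
}

With slow host memory the memcpy step dominates; once the host can do 18 GB/s, the cudaMemcpy (PCI-E) step is what you wait for, so pinned and pageable end up close together.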

Yes, the memory spec states you need 1.7V to run 7-7-7-20, but mine run great at 1440 @ 1.64V with 8-7-7-24 timings. I also had to bump the QPI up to 1.25V. The new 904 BIOS from Asus fixed a lot of memory issues and it worked for me.

Not every pair will perform the same; these have been rock solid for two weeks running Server 2008 x64. The week prior was nothing but crashes, overheats, failed boots, and OS re-installs. I even had my GTX 280 beeping at me every 2 minutes; after a quick RMA that was fixed.

OC’ing and tweaking Nehalem is not for the weak of heart, nor thin of wallet…

hey, when I get a 3x improvement in a common PCIe transfer case I get excited :P

and what E.D. said is correct. so, it’s not surprising that it’s better, just surprising that it’s improved this much.

I was hoping some Dominator DDR3-1600 DIMMs would do the job for my rig… and they do. They are a bit expensive, but worth it. I figured by the time I need a memory upgrade, non-overclocked 4GB DDR3-1600 DIMMs will be well established and affordable, so I’ll be able to up my RAM to 18GB without a problem. Core i7 gives you so many options… but that’s a smart overclock you got out of your DIMMs.

Excited is an understatement.

When I was testing some CUDA kernels against the CPU, I found an 8x speedup (compared to 100x on my previous CPU). Want more numbers? On the same 4 drives, RAID5 performance jumped from 100MB/s to 200+ MB/s. On the same SLI configuration, Crysis jumped from an average of 15fps to being almost completely smooth.

When you get these numbers, if you don’t injure your head while jumping out of the chair, you go out for a smoke, come back, don’t believe it and go to bed. The next day, when you wake up and get the same results, you come to accept they are real.

Core i7 is fast, and I’m sure most of us are blown away by it. (Core i7 is probably as revolutionary as the 8800 was.)

BTW, nice rig you’re building. I wish I had something like that… ah, maybe in time, I’ll get to upgrade.

Correct me if I’m wrong, but the way I understand it is that the GPU can access page-locked memory directly via DMA (Direct Memory Access), without having to take a trip through the CPU. On the Core2, this saved a trip from the northbridge to the CPU and back, so a memory read looks like this: Mem → NB → GPU.

Paged memory reads are done through the CPU, so on Core2/Pentium4 the path is Mem → NB → CPU → NB → GPU.

On the Core i7, Athlon64, and Phenom you still take a trip through the CPU, as the memory controller is integrated on the CPU, so a memory read would go Mem → CPU → NB → GPU in both cases. This explains in part the fast paged memory bandwidth.

BTW, does anyone have paged vs non-paged results for an Athlon64 / Phenom?

I have a pair of 8800 GTX cards in a Phenom X4 2.2 GHz, and a GTX 280 + GT200 prototype in a Phenom X4 2.6 GHz.

8800 GTX (PCI-e 1.0)

H->D: 2318 MB/sec (pageable), 2713 MB/sec (pinned)

D->H: 1830 MB/sec (pageable), 2975 MB/sec (pinned)

GTX 280 (PCI-e 2.0)

H->D: 2354 MB/sec (pageable), 5196 MB/sec (pinned)

D->H: 1943 MB/sec (pageable), 6092 MB/sec (pinned)

Update: I should point out that both motherboards use the AMD 790FX chipset.