Maximum bandwidth with Intel Z68 chipset

Hello all,

I have a motherboard with an Intel Z68 chipset. The CPU is a Sandy Bridge 2500K with integrated GPU. Additionally, I use an NVIDIA GTX 570 GPU.

I connected the monitor to the Sandy Bridge GPU (HDMI connector on the motherboard). Then I used the SDK bandwidthTest to measure the maximum bandwidth between the GPU and the CPU. (System: Win 7 64-bit)

My Bandwidth:

CPU to GPU 4000MByte/s
GPU to CPU 4000MByte/s

I thought that by using the integrated GPU for display I could increase the bandwidth to 16 GByte/s, because no monitor data has to travel over the PCIe bus to the GPU. Is this a driver problem, and how can I fix it?

My DDR3 bandwidth is 18 GByte/s read and 18 GByte/s write.

That is a reasonable CUDA bandwidth result. Did you use --memory=pinned for that test?

The maximum data rate for an x16 slot (used by GPUs) in PCI-Express 2.0 is 8 GB/sec, not 16 GB/sec. In practice, the highest transfer rate I’ve seen (for pinned host memory on an X58 motherboard) is a little over 6 GB/sec.

I used the SDK bandwidthTest.exe (64-bit version) for the test. How can I configure the test?

Here are my results:

bandwidthtest --memory=pinned

results for pinned memory:

CPU to GPU 6300MByte/s
GPU to CPU 6300MByte/s

bandwidthtest --memory=pageable

results for pageable memory:

CPU to GPU 4200MByte/s
GPU to CPU 4200MByte/s

OK, those are excellent bandwidth numbers.

I would consider 8000 MByte/s an excellent bandwidth figure.

6300 MByte/s is only 79% of the optimum, and only the GPU uses the PCIe bus.

PCIe 2.0 delivers 5 GT/s per lane, but employs an 8b/10b encoding scheme, which results in a 20 percent overhead on the raw bit rate.
6.3GB/s is an excellent number.
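
The arithmetic behind the 8 GB/s ceiling can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation: the 5 GT/s rate and the 8b/10b encoding come from the post above, the 16 lanes are the x16 slot mentioned earlier in the thread, and the function name is made up for illustration.

```python
def pcie_gen2_peak_mbytes_per_s(lanes=16, gigatransfers_per_s=5.0):
    """Theoretical per-direction data rate after 8b/10b encoding, in MByte/s."""
    # One transfer carries one bit per lane.
    raw_bits_per_s = gigatransfers_per_s * 1e9 * lanes
    # 8b/10b: only 8 of every 10 bits on the wire are data.
    data_bits_per_s = raw_bits_per_s * 8 / 10
    # Convert bits to bytes, then to MByte.
    return data_bits_per_s / 8 / 1e6


peak = pcie_gen2_peak_mbytes_per_s()
measured = 6300.0  # pinned-memory result reported in this thread
print(f"peak: {peak:.0f} MByte/s, measured/peak: {measured / peak:.1%}")
```

The ratio it prints is the "79%" figure quoted above; it does not yet account for packet headers or other protocol traffic.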

OK, you are free to set your own standards of excellence. :) Regardless, no chipset I’ve ever heard of goes faster than what you are observing. Clearly there is some non-zero overhead above and beyond the raw bus signaling rate.

PCIe uses packetized transport, so there is overhead due to headers. I found the following website helpful in understanding the overhead caused by this:

I do not know what the commonly used packet sizes are for modern chipsets; they may be different from what is discussed in the above document. I will note that throughput of 6300 MB/sec is at the very upper limit of what I have seen across a variety of platforms.
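
As a rough illustration of how header overhead eats into the 8 GB/s figure, the sketch below computes payload efficiency for a few payload sizes. The 20 bytes of per-packet overhead (framing, sequence number, a 12-byte header, and link CRC) and the payload sizes are assumptions chosen for illustration, not values measured on this system. The sketch also ignores flow-control and acknowledgement traffic, so real links would do somewhat worse.

```python
PEAK_MBYTES_PER_S = 8000.0    # PCIe 2.0 x16 after 8b/10b encoding
ASSUMED_OVERHEAD_BYTES = 20   # assumption: 1 start + 2 sequence + 12 header + 4 LCRC + 1 end


def payload_efficiency(payload_bytes, overhead_bytes=ASSUMED_OVERHEAD_BYTES):
    """Fraction of each packet's bytes that is actual payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)


# Hypothetical payload sizes; the real value depends on the chipset and device.
for payload in (64, 128, 256):
    eff = payload_efficiency(payload)
    print(f"{payload:4d} B payload: {eff:.1%} efficient, "
          f"about {eff * PEAK_MBYTES_PER_S:.0f} MByte/s")
```

Under these assumptions, smaller packets lose a larger share of the link to headers, which is one plausible reason measured rates stay well below the signaling limit.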