I have a motherboard with an Intel Z68 chipset. The CPU is a Sandy Bridge 2500K with an integrated GPU. In addition, I use an NVIDIA GTX 570.
I connect the monitor to the Sandy Bridge GPU (HDMI connector on the motherboard). Then I used the SDK bandwidthTest to find the maximum bandwidth between the GPU and the CPU. (System: Win 7 64-bit)
My Bandwidth:
CPU to GPU: 4000 MByte/s
GPU to CPU: 4000 MByte/s
I thought that if I use the integrated GPU, I could increase the bandwidth to 16 GByte/s, because no monitor data has to travel over the PCIe bus to the GPU. Is this a driver problem, and how can I fix it?
My DDR3 bandwidth is 18 GByte/s read and 18 GByte/s write.
That is a reasonable CUDA bandwidth result. Did you use --memory=pinned for that test?
The maximum data rate for an x16 slot (used by GPUs) in PCI-Express 2.0 is 8 GB/sec, not 16 GB/sec. In practice, the highest transfer rate I’ve seen (for pinned host memory on an X58 motherboard) is a little over 6 GB/sec.
PCIe 2.0 delivers 5 GT/s per lane, but employs an 8b/10b encoding scheme, which results in a 20 percent overhead on the raw bit rate.
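The arithmetic behind those two posts can be sketched in a few lines (the 5 GT/s rate, 8b/10b encoding, and 16 lanes are all from the thread):

```python
# PCIe 2.0 theoretical payload bandwidth for an x16 slot.
GT_PER_S = 5e9       # raw signaling rate per lane (5 GT/s)
LANES = 16           # x16 slot
DATA_BITS = 8        # 8b/10b encoding: 8 data bits ...
LINE_BITS = 10       # ... per 10 line bits (20% overhead)

payload_bits_per_s = GT_PER_S * LANES * DATA_BITS / LINE_BITS
gbytes_per_s = payload_bits_per_s / 8 / 1e9
print(gbytes_per_s)  # 8.0 GB/s per direction
```

That is why 8 GB/s, not 16 GB/s, is the ceiling for a PCIe 2.0 x16 link in each direction; measured numbers sit below that because of packet overhead.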
6.3GB/s is an excellent number.
OK, you are free to set your own standards of excellence. :) Regardless, no chipset I’ve ever heard of goes faster than what you are observing. Clearly there is some non-zero overhead above and beyond the raw bus signaling rate.
PCIe uses packetized transport, so there is overhead due to headers. I found the following website helpful in understanding the overhead caused by this: PCIe Switches and Bridges
I do not know what the commonly used packet sizes are for modern chipsets; they may differ from what is discussed in the above document. I will note that a throughput of 6300 MB/sec is at the very upper limit of what I have seen across a variety of platforms.
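As a rough illustration of how packet overhead eats into the 8 GB/s figure, here is a back-of-the-envelope estimate. The payload and overhead sizes below are assumptions for illustration (a 128-byte maximum TLP payload and roughly 24 bytes of per-packet header, sequence number, LCRC, and framing); actual values vary by chipset and configuration, so treat this as a sketch, not a measurement:

```python
# Rough PCIe packet-overhead estimate (assumed figures, see lead-in).
LINK_GB_S = 8.0   # x16 PCIe 2.0 payload rate after 8b/10b encoding
PAYLOAD = 128     # assumed max TLP payload in bytes
OVERHEAD = 24     # assumed per-packet overhead in bytes

efficiency = PAYLOAD / (PAYLOAD + OVERHEAD)
effective_gb_s = LINK_GB_S * efficiency
print(round(effective_gb_s, 2))  # ~6.74 GB/s
```

With these assumptions the achievable rate lands in the high-6 GB/s range, which is at least consistent with the ~6.3 GB/s best-case pinned-memory numbers mentioned above.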