Why is my Tesla C1060 working at PCIe x8 instead of x16?

Hi,

I have to ask this question again.
My motherboard is an Intel S5520SC, and I disabled its integrated graphics. I installed a GeForce 9500 and a Tesla C1060.
nvidia-settings shows that the GeForce 9500 runs at PCI Express x16, but the Tesla C1060 runs at PCI Express x8.

Someone reminded me to double-check the motherboard to make sure all 16 lanes are electrically connected; I am sure they are. I also moved the GeForce 9500 into the slot the C1060 was in, and it works fine there.

I also did some searching, and someone said it does not matter. My question, then, is why does my C1060 behave differently from the way it is supposed to? And is running at x8 really equivalent to running at x16?

Thank you so much.
nvidia_bug_report.log.gz (39.4 KB)

The Intel S5520SC has two x16 slots:

2 PCI Express* 2.0 x16 slots (x16 mechanical)

  1. If you plug two cards into the two PCIe Gen2 x16 slots, they may not reach x16.

Have you consulted an Intel engineer?

For example, my motherboard is an ASUS P5Q PRO, which has two PCIe 2.0 x16 slots; however, when I plug in both a GTX295 and a Tesla C1060, both cards drop to x8.

  2. What does the bandwidth test report on your Tesla C1060? That is important.

    For pageable memory, my Tesla C1060 only reaches 1~2 GB/s.

    If your Tesla card does not get high PCIe bandwidth, you may want to decrease the PCIe traffic in your algorithm; see the sketch below.
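A minimal sketch of what "decreasing PCIe traffic" can look like: keep the working set resident in device memory across kernel launches, and only cross the bus once at each end. The step() kernel and the sizes here are hypothetical placeholders, not anything from the posts above.

[codebox]// Hypothetical sketch: one upload, many kernel launches, one download.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void step(float *data, int n)            // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 0.5f * data[i] + 1.0f;     // stand-in for real work
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc((void **)&d, bytes);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // PCIe traffic: once

    for (int iter = 0; iter < 100; ++iter)             // no PCIe traffic here
        step<<<(n + 255) / 256, 256>>>(d, n);

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // PCIe traffic: once
    printf("h[0] = %f\n", h[0]);

    cudaFree(d);
    free(h);
    return 0;
}[/codebox]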

Thank you, that explains something. I am checking the specification of the motherboard (hard for me). Does the distribution of PCIe lanes depend on the hardware address? I mean, if I swap the locations of the two GPU cards, will the number of lanes each gets swap as well?

About the second point: I don’t know how to test the bandwidth of the C1060. How did you get the value of 1~2 GB/s? Isn’t this value the same for all Tesla C1060 cards?

Thanks again.

  1. " if I exchange the location of the two gpu cards, will the number of lanes for each exchange also". I don’t think so

  2. “How did you get the value of 1~2 GB/s?”

    Use bandwidthTest.exe from the SDK. For example,

My platform: WinXP Pro x64, driver 190.38, CUDA 2.3, GTX295 + Tesla C1060.

For pageable memory, the result is:

[codebox]C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA CUDA SDK\C\bin\win64\Release>bandwidthTest.exe --device=0
Running on...
      device 0: GeForce GTX 295
Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes)    Bandwidth(MB/s)
 33554432                1142.1

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes)    Bandwidth(MB/s)
 33554432                1477.5

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)    Bandwidth(MB/s)
 33554432                86670.5

&&&& Test PASSED[/codebox]

For pinned (page-locked) memory, the result is:

[codebox]C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA CUDA SDK\C\bin\win64\Release>bandwidthTest.exe --device=0 --memory=pinned
Running on...
      device 0: GeForce GTX 295
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes)    Bandwidth(MB/s)
 33554432                1109.9

Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes)    Bandwidth(MB/s)
 33554432                3193.9

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)    Bandwidth(MB/s)
 33554432                93751.5

&&&& Test PASSED[/codebox]

The PCIe bandwidth is the "Host to Device" and "Device to Host" figures; the "Device to Device" figure is the GPU's on-board memory bandwidth, not the bus.
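If you want the host-to-device number without hunting down the SDK sample, here is a minimal sketch of the same kind of measurement, timing one 32 MB copy from pageable versus pinned (cudaMallocHost) host memory with CUDA events. The helper name h2d_mbps is my own invention for this sketch, not part of any API.

[codebox]// Minimal sketch: time a 32 MB host-to-device copy and report MB/s,
// once from pageable memory and once from pinned (page-locked) memory.
#include <cuda_runtime.h>
#include <stdio.h>

// h2d_mbps is a hypothetical helper written for this sketch.
static float h2d_mbps(float *d_dst, float *h_src, size_t bytes)
{
    cudaEvent_t start, stop;
    float ms = 0.0f;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return (bytes / (1024.0f * 1024.0f)) / (ms / 1000.0f);   // MB/s
}

int main(void)
{
    const size_t bytes = 32 * 1024 * 1024;   // same size bandwidthTest uses
    float *d, *pageable, *pinned;

    cudaMalloc((void **)&d, bytes);
    pageable = (float *)malloc(bytes);           // ordinary pageable memory
    cudaMallocHost((void **)&pinned, bytes);     // page-locked memory

    // Warm-up copy so context creation does not skew the first timing.
    cudaMemcpy(d, pageable, bytes, cudaMemcpyHostToDevice);

    printf("pageable: %.1f MB/s\n", h2d_mbps(d, pageable, bytes));
    printf("pinned:   %.1f MB/s\n", h2d_mbps(d, pinned, bytes));

    cudaFreeHost(pinned);
    free(pageable);
    cudaFree(d);
    return 0;
}[/codebox]

Pinned memory can be DMA'd directly, without staging through an intermediate buffer, which is why the pinned device-to-host figure above (3193.9 MB/s) is so much higher than the pageable one (1477.5 MB/s).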