Transfer data from host to device Transfer 10G

Amir_Baruh · August 19, 2009, 12:39pm

Hi
What is the MAX bandwidth I can get with GTX 295?
Can I improve it with external device (BUS, DMA, Special memory…)?
I ask because I need to transfer 10G and to do Matrix-Vector Multiplication.
Thanks
Amir

LSChien · August 19, 2009, 12:48pm

You can use bandwidthTest.exe from the SDK examples to measure it.

On my platform (GTX 295):

host to device: 1.1 GB/s

device to host: 1.7 GB/s

Pimbolie1979 · August 19, 2009, 12:56pm

You need a motherboard with PCIe 2.0.

Then you have a maximum data transfer rate of 5000 MByte per second (PC to GPU) and 5000 MByte per second (GPU to PC). The PCIe bus is bidirectional, so you can read and write data at the same time.

You can also buy a second GPU. Then you can copy the first picture to the first GPU and the second picture to the second GPU, so you can copy 10000 MByte per second in total.

Amir_Baruh · August 19, 2009, 1:32pm

Hi

Do you know how to get more bandwidth (maybe a special PCIe)?

Thanks

Amir

eyalhir74 · August 19, 2009, 1:46pm

You’ll always be limited by the PCIe of the board, I guess, since even if you have a very fast external device you need to connect it to the miserable x16 lanes you have on the motherboard.

Does your calculation take a lot of time? If so, you can try to move half the data, run the kernel async, copy the other half, and then run another async kernel on the second half. That way you can save time by calculating and transferring data at the same time.
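A rough way to see how much this kind of overlap could save is to model it. The sketch below is only an estimate under stated assumptions: the data splits into equal chunks, copies run back to back on a single copy engine, and per-call overheads are ignored. The function names and the numbers in the usage line are hypothetical, not measurements from this thread.

```python
def serial_time_ms(transfer_ms, kernel_ms):
    """Total time when the whole copy finishes before the kernel starts."""
    return transfer_ms + kernel_ms


def pipelined_time_ms(transfer_ms, kernel_ms, chunks):
    """Estimate total time when chunked copies overlap with kernels.

    Assumptions (not from the thread): equal-sized chunks, one copy
    engine, and the kernel for a chunk starts only after that chunk's
    copy and the previous chunk's kernel have both finished.
    """
    copy_per_chunk = transfer_ms / chunks
    kernel_per_chunk = kernel_ms / chunks
    copy_done = 0.0    # time at which the latest copy finished
    kernel_done = 0.0  # time at which the latest kernel finished
    for _ in range(chunks):
        copy_done += copy_per_chunk
        kernel_done = max(copy_done, kernel_done) + kernel_per_chunk
    return kernel_done


if __name__ == "__main__":
    # Hypothetical workload: 100 ms of copying, 20 ms of compute.
    for n in (1, 2, 4):
        print(n, "chunk(s):", pipelined_time_ms(100.0, 20.0, n), "ms")
```

In this model the overlap can hide at most the shorter of the two phases, so when the copy dominates, the saving is bounded by roughly the kernel time.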

I think someone here referred to Nehalem as the fastest available machine, PCIe-wise.

eyal

Laue · August 19, 2009, 1:51pm

PCIe (2.0) is a standardized interface, and graphics hardware is manufactured to this standard.

There is no “special” version.

You can overclock your system, but I think a stable and reliable system is more desirable…

I ask because I need to transfer 10G and to do Matrix-Vector Multiplication.

host to device: 1.1 GB/s

10G Host->Device / 5G (result back) Device->Host

OK, it takes some time, but what’s the problem? :) You can’t calculate every problem in real time.

Amir_Baruh · August 19, 2009, 1:59pm

I need to get better results than CPU.

Amir_Baruh · August 19, 2009, 2:00pm

What/Who is Nehalem?

eyalhir74 · August 19, 2009, 2:16pm

Nehalem are the new Intel chips.

Look here at my (stupid) suggestion and what tmurray said:

http://forums.nvidia.com/index.php?showtop…40&start=40

posts #44 - 47

how about the other question? how long does your kernel take?

eyal

Amir_Baruh · August 20, 2009, 8:24am

I am talking about 268M, running on a Quadro FX 1700.

The transfer to the GPU takes 95.5 ms → 2.8 GB/s.

The kernel takes 26.6 ms.

I want to buy a GTX 295, so I believe I can get:

Transfer (5 GB/s) → 53.48 ms.

Kernel x30 → 0.88 ms.

What do you think?

Amir

eyalhir74 · August 20, 2009, 8:38am

Well, 5 GB/s is probably optimistic, but in any case the ratio between transfer time and kernel time is still very high.
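To put a number on that ratio, here is a small sketch that redoes the arithmetic from Amir's estimate. It assumes "268M" means 268 × 10^6 bytes (the thread does not say), and the x30 kernel speedup is Amir's hope, not a measured figure. The helper name `transfer_ms` is made up for illustration.

```python
def transfer_ms(num_bytes, bandwidth_bytes_per_s):
    """Time in milliseconds to move num_bytes at a sustained bandwidth."""
    return num_bytes / bandwidth_bytes_per_s * 1000.0


DATA_BYTES = 268e6        # assumption: "268M" is 268 * 10^6 bytes
KERNEL_MS_FX1700 = 26.6   # kernel time reported on the Quadro FX 1700
HOPED_SPEEDUP = 30        # Amir's expected kernel speedup, unverified

kernel_ms = KERNEL_MS_FX1700 / HOPED_SPEEDUP

for gb_per_s in (2.8, 5.0):
    copy_ms = transfer_ms(DATA_BYTES, gb_per_s * 1e9)
    copy_share = copy_ms / (copy_ms + kernel_ms)
    print(f"{gb_per_s} GB/s: copy {copy_ms:.1f} ms, "
          f"kernel {kernel_ms:.2f} ms, copy share {copy_share:.1%}")
```

Under these assumptions the copy accounts for well over 90% of the total at either bandwidth, so a faster kernel alone changes the end-to-end time very little.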

Is there any way you can avoid copying some of the data to the GPU? For example, I had three arrays, one of which was the sqrt of another, and I found that it was better to calculate it on the fly than to pass it as input to the GPU.

Maybe if you can elaborate on what data you move and what your kernel does, someone will have an idea how to improve this ratio…

edit: BTW, did you try pinned memory?

eyal

Amir_Baruh · August 20, 2009, 2:56pm

mfatica · August 20, 2009, 3:03pm

The FX1700 is a PCI-e gen2 card.
You will not see faster PCI-e transfer using the GeForce.

PCI-e transfer speed depends on the MB/chipset.

CapJo · August 20, 2009, 9:14pm

Intel’s upcoming Lynnfield CPUs will have PCIe 2.0 integrated in the CPU itself

and might therefore achieve higher bandwidth and certainly lower latencies.

Lynnfield overview @ AnandTech

Amir_Baruh · August 21, 2009, 10:34am

Do you know how much bandwidth we can get with this chip and GTX295?

avidday · August 21, 2009, 10:57am

It hasn’t been released yet, so there are no performance numbers. But it will be less than 6.4 GB/s, because that is all that is practically achievable under PCI-e 2.0 with 16 lanes.
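For context, the raw PCI-e 2.0 link rate can be worked out from the signalling rate (5 GT/s per lane) and the 8b/10b line coding. The sketch below does that arithmetic per direction and compares it with the 6.4 figure quoted above, assuming that figure is in gigabytes per second. The constant and function names are illustrative only.

```python
GT_PER_S_PER_LANE = 5.0e9      # PCI-e 2.0 signalling rate per lane
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b coding: 8 data bits per 10 line bits
LANES = 16


def raw_bandwidth_bytes_per_s(lanes=LANES):
    """Per-direction link bandwidth before packet and protocol overhead."""
    data_bits_per_s = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes
    return data_bits_per_s / 8


raw = raw_bandwidth_bytes_per_s()
quoted_practical = 6.4e9       # figure from the thread, assumed to be bytes/s

print(f"raw x16 link rate: {raw / 1e9:.1f} GB/s per direction")
print(f"quoted practical limit as a fraction of raw: {quoted_practical / raw:.0%}")
```

The gap between the raw rate and the practical figure would be consistent with packet and protocol overhead, though the thread does not break it down.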

YDD · August 21, 2009, 1:19pm

Technical point on the GTX295… don’t the cards split the bus, so that each only really has x8 bandwidth? When I had a GX2, I think I found something like this was happening.

avidday · August 21, 2009, 1:45pm

The GTX295 switches the bus, so you can get close to the full available PCI-e bandwidth to one of the GPUs if the other one is idle. With both going, it will be reduced to slightly less than half per GPU.

YDD · August 21, 2009, 3:14pm

OK - that’s good to know. The GTX295 is like one cable from an S1070 :)

Amir_Baruh · August 23, 2009, 7:21am

Ok…

I want to buy a new machine (~2200$).

I want the GTX295.

I understand that there are several kinds of PCIe x16.

Which of them do you suggest?

Thanks

Amir