Dear CUDA Team,
My application lost performance after I installed the new driver, so I wrote a smaller program to reproduce the effect. Could you help me investigate the cause?
The program performs a reduction on both GPUs, then sends the result from GPU 1 to GPU 0, where it is reduced with GPU 0's own result. I split the result into multiple parts so that computation and communication can overlap, and I synchronize using events and streams.
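For readers without the attachment, the pattern is roughly the following. This is a simplified sketch and not the attached test.cu: the kernels `fillPart` and `addPart`, the part count, and the buffer sizes are placeholders chosen for illustration, and `fillPart` only stands in for the real per-GPU reduction. Each part gets its own stream on each GPU, so the peer copy of one part can overlap with the kernels of the next.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

const int kParts = 4;            // hypothetical number of parts
const int kPartElems = 1 << 20;  // hypothetical elements per part

// Placeholder for the per-GPU reduction: writes a constant into one part.
__global__ void fillPart(float* out, float value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Combine step on GPU 0: acc[i] += incoming[i].
__global__ void addPart(float* acc, const float* incoming, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) acc[i] += incoming[i];
}

int main() {
    int deviceCount = 0;
    CHECK(cudaGetDeviceCount(&deviceCount));
    if (deviceCount < 2) { printf("This sketch needs two GPUs.\n"); return 0; }

    const size_t partBytes = kPartElems * sizeof(float);
    const int threads = 256;
    const int blocks = (kPartElems + threads - 1) / threads;

    float *acc0 = NULL, *recv0 = NULL, *out1 = NULL;
    cudaStream_t s0[kParts], s1[kParts];
    cudaEvent_t arrived[kParts];

    // GPU 0: accumulator, receive buffer, one stream per part.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void**)&acc0, kParts * partBytes));
    CHECK(cudaMalloc((void**)&recv0, kParts * partBytes));
    for (int p = 0; p < kParts; ++p) CHECK(cudaStreamCreate(&s0[p]));

    // GPU 1: output buffer, one stream and one event per part.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void**)&out1, kParts * partBytes));
    for (int p = 0; p < kParts; ++p) {
        CHECK(cudaStreamCreate(&s1[p]));
        CHECK(cudaEventCreateWithFlags(&arrived[p], cudaEventDisableTiming));
    }

    for (int p = 0; p < kParts; ++p) {
        const size_t off = (size_t)p * kPartElems;

        // GPU 1: compute part p, send it to GPU 0, then mark it as arrived.
        CHECK(cudaSetDevice(1));
        fillPart<<<blocks, threads, 0, s1[p]>>>(out1 + off, 2.0f, kPartElems);
        CHECK(cudaMemcpyPeerAsync(recv0 + off, 0, out1 + off, 1, partBytes, s1[p]));
        CHECK(cudaEventRecord(arrived[p], s1[p]));

        // GPU 0: compute its own part p, wait for GPU 1's copy, then combine.
        CHECK(cudaSetDevice(0));
        fillPart<<<blocks, threads, 0, s0[p]>>>(acc0 + off, 1.0f, kPartElems);
        CHECK(cudaStreamWaitEvent(s0[p], arrived[p], 0));
        addPart<<<blocks, threads, 0, s0[p]>>>(acc0 + off, recv0 + off, kPartElems);
    }

    // Wait for both GPUs, then read back one element as a sanity check.
    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceSynchronize());
    float first = 0.0f;
    CHECK(cudaMemcpy(&first, acc0, sizeof(float), cudaMemcpyDeviceToHost));
    printf("acc0[0] = %f\n", first);

    for (int p = 0; p < kParts; ++p) CHECK(cudaStreamDestroy(s0[p]));
    CHECK(cudaFree(acc0));
    CHECK(cudaFree(recv0));
    CHECK(cudaSetDevice(1));
    for (int p = 0; p < kParts; ++p) {
        CHECK(cudaEventDestroy(arrived[p]));
        CHECK(cudaStreamDestroy(s1[p]));
    }
    CHECK(cudaFree(out1));
    return 0;
}
```

The event is recorded on GPU 1's stream after the peer copy, and GPU 0's stream waits on it with `cudaStreamWaitEvent`, so the host thread never blocks inside the loop. Whether the copy really overlaps with the kernels depends on the driver and hardware, which is the behavior in question here.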
I have attached the code to this post. I am using 2 x S2050 with ECC off on Red Hat, kernel 2.6.18-128.1.14.el5.
Compile: “nvcc -arch=sm_20 test.cu -o test”
on RC2 :
on RC :
Thanks in advance!
test.cu (3.55 KB)