CUDA 4 RC2 Performance Loss

Dear CUDA Team,

My application lost performance after I installed the new driver, so I wrote a smaller program to reproduce the effect. Could you help me investigate the cause?

The program performs a reduction on both GPUs, then sends the result from GPU 1 to GPU 0, which reduces it with its own result. I split the result into multiple parts to overlap computation with communication, and synchronize using events and streams.
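For readers without the attachment, here is a minimal sketch of the pattern described above. It is not the attached code: the kernels `reduce_rows` and `add_inplace`, the function `run_chunked_reduction`, and all buffer and stream names are hypothetical stand-ins, and the per-GPU "reduction" is assumed to be a simple column-wise sum. It falls back to a single GPU if only one is present, and resource cleanup is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e_)); \
    exit(1); } } while (0)

// Assumed stand-in for the per-GPU reduction: sums `rows` rows, column by column.
__global__ void reduce_rows(const float* in, float* out, int rows, int stride, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float s = 0.0f;
    for (int r = 0; r < rows; ++r) s += in[r * stride + i];
    out[i] = s;
}

// Combines the part received from the other GPU with the local partial result.
__global__ void add_inplace(float* acc, const float* other, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) acc[i] += other[i];
}

// Returns true if every element of the combined result equals 2 * rows.
bool run_chunked_reduction(int rows, int chunk, int parts) {
    const int total = chunk * parts;
    int ndev = 0;
    CHECK(cudaGetDeviceCount(&ndev));
    const int dev0 = 0, dev1 = (ndev > 1) ? 1 : 0;  // single-GPU fallback
    const size_t inBytes = sizeof(float) * rows * total;
    const size_t outBytes = sizeof(float) * total;
    std::vector<float> host((size_t)rows * total, 1.0f);

    float *in0, *out0, *recv0, *in1, *out1;
    cudaStream_t s0, compute1, copy1;
    std::vector<cudaEvent_t> reduced(parts), copied(parts);

    // GPU 1: input, partial result, one compute stream, one copy stream, events.
    CHECK(cudaSetDevice(dev1));
    CHECK(cudaMalloc(&in1, inBytes));
    CHECK(cudaMalloc(&out1, outBytes));
    CHECK(cudaMemcpy(in1, host.data(), inBytes, cudaMemcpyHostToDevice));
    CHECK(cudaStreamCreate(&compute1));
    CHECK(cudaStreamCreate(&copy1));
    for (int c = 0; c < parts; ++c) {
        CHECK(cudaEventCreateWithFlags(&reduced[c], cudaEventDisableTiming));
        CHECK(cudaEventCreateWithFlags(&copied[c], cudaEventDisableTiming));
    }

    // GPU 0: input, partial result, receive buffer, one stream.
    CHECK(cudaSetDevice(dev0));
    CHECK(cudaMalloc(&in0, inBytes));
    CHECK(cudaMalloc(&out0, outBytes));
    CHECK(cudaMalloc(&recv0, outBytes));
    CHECK(cudaMemcpy(in0, host.data(), inBytes, cudaMemcpyHostToDevice));
    CHECK(cudaStreamCreate(&s0));

    const int threads = 256, blocks = (chunk + threads - 1) / threads;
    for (int c = 0; c < parts; ++c) {
        const int off = c * chunk;
        // GPU 1 reduces part c, then its copy stream ships that part to GPU 0.
        CHECK(cudaSetDevice(dev1));
        reduce_rows<<<blocks, threads, 0, compute1>>>(in1 + off, out1 + off, rows, total, chunk);
        CHECK(cudaEventRecord(reduced[c], compute1));
        CHECK(cudaStreamWaitEvent(copy1, reduced[c], 0));
        CHECK(cudaMemcpyPeerAsync(recv0 + off, dev0, out1 + off, dev1,
                                  sizeof(float) * chunk, copy1));
        CHECK(cudaEventRecord(copied[c], copy1));
        // GPU 0 reduces its own part c, waits for the copy, then combines.
        CHECK(cudaSetDevice(dev0));
        reduce_rows<<<blocks, threads, 0, s0>>>(in0 + off, out0 + off, rows, total, chunk);
        CHECK(cudaStreamWaitEvent(s0, copied[c], 0));
        add_inplace<<<blocks, threads, 0, s0>>>(out0 + off, recv0 + off, chunk);
    }
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(s0));  // s0 waited on every copy, so all work is done

    std::vector<float> result(total);
    CHECK(cudaMemcpy(result.data(), out0, outBytes, cudaMemcpyDeviceToHost));
    bool ok = true;
    for (int i = 0; i < total; ++i) ok = ok && (result[i] == 2.0f * rows);
    return ok;
}
```

Calling `run_chunked_reduction(4, 1024, 8)` from a `main` and timing it under each driver would be one way to compare the two releases. The separate copy stream on GPU 1 is what lets the transfer of part c overlap with the reduction of part c+1; whether that matches the structure of the attached program is an assumption.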

I have attached the code to this post. I am using 2 x S2050 with ECC off on Red Hat (kernel 2.6.18-128.1.14.el5).

Compile: “nvcc -arch=sm_20 -o test”


On RC2:

On RC:

Thanks in advance!

Since you already prepared a repro case, it would be helpful if you could file a bug. Thanks!

Sorry, how can I file a bug?

I couldn’t find it in

I have submitted the bug. However, I couldn’t find the “Bug Status” mentioned in the email. Any idea where it is?