Quadro 4000 simpleStreams -- please help! Streamed time is not even close to the expected time

I have a Quadro 4000 and I am running the simpleStreams project from the SDK. No matter what value of nreps (number of iterations) I choose, the streamed time is essentially the same as the non-streamed time. I was counting on streams working for a project, and now I am at a loss as to where to go next. Here is the output I got from the application:

memcopy: 39.23
kernel: 2.24
non-streamed: 41.27 (41.47 expected)
4 streams: 39.91 (12.05 expected with compute capability 1.1 or later)

I am hoping someone else on the forums can either confirm or refute this result. I am using SDK version 3.1 with driver version 256.53 on openSUSE 11.2.

Thanks in advance for any help.