C2050 simpleStreams performance

Hi, I just ran the simpleStreams SDK example on my C2050 with disappointing results. I’ve compiled this with the -arch sm_20 flag set and disabled ECC.
What am I doing wrong?

Best regards, David

Device name : Tesla C2050
CUDA Capable SM 2.0 hardware with 14 multi-processors
scale_factor = 1.0000
array_size = 16777216

kernel: 1.53
non-streamed: 13.06 (1.53 expected)
4 streams: 11.49 (1.53 expected with compute capability 1.1 or later)

deviceQuery output:
Device 0: “Tesla C2050”
CUDA Driver Version: 3.10
CUDA Runtime Version: 3.10
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 2817720320 bytes
Number of multiprocessors: 14
Number of cores: 448
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.15 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

I get similar results with my GTX 480. It seems streams aren’t working as intended, or maybe there is just a small problem in the simpleStreams code. I haven’t looked into it, but it was fine before, and as far as I know the way you use streams didn’t change with the latest CUDA release or with Fermi.
By the way, I see your device reports a run time limit on kernels. How come? I thought Teslas aren’t used to drive a display.

[ simpleStreams ]

Device name : GeForce GTX 480
CUDA Capable SM 2.0 hardware with 15 multi-processors
scale_factor = 1.0000
array_size = 16777216

memcopy: 10.50
kernel: 0.99
non-streamed: 11.46 (11.50 expected)
4 streams: 10.92 (3.62 expected with compute capability 1.1 or later)

PASSED
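
For reference, the "expected" figures in my output look like they follow a simple model: non-streamed = memcopy + kernel, and streamed = kernel + memcopy / number of streams (10.50 + 0.99 is about 11.50, and 0.99 + 10.50 / 4 is about 3.62, up to rounding of the printed values). This is only inferred from the numbers above, not taken from the sample source, and the function names below are hypothetical. A minimal C++ sketch of that arithmetic:

```cpp
#include <cassert>

// Hypothetical helpers: the timing model is an assumption inferred from the
// printed numbers, not copied from the simpleStreams source.

// Serial case: the copy and the kernel do not overlap, so the times add up.
double expected_non_streamed(double memcopy_ms, double kernel_ms) {
    return memcopy_ms + kernel_ms;
}

// Idealized overlap: the work is split into nstreams chunks, and the model
// charges the full kernel time plus one chunk's share of the copy.
double expected_streamed(double memcopy_ms, double kernel_ms, int nstreams) {
    assert(nstreams > 0);  // guard against division by zero
    return kernel_ms + memcopy_ms / nstreams;
}
```

Calling `expected_streamed(10.50, 0.99, 4)` gives roughly the 3.62 ms shown for the GTX 480, against a measured 10.92 ms, so the copy and the kernel apparently are not overlapping at all. Under the same assumed model, the "1.53 expected" in David’s output would correspond to a memcopy time near zero, and his output shows no memcopy line, so the memcopy measurement itself may be worth checking.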