I ran the simpleStreams sample and this is what I get:
[ simpleStreams ]
Device name : GeForce GTX 480
CUDA Capable SM 2.0 hardware with 15 multi-processors
scale_factor = 1.0000
array_size = 16777216
memcopy: 10.60
kernel: 0.99
non-streamed: 11.37 (11.59 expected)
4 streams: 10.84 (3.64 expected with compute capability 1.1 or later)
PASSED
Press ENTER to exit…
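The "expected" figures in this output match the following formulas. They are reconstructed from the printed numbers, with $n = 4$ streams, and are not taken from the sample's source:

```latex
\begin{aligned}
t_{\text{non-streamed}} &= t_{\text{kernel}} + t_{\text{memcopy}} = 0.99 + 10.60 = 11.59 \\
t_{\text{4 streams}} &= t_{\text{kernel}} + \frac{t_{\text{memcopy}}}{n} = 0.99 + \frac{10.60}{4} = 3.64
\end{aligned}
```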
My questions are:
a) Why do I get 10.84 and not 3.64?
b) Should I add some extra option during compilation?
The improvement from 11.37 to 10.84 is not significant; I expected much more.
Y.
PS. From deviceQuery:
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
…
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.0, CUDA Runtime Version = 3.0, NumDevs = 1, Device = GeForce GTX 480