Hi All,
When I use these settings:
dim3 dimThreads(1,1,1)
dim3 dimBlock(1,1,1)
Data size = 36 bytes.
I get a kernel time of 269 ms.
With these settings:
dim3 dimThreads(10,10,1)
dim3 dimBlock(1,1,1)
Data size = 36*100 bytes.
I get a kernel time of 168.866 ms.
Is this normal?
This is normal; I wouldn’t expect to get precise timings for such a small workload in your first example.
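If you want steadier numbers, one common approach is to do a warm-up launch first and then average over many launches using CUDA events. Below is a minimal sketch of that idea, not your actual code: the kernel `touch`, the helper `timeKernelMs`, and the repetition count are made up for illustration, and I'm assuming `dimThreads` is your block size and `dimBlock` is your grid size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (not the original poster's kernel):
// the threads of a single block stride over n floats and increment each one.
__global__ void touch(float *data, int n)
{
    int tid      = threadIdx.y * blockDim.x + threadIdx.x;
    int nthreads = blockDim.x * blockDim.y;
    for (int i = tid; i < n; i += nthreads)
        data[i] += 1.0f;
}

// Average time per launch in milliseconds, measured with CUDA events.
// Returns -1.0f if a CUDA call fails (for example, no usable device).
float timeKernelMs(dim3 grid, dim3 block, int n, int reps)
{
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time startup costs are not included in the timing.
    touch<<<grid, block>>>(d, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        touch<<<grid, block>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until all timed launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaError_t err = cudaGetLastError();

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);

    return (err == cudaSuccess) ? ms / reps : -1.0f;
}

int main()
{
    const int reps = 1000;  // assumed repetition count; tune as needed
    // 36 bytes = 9 floats with one thread; 36*100 bytes = 900 floats with 10x10 threads.
    float t1   = timeKernelMs(dim3(1, 1, 1), dim3(1, 1, 1), 9, reps);
    float t100 = timeKernelMs(dim3(1, 1, 1), dim3(10, 10, 1), 900, reps);
    printf("1 thread:    %f ms per launch\n", t1);
    printf("100 threads: %f ms per launch\n", t100);
    return 0;
}
```

Averaging over many launches smooths out run-to-run noise, which is usually what makes single measurements on tiny workloads hard to compare.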