I am running some sample code from NVIDIA and added a timer to record the GPU running time. When I declare and start the timer after the “allocate host and device memory” part, the measured running time is only 3 ms. However, if I move the timer in front of the “allocate host and device memory” part, the running time becomes 1200 ms! Can anyone explain how this could happen? I am very new to CUDA and would appreciate any help. Thanks!
arrayReversed.cu (3.93 KB)
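For readers who cannot open the attachment, here is a minimal sketch of the two timer placements I am comparing. It is not the attached file: the kernel `reverseArray`, the array size, the helper `msSince`, and the use of `std::chrono` as the timer are all assumptions made for illustration.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in arrayReversed.cu.
__global__ void reverseArray(const int *in, int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[n - 1 - i] = in[i];
    }
}

// Hypothetical helper: milliseconds elapsed since t0 (host wall clock).
static double msSince(std::chrono::steady_clock::time_point t0)
{
    auto now = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(now - t0).count();
}

int main()
{
    const int n = 1 << 20;                 // assumed array size
    const size_t bytes = n * sizeof(int);

    // Placement A: timer started BEFORE host and device allocation.
    auto tA = std::chrono::steady_clock::now();

    // Allocate host and device memory.
    int *hIn = (int *)malloc(bytes);
    int *hOut = (int *)malloc(bytes);
    int *dIn = nullptr;
    int *dOut = nullptr;
    if (hIn == nullptr || hOut == nullptr ||
        cudaMalloc((void **)&dIn, bytes) != cudaSuccess ||
        cudaMalloc((void **)&dOut, bytes) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        hIn[i] = i;
    }

    // Placement B: timer started AFTER allocation.
    auto tB = std::chrono::steady_clock::now();

    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);
    reverseArray<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);

    printf("timer started before allocation: %.3f ms\n", msSince(tA));
    printf("timer started after allocation:  %.3f ms\n", msSince(tB));

    cudaFree(dIn);
    cudaFree(dOut);
    free(hIn);
    free(hOut);
    return 0;
}
```

Both timers stop at the same point, so the only difference between the two numbers is whether the allocation part is inside the timed region.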