I am trying to measure latency and bandwidth on a CARMA board.
The times reported for cudaMemcpy differ from run to run. I have tried both of the following:
- cudaMemcpyAsync with events
for (int i = 0; i < iterations; i++)
{
    // start event
    cudaEventRecord(start, 0);
    cudaMemcpyAsync();
    // stop event
    cudaEventRecord(end, 0);
    // sync on the stop event and get the timing
    cudaEventSynchronize(end);
    cudaEventElapsedTime(&elapsedtime, start, end);
}
- cudaMemcpy with CPU timings
for (int i = 0; i < iterations; i++)
{
    // start CPU timer
    cudaMemcpy();
    // end CPU timer
    // print the timings
}
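For reference, the CPU-timed loop can be written as a small self-contained harness. This is only a sketch under my own assumptions: `time_calls`, `copy_one_element`, and the warm-up count are names and choices made up for illustration, and the host-side `std::memcpy` is a placeholder for the real `cudaMemcpy` call so that the snippet compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Times `iterations` calls of `op` with a monotonic CPU clock and returns one
// sample per call, in microseconds. `warmup` untimed calls run first
// (the warm-up phase is an assumption, not part of the original loop).
template <typename Op>
std::vector<double> time_calls(Op op, int iterations, int warmup = 10) {
    assert(iterations > 0 && warmup >= 0);
    for (int i = 0; i < warmup; ++i) {
        op();  // untimed warm-up call
    }
    std::vector<double> samples_us;
    samples_us.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();  // start CPU timer
        op();                                              // operation under test
        const auto t1 = std::chrono::steady_clock::now();  // end CPU timer
        samples_us.push_back(
            std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
    return samples_us;
}

// Hypothetical stand-in for the copy under test: a one-element host copy.
// In the real measurement this would be the blocking cudaMemcpy call.
inline void copy_one_element() {
    static int src = 42;
    static int dst = 0;
    std::memcpy(&dst, &src, sizeof(int));
}
```

Calling `time_calls(copy_one_element, iterations)` returns one timing per iteration, so the samples can be printed or post-processed after the loop instead of inside it. A CPU timer like this only makes sense around a blocking call; an asynchronous copy would need a synchronization before the timer is stopped.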
For each iteration I get a different time, ranging from 150 microseconds to 1000 microseconds, when sending a single element to measure latency.
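To describe the spread more precisely than a single range, the per-iteration times can be reduced to a minimum, median, and maximum, and a bandwidth figure can be derived from the bytes moved and the elapsed time. The following is a minimal sketch; `Summary`, `summarize`, and `bandwidth_mb_per_s` are hypothetical helper names, not part of any CUDA API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimum, median, and maximum of a set of timings, in microseconds.
struct Summary {
    double min_us;
    double median_us;
    double max_us;
};

// Takes the samples by value so the caller's vector is left unsorted.
Summary summarize(std::vector<double> samples_us) {
    assert(!samples_us.empty());
    std::sort(samples_us.begin(), samples_us.end());
    const std::size_t n = samples_us.size();
    const double median = (n % 2 == 1)
        ? samples_us[n / 2]
        : 0.5 * (samples_us[n / 2 - 1] + samples_us[n / 2]);
    return {samples_us.front(), median, samples_us.back()};
}

// Bytes per microsecond equals 10^6 bytes per second, i.e. MB/s.
double bandwidth_mb_per_s(std::size_t bytes, double elapsed_us) {
    assert(elapsed_us > 0.0);
    return static_cast<double>(bytes) / elapsed_us;
}
```

The median is less sensitive than the mean to a few very slow iterations, which is why it is reported alongside the extremes here. Bandwidth is only meaningful for large transfers; a single-element copy measures latency.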
Why does this variation happen?
I understand this might not be the right forum for CARMA queries, but since this is a CUDA question I would like to know whether anybody has faced a similar issue.