Thanks for the quick reply.
Could you share which version/source you are comparing?
I'm not sure I follow. I'm not comparing different versions, hardware, or BSPs. I simply run the same code many times in a for-loop (1000 iterations, for example), including calls such as vpiSubmitGaussianFilter and vpiStreamSync, and record the elapsed time of each iteration. Among these timings there are unexpectedly large values (e.g. 71945 us, about 15x the average of ~5000 us), just as my original question describes.
All the test code comes from the official samples, e.g. /opt/nvidia/vpi2/samples/05-benchmark and /opt/nvidia/vpi2/samples/09-tnr; I only added some timing output to the console. The following snippet (mainly from 05-benchmark) should make it clear:
// Create the images, stream, and events (evStart, evStop) as in the sample.
std::vector<float> timingsMS;
for (int batch = 0; batch < 1000 /*BATCH_COUNT*/; ++batch)
{
    // Record the stream queue when we start processing
    CHECK_STATUS(vpiEventRecord(evStart, stream));

    CHECK_STATUS(vpiSubmitGaussianFilter(stream, backend, image, blurred, 5, 5, 1, 1, VPI_BORDER_ZERO));

    // Record the stream queue just after blurring
    CHECK_STATUS(vpiEventRecord(evStop, stream));

    // Wait until the batch processing is done
    CHECK_STATUS(vpiEventSync(evStop));

    float elapsedMS;
    CHECK_STATUS(vpiEventElapsedTimeMillis(evStart, evStop, &elapsedMS));
    // timingsMS.push_back(elapsedMS / AVERAGING_COUNT);
    printf("elapsed %f ms\n", elapsedMS);
}
Please let me know if you need more information, thanks.