Hey,
I’m working on code that calls vpiSubmitGaussianFilter six times, each on a different VPIStream, and it does this for every image I process. I measure how long each call takes using VPIEvent. Looking at the timings, I see that occasionally an operation on a seemingly random stream takes about 10x longer than usual. Keep in mind that all the images I process are similar (same resolution, same bit depth, same type). Why does this slowdown happen?
I’m working on a Jetson AGX Orin Developer Kit.
To give a general idea of what the code does, here is a snippet:
VPIEvent ev_start;
VPIEvent ev_stop;
float elapsedMS = 0.0f;
CHECK_STATUS(vpiEventCreate(0, &ev_start));
CHECK_STATUS(vpiEventCreate(0, &ev_stop));
for (int i = 0; i < 6; ++i) {
    // Bracket the submission with two events recorded on the same stream
    CHECK_STATUS(vpiEventRecord(ev_start, streams[i]));
    CHECK_STATUS(vpiSubmitGaussianFilter(streams[i], VPI_BACKEND_CUDA, vpiRaw, vpiBlurred[i], kernelLengths[i/2], kernelLengths[i/2], sigmas[i], sigmas[i], VPI_BORDER_ZERO));
    CHECK_STATUS(vpiEventRecord(ev_stop, streams[i]));
    // Block until ev_stop has been reached, then read the elapsed time
    CHECK_STATUS(vpiEventSync(ev_stop));
    CHECK_STATUS(vpiEventElapsedTimeMillis(ev_start, ev_stop, &elapsedMS));
}
For example, the timings usually look like this:
elapsedMS: 0.09 ms
elapsedMS: 0.12 ms
elapsedMS: 0.12 ms
elapsedMS: 0.12 ms
elapsedMS: 0.13 ms
elapsedMS: 0.11 ms
But sometimes one of the six measurements is much larger, like this:
elapsedMS: 1.27 ms
elapsedMS: 0.12 ms
elapsedMS: 0.12 ms
elapsedMS: 0.11 ms
elapsedMS: 0.14 ms
elapsedMS: 0.12 ms
or
elapsedMS: 0.15 ms
elapsedMS: 0.12 ms
elapsedMS: 0.11 ms
elapsedMS: 0.12 ms
elapsedMS: 0.11 ms
elapsedMS: 1.16 ms
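To quantify how often these spikes occur and on which stream, the six elapsedMS values for each image could be collected into an array and checked against their median. The following is only a rough sketch of that idea in plain C: median_ms, count_spikes and the threshold factor are hypothetical helpers I made up for illustration, not part of VPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: ascending order for floats. */
static int cmp_float(const void *a, const void *b)
{
    float fa = *(const float *)a;
    float fb = *(const float *)b;
    return (fa > fb) - (fa < fb);
}

/* Median of n timing samples (hypothetical helper).
 * Sorts a private copy so the caller's array is left untouched.
 * Returns -1.0f if the temporary buffer cannot be allocated. */
static float median_ms(const float *samples, size_t n)
{
    assert(n > 0);
    float *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL) {
        return -1.0f;
    }
    memcpy(tmp, samples, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_float);
    float med = (n % 2 != 0) ? tmp[n / 2]
                             : 0.5f * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return med;
}

/* Counts samples larger than factor * median (hypothetical helper).
 * *first receives the index of the first such sample, or n if none. */
static size_t count_spikes(const float *samples, size_t n,
                           float factor, size_t *first)
{
    float limit = factor * median_ms(samples, n);
    size_t count = 0;
    *first = n;
    for (size_t i = 0; i < n; ++i) {
        if (samples[i] > limit) {
            if (count == 0) {
                *first = i;  /* remember which stream spiked first */
            }
            ++count;
        }
    }
    return count;
}
```

With a factor of 5, for instance, a call such as count_spikes(timings, 6, 5.0f, &first) would report one spike at index 0 for the second set of timings above and none for the first set. The median is used instead of the mean so that a single slow sample does not shift the threshold.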