Hi,
I use the new VPI 1.1 with Python 3.6 on a Jetson Nano B01 with L4T 32.6.1 (JetPack 4.6).
My problem is that everything in VPI is much slower than the preinstalled OpenCV running on the CPU.
At the moment I use the rescale algorithm three times in my program, downscaling a 4K grayscale image.
With OpenCV on the CPU I get durations of 0.7 - 2.1 ms, with VPI CUDA 16 - 23 ms, and with VPI CPU 8 - 14 ms.
So the current OpenCV CPU version is much faster than VPI, and even VPI CPU is faster than VPI CUDA.
I measure an overall frame rate for my system, and it matches the speed differences of the algorithms used.
I call vpi.clear_cache() once per frame in my main program; calling it after each VPI use makes no difference.
I tried your clocks.sh and jetson_clocks, with no difference in speed (I use an external 6 A PSU in full speed mode).
When I use VPI I can see a drop in CPU usage, but the VPI commands are way too slow.
In the VPI part, the timing includes both buffer wrapping and scaling.
In the OpenCV case, however, only the scaling function is measured (since the input is already a cv::Mat).
Timing only the scaling step on both sides will compare the scaling performance fairly.
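To make the point about what the timer actually covers concrete, here is a minimal sketch that times each stage on its own. The helper names time_stages and vpi_stages are made up for this illustration and are not part of VPI. The VPI calls follow the VPI 1.x Python samples (vpi.asimage, rescale, rlock, cpu) as far as I know them, so treat them as assumptions and check them against your installed version.

```python
import time
from statistics import median


def time_stages(stages, data, repeats=20, warmup=3):
    """Run `stages` in order, feeding each stage's output to the next one.

    `stages` is a list of (name, callable) pairs.
    Returns (final_output, {name: median duration in ms}).
    """
    samples = {name: [] for name, _ in stages}
    out = data
    for i in range(warmup + repeats):
        out = data
        for name, fn in stages:
            t0 = time.perf_counter()
            out = fn(out)
            elapsed_ms = (time.perf_counter() - t0) * 1e3
            if i >= warmup:  # skip warm-up runs (allocations, first-call setup)
                samples[name].append(elapsed_ms)
    return out, {name: median(ms) for name, ms in samples.items()}


def vpi_stages(size):
    """Hypothetical stage list for a VPI CUDA rescale to `size` = (width, height).

    Needs the `vpi` module, so it only runs on a Jetson with VPI installed.
    """
    import vpi  # imported lazily so time_stages works without VPI

    def wrap(frame):
        # Wrap a NumPy array as a VPI image.
        return vpi.asimage(frame)

    def scale(img):
        with vpi.Backend.CUDA:
            return img.rescale(size, interp=vpi.Interp.LINEAR)

    def to_numpy(img):
        # Assumption: VPI 1.x style read lock. The copy keeps the data
        # valid after the lock is released.
        with img.rlock():
            return img.cpu().copy()

    return [("wrap", wrap), ("scale", scale), ("to_numpy", to_numpy)]
```

On the Nano this could be called as time_stages(vpi_stages((1920, 1080)), frame), where frame is the grayscale NumPy image. An OpenCV stage list with a single ("scale", ...) entry gives the matching CPU number. One caveat: VPI may submit work asynchronously, in which case part of the scaling time would show up in the "to_numpy" stage rather than in "scale".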
It’s expected that copying the buffer from CPU to GPU and moving it back induces some overhead.
But usually you can get some performance gain via GPU acceleration.
The trade-off depends on the computing complexity applied to the buffer.
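As a rough illustration of that trade-off, the sketch below uses a deliberately simple cost model: one fixed transfer cost per buffer plus a constant cost per operation. The model, the function names, and the numbers in the usage note are assumptions for illustration only, not measurements from VPI.

```python
def total_ms(per_op_ms, n_ops, transfer_ms=0.0):
    """Total time for n_ops chained operations plus one round-trip transfer."""
    return transfer_ms + n_ops * per_op_ms


def break_even_ops(cpu_ms, gpu_ms, transfer_ms):
    """Smallest number of chained ops for which the GPU path is strictly faster.

    Returns None if a single GPU op is not faster than the CPU op,
    because then the transfer cost can never be recovered.
    """
    if gpu_ms >= cpu_ms:
        return None
    saving_per_op = cpu_ms - gpu_ms
    # GPU wins when transfer_ms + n * gpu_ms < n * cpu_ms,
    # i.e. n > transfer_ms / saving_per_op.
    return int(transfer_ms // saving_per_op) + 1
```

With made-up values of 2.0 ms per CPU operation, 0.5 ms per GPU operation, and 6.0 ms for the transfer, break_even_ops(2.0, 0.5, 6.0) gives 5: the GPU path only pays off once at least five operations run on the buffer before it is copied back.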
You can find some information in the below document:
I tried this and the pure calculation time is much faster, thanks for the hint.
With CUDA it is now in the 1 - 5 ms range.
But why is the type conversion so slow? You always have to do this, especially when you mix a lot of image calculations between numpy, OpenCV and VPI.
Is there a trick to do this faster?
Hi, I tried it, but you forgot the conversion back with .cpu() in your code; without it I cannot use the result in OpenCV and/or numpy for further processing.
OpenCV takes 0.001 s
NVMEDIA_ARRAY: 53, Version 2.1
NVMEDIA_VPI : 172, Version 2.4
VPI takes 0.009 s
So it is roughly a factor of 10 slower than OpenCV on the CPU for rescaling here.
That is not really usable in a bigger system that does a lot of image processing with OpenCV and numpy, given how few operations VPI offers. :-(
I need to rethink my complete code and program structure to maybe get an advantage from VPI. As it is today, with mixed operations in many places in the code, I cannot switch individual operations to VPI to gain speed.