I am working on a Jetson Xavier AGX board running Ubuntu 18.04, with OpenCV 4.5.1 and CUDA 10.2 built on the board.
From a video I get frames, and each frame is converted to float32.
The float32 frames are then stored in a NumPy array.
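For context, the conversion step looks roughly like this (a minimal sketch with a synthetic frame standing in for the real capture code, which is not shown; the 1280x720 size is just an example):

```python
import numpy as np

# Synthetic stand-in for a decoded video frame (e.g. from cv2.VideoCapture);
# OpenCV returns frames as uint8 HxWxC arrays.
frame = np.random.randint(0, 256, size=(720, 1280, 3), dtype=np.uint8)

# The conversion that dominates the per-frame time (~50%):
frame_f32 = frame.astype(np.float32)

print(frame_f32.dtype, frame_f32.shape)
```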
We want to speed up the processing of each frame, and the instruction that takes the most time, about 50% of the total, is this conversion to float32.
To speed up this instruction I tried CuPy, a package that uses the GPU to accelerate NumPy operations, but cupy.float32(frame) takes just as long as NumPy's version. The purpose of this conversion is to turn the frames into TensorFlow tensors.
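For reference, the CuPy attempt was essentially the following sketch (it falls back to NumPy when CuPy is not installed, so it runs anywhere; the frame is synthetic):

```python
import numpy as np

try:
    import cupy as xp  # GPU-backed NumPy replacement
except ImportError:
    xp = np            # CPU fallback so the sketch still runs

# Synthetic stand-in for a decoded uint8 video frame.
frame = np.random.randint(0, 256, size=(720, 1280, 3), dtype=np.uint8)

# Note: cupy.float32 is just an alias of numpy.float32 (a dtype, not a GPU
# kernel), so cupy.float32(frame) on a host array runs on the CPU. The GPU
# path is asarray() + astype(), and even then the host-to-device copy of the
# frame can dominate a single cast, which may be why it is no faster here.
frame_f32 = xp.asarray(frame).astype(xp.float32)

print(frame_f32.dtype)
```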
I also tried Numba, which, if I'm not mistaken, is supposed to compile and parallelize loops. The NumPy conversion was not compatible with it.
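An explicit-loop version of the cast, the kind of code Numba can compile, might look like the sketch below (my reconstruction, not the exact code I ran; the decorator falls back to a no-op when Numba is not installed, and the tiny frame size is only for the demo):

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # no-op fallback so the sketch runs without Numba
    def njit(func=None, **kwargs):
        return func if func is not None else (lambda f: f)

@njit
def to_float32(frame):
    # Element-wise cast written as plain loops; calling np.float32() on a
    # whole array is, as far as I can tell, what Numba rejected.
    out = np.empty(frame.shape, dtype=np.float32)
    h, w, c = frame.shape
    for i in range(h):
        for j in range(w):
            for k in range(c):
                out[i, j, k] = frame[i, j, k]
    return out

frame = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)  # tiny demo frame
print(to_float32(frame).dtype)
```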
So, how can I speed up this conversion? And why does the CuPy instruction take as long as the NumPy one?
I was also thinking of doing this in C++, but I'm not sure it would speed up the processing, because Python already seems fast at matrix operations. Does anyone know whether it would be worthwhile?
Thanks for your time,
I hope this is all the information needed.