JetPack 6.0 (both 6.0+b106 and 6.0+b87 are installed), L4T 36.3.0
TensorRT 8.6.2
NVRM version: NVIDIA UNIX Open Kernel Module for aarch64 540.3.0
Hello
I am building the following pipeline with one PGIE and three SGIEs in parallel:
PGIE (face detection) ─┬─> preprocess1 ─> SGIE1 (person detection)
                       ├─> SGIE2 (face recognition)
                       └─> preprocess2 ─> SGIE3 (face swap)
Now I am getting very low performance, only 2-3 FPS, when all the models are included. I am doing the postprocessing for the face swap model in the gie_processing_done_buf_prob function of deepstream-app.c.
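For context, the access pattern looks roughly like this. A minimal sketch, assuming SGIE3 runs with a hypothetical gie-unique-id=4 and output-tensor-meta=1 in its config so that nvinfer attaches raw output tensors to each detected object:

```cpp
/* Minimal sketch: locate the swap model's raw output tensor inside
 * gie_processing_done_buf_prob. Assumes SGIE3 uses gie-unique-id=4
 * (hypothetical) and output-tensor-meta=1 in its config. */
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

static void
process_face_swap_output (NvDsBatchMeta *batch_meta)
{
  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame;
       l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj;
         l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
      for (NvDsMetaList *l_user = obj_meta->obj_user_meta_list; l_user;
           l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
          continue;
        NvDsInferTensorMeta *tmeta =
            (NvDsInferTensorMeta *) user_meta->user_meta_data;
        if (tmeta->unique_id != 4)   /* hypothetical SGIE3 id */
          continue;
        /* out_buf_ptrs_dev[0] is the device-side copy of the first
         * output layer; using it instead of out_buf_ptrs_host avoids
         * a device-to-host copy if the blend stays on the GPU. */
        void *swap_out_dev = tmeta->out_buf_ptrs_dev[0];
        (void) swap_out_dev;   /* blend onto the frame here */
      }
    }
  }
}
```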
I also tried using cv::cuda to move the processing onto the GPU, but instead of improving, it degraded performance to 1-2 FPS.
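A likely cause, though it depends on the exact code, is per-frame host/device traffic: every cv::cuda call that starts from a cv::Mat pays an upload, a download, and a GpuMat allocation, which for small per-face operations can outweigh the GPU compute saved. A sketch of the slow and fast patterns:

```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudawarping.hpp>

/* Slow pattern: H2D upload, D2H download, and GpuMat allocation on
 * every frame, each of which forces synchronization. */
void blend_slow (const cv::Mat &frame_host, cv::Mat &out_host)
{
  cv::cuda::GpuMat d_frame, d_resized;
  d_frame.upload (frame_host);               // H2D copy every frame
  cv::cuda::resize (d_frame, d_resized, cv::Size (512, 512));
  d_resized.download (out_host);             // D2H copy every frame
}

/* Faster pattern: the frame already lives on the GPU, GpuMats are
 * preallocated and reused, and work is queued on one async stream. */
void blend_fast (const cv::cuda::GpuMat &d_frame, cv::cuda::GpuMat &d_out,
                 cv::cuda::Stream &stream)
{
  cv::cuda::resize (d_frame, d_out, cv::Size (512, 512), 0, 0,
                    cv::INTER_LINEAR, stream);
}
```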
As a sanity check, I removed the postprocessing for the swap model, and that gave me around 10-11 FPS.
So I want to know how I can increase the performance, because without the postprocessing I won't be able to get the required output.
If the face swap postprocessing is the bottleneck, you need to optimize your implementation of the postprocessing. It is algorithm related, so you will need to optimize it yourself.
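Before optimizing, it may help to confirm where the time actually goes. A minimal sketch of timing the probe body (the wrapper name and counters are illustrative, not part of deepstream-app.c):

```cpp
#include <gst/gst.h>
#include <chrono>
#include <cstdio>

/* Illustrative wrapper: time the postprocessing body per frame and
 * report a running average every 100 buffers. */
static GstPadProbeReturn
gie_processing_done_buf_prob_timed (GstPad *pad, GstPadProbeInfo *info,
                                    gpointer user_data)
{
  static double total_ms = 0.0;
  static long frames = 0;

  auto t0 = std::chrono::steady_clock::now ();
  /* ... existing face swap postprocessing body here ... */
  auto t1 = std::chrono::steady_clock::now ();

  total_ms += std::chrono::duration<double, std::milli> (t1 - t0).count ();
  if (++frames % 100 == 0)
    printf ("postproc avg: %.2f ms/frame\n", total_ms / frames);
  return GST_PAD_PROBE_OK;
}
```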
But I am already doing the bare minimum. Extracting the full frame from the buffer is necessary; extracting the raw output of the swap model from the buffer is also necessary so I can blend it onto the full frame; and similarly, extracting the matrix from obj_meta is also necessary. So I cannot understand what else I can do. Also, why is it that when I shifted my cv operations to cv::cuda to utilise the GPU, performance decreased instead of increasing?
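On the copy question: the frame extraction itself can sometimes be done without any host round trip. A hedged sketch that wraps the decoded frame in place for cv::cuda, assuming an RGBA pitch-linear surface allocated in CUDA-addressable memory; note that on Jetson the default NVBUF_MEM_SURFACE_ARRAY memory is not directly CUDA-addressable and would need NvBufSurfaceMapEglImage first:

```cpp
#include <opencv2/core/cuda.hpp>
#include <gst/gst.h>
#include "nvbufsurface.h"

/* Wrap one batch entry of the decoded frame as a GpuMat, with no host
 * copy. Only valid while the GstBuffer mapping is held, and only when
 * the surface memory is CUDA device/unified memory. */
static void
blend_on_frame (GstBuffer *buf, guint batch_idx,
                const cv::cuda::GpuMat &d_swap_out)
{
  GstMapInfo map_info;
  if (!gst_buffer_map (buf, &map_info, GST_MAP_READWRITE))
    return;

  NvBufSurface *surf = (NvBufSurface *) map_info.data;
  NvBufSurfaceParams &p = surf->surfaceList[batch_idx];

  if (surf->memType == NVBUF_MEM_CUDA_DEVICE ||
      surf->memType == NVBUF_MEM_CUDA_UNIFIED) {
    cv::cuda::GpuMat d_frame (p.height, p.width, CV_8UC4,
                              p.dataPtr, p.pitch);
    /* ... blend d_swap_out into d_frame with cv::cuda ops here,
     *     reusing one preallocated cv::cuda::Stream ... */
    (void) d_swap_out;
  }

  gst_buffer_unmap (buf, &map_info);
}
```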
There has been no update from you for a period, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks