Poor performance of CUDA GPU, using OpenCV DNN module

tomek3 · November 22, 2021, 12:58pm

I'm experimenting with the OpenCV dnn module on both the CPU and the GPU of a Jetson Nano. I measure the execution time of super-resolution algorithms based on four different models (EDSR, ESPCN, FSRCNN, LapSRN) using the following code:

```python
import cv2
from time import time

sr = cv2.dnn_superres.DnnSuperResImpl_create()

path = "EDSR_x2.pb"
# path = "ESPCN_x4.pb"
# path = "FSRCNN_x4.pb"
# path = "LapSRN_x4.pb"

sr.readModel(path)

# Set CUDA backend and target to enable GPU inference
sr.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
sr.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

sr.setModel("edsr", 2)
# sr.setModel("espcn", 4)
# sr.setModel("fsrcnn", 4)
# sr.setModel("lapsrn", 4)

img = cv2.imread('butterfly.png')

start = time()
result = sr.upsample(img)
print(time() - start)

cv2.imwrite('edsr_output.png', result)
# cv2.imwrite('espcn_output.png', result)
# cv2.imwrite('fsrcnn_output.png', result)
# cv2.imwrite('lapsrn_output.png', result)
```

As the input image I use “butterfly”, 232 × 155 px (from https://miro.medium.com/max/464/1*A8yToxEh-f0_1Up8u51aHQ.png).
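For reference, each model multiplies both image dimensions by its scale factor, so the number of output pixels grows with the square of the scale (an x4 model produces 16 times as many pixels as it receives). The helper below is a minimal sketch of that arithmetic; `expected_output_shape` and `pixel_ratio` are hypothetical names, and the code assumes the (rows, cols, channels) shape convention that `cv2.imread` uses for a colour image.

```python
def expected_output_shape(height: int, width: int, scale: int, channels: int = 3):
    """Return the (rows, cols, channels) shape that a super-resolution
    model with the given integer scale factor should produce."""
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    return (height * scale, width * scale, channels)


def pixel_ratio(scale: int) -> int:
    """Output pixels per input pixel: both axes grow by `scale`."""
    return scale * scale


# The 232 x 155 px "butterfly" image is 155 rows by 232 columns.
h, w = 155, 232
print(expected_output_shape(h, w, 2))  # EDSR x2 -> (310, 464, 3)
print(expected_output_shape(h, w, 4))  # ESPCN / FSRCNN / LapSRN x4 -> (620, 928, 3)
print(pixel_ratio(4))                  # -> 16
```

Comparing `result.shape` from the script above with this value is a quick sanity check that the intended model file and scale were actually loaded.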

The measured times (one timed call each) are:

EDSR x2: CPU: did not finish, GPU: did not finish
ESPCN x4: CPU: 0.175 s, GPU: 10.170 s
FSRCNN x4: CPU: 0.128 s, GPU: 5.250 s
LapSRN x4: CPU: 8.098 s, GPU: 6.411 s

I can't run the EDSR model at all: it takes too long and the program exits.
ESPCN and FSRCNN are MUCH faster on the CPU, and I don't understand why.
Only LapSRN is faster on the GPU, and the improvement isn't significant.
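To put these numbers in proportion, the sketch below divides each GPU time by the matching CPU time, using the timings from the list above at the full precision originally printed. EDSR is left out because neither run finished. A ratio above 1 means the GPU run took longer; `gpu_to_cpu_ratio` is just an illustrative helper name.

```python
# Single-run timings in seconds (CPU, GPU), as reported in the post.
timings = {
    "ESPCN x4": (0.17469215393066406, 10.169917821884155),
    "FSRCNN x4": (0.12776947021484375, 5.2502007484436035),
    "LapSRN x4": (8.098081111907959, 6.410776138305664),
}


def gpu_to_cpu_ratio(cpu_s: float, gpu_s: float) -> float:
    """How many times longer the GPU run took than the CPU run."""
    return gpu_s / cpu_s


for name, (cpu_s, gpu_s) in timings.items():
    print(f"{name}: GPU/CPU time ratio = {gpu_to_cpu_ratio(cpu_s, gpu_s):.2f}")
```

With these figures the GPU runs take roughly 58 times (ESPCN) and 41 times (FSRCNN) as long as the CPU runs, while LapSRN on the GPU is only about 1.26 times faster than on the CPU.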

Are these results normal?
Why is the GPU performance so poor compared with the CPU (which isn't powerful on the Jetson Nano)?

AastaLLL · November 23, 2021, 2:20am

Hi,

In general, you should get some acceleration when running inference on the GPU, but it still depends on the implementation.

Please monitor the GPU utilization with sudo tegrastats first.
If the GPU usage is low, there is still some room for optimization.
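Because the GPU load can jump between samples, averaging several readings gives a steadier picture than watching the live output. The sketch below assumes each tegrastats line contains a `GR3D_FREQ <n>%` field (as in the excerpt quoted later in this thread) and extracts and averages that percentage. The function names and the 50% "low usage" threshold are illustrative assumptions, not something tegrastats defines.

```python
import re
from statistics import mean

# Matches the GPU load field, e.g. "GR3D_FREQ 0%@921" -> "0".
# Only the percentage is captured; the frequency suffix is ignored.
GR3D_RE = re.compile(r"GR3D_FREQ (\d+)%")

# Hypothetical threshold below which the GPU is treated as mostly idle.
LOW_USAGE_PERCENT = 50


def gpu_loads(lines):
    """Yield the GPU load (percent) from each tegrastats line that has one."""
    for line in lines:
        match = GR3D_RE.search(line)
        if match:
            yield int(match.group(1))


def average_gpu_load(lines):
    """Mean GPU load over all samples, or None if no sample was found."""
    loads = list(gpu_loads(lines))
    return mean(loads) if loads else None


if __name__ == "__main__":
    # Synthetic sample lines for illustration only (not real measurements).
    samples = [
        "RAM 2000/3964MB ... GR3D_FREQ 0%@921 ...",
        "RAM 2001/3964MB ... GR3D_FREQ 99%@921 ...",
        "RAM 2001/3964MB ... GR3D_FREQ 3%@921 ...",
    ]
    avg = average_gpu_load(samples)
    if avg is None:
        print("no GR3D_FREQ field found")
    else:
        print(f"average GPU load: {avg:.1f}%")
        if avg < LOW_USAGE_PERCENT:
            print("GPU mostly idle: likely room for optimization")
```

To use real data, one option is to redirect the output of sudo tegrastats to a file while the benchmark runs and then pass the opened file to `average_gpu_load`.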

On Jetson, we recommend using TensorRT for inference instead.
Thanks.

tomek3 · November 23, 2021, 10:23am

Hi @AastaLLL,

When I run the CPU example, the average load is 100%, but when I run the GPU example it is about 20% (it is hard to measure exactly because it varies between 0 and 100%). It looks like the DNN super-resolution module can't use the full power of the GPU.

Is it possible to use the EDSR, ESPCN, FSRCNN and LapSRN models with TensorRT?

AastaLLL · December 2, 2021, 8:31am

Hi,

What about the GPU usage?
You can find it in the tegrastats output as well:

 ... GR3D_FREQ 0%@921 ...

Do you have your model in ONNX format?
If so, you can run it with TensorRT directly using the trtexec binary:

$ /usr/src/tensorrt/bin/trtexec --onnx=[your/model]
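If you prefer to drive trtexec from the same Python script, a thin wrapper around `subprocess` is enough. This is a sketch under assumptions: it uses the trtexec path shown above, treats the `--fp16` and `--saveEngine` trtexec options as optional extras, and uses placeholder file names, since the thread doesn't establish that ONNX exports of these super-resolution models exist. `universal_newlines=True` is used instead of `text=True` so the code also runs on the Python 3.6 that older JetPack releases ship.

```python
import subprocess
from pathlib import Path

# trtexec location as given in the reply above.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def trtexec_command(onnx_path, engine_path=None, fp16=False, trtexec=TRTEXEC):
    """Build the argument list for benchmarking an ONNX model with trtexec."""
    cmd = [trtexec, f"--onnx={onnx_path}"]
    if fp16:
        cmd.append("--fp16")  # allow FP16 precision
    if engine_path is not None:
        cmd.append(f"--saveEngine={engine_path}")  # keep the built engine
    return cmd


def run_trtexec(onnx_path, **kwargs):
    """Run trtexec and return its combined stdout/stderr text."""
    if not Path(onnx_path).is_file():
        raise FileNotFoundError(onnx_path)
    result = subprocess.run(
        trtexec_command(onnx_path, **kwargs),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if trtexec fails
    )
    return result.stdout
```

For example, `print(run_trtexec("model.onnx", fp16=True))` (with a hypothetical `model.onnx`) prints trtexec's report, including its timing summary.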

Thanks.

dkreutz · December 2, 2021, 12:14pm

Some models need to “warm up”: the first inference run is slow, and subsequent runs are faster. Have you tried running multiple inferences on the same model instance?
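A minimal sketch of that advice: call the function a few times untimed, then time several more calls and report the median. `benchmark` and its default counts are illustrative choices. `time.perf_counter` is used instead of `time.time` because it is the clock intended for measuring short intervals.

```python
import statistics
import time


def benchmark(fn, *args, warmup=2, runs=10):
    """Call fn(*args) `warmup` times untimed, then return the median
    wall-clock time in seconds over `runs` timed calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn(*args)  # early calls may include one-time initialization cost
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

With the `sr` and `img` objects from the original script, `benchmark(sr.upsample, img)` would report a steady-state time for each backend, so a slow first call no longer dominates the CPU/GPU comparison.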

system · December 29, 2021, 2:56am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.