Poor performance of CUDA GPU, using OpenCV DNN module

I am experimenting with the OpenCV dnn module on both CPU and GPU on a Jetson Nano. I measure the execution time of super-resolution algorithms based on four different models: EDSR, ESPCN, FSRCNN, and LapSRN. I use the following code:

import cv2
from time import time

sr = cv2.dnn_superres.DnnSuperResImpl_create()

path = "EDSR_x2.pb"
# path = "ESPCN_x4.pb"
# path = "FSRCNN_x4.pb"
# path = "LapSRN_x4.pb"


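sr.readModel(path)

# Set CUDA backend and target to enable GPU inference
# (comment out these two lines for the CPU run)
sr.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
sr.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)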

sr.setModel("edsr", 2)
# sr.setModel("espcn", 4)
# sr.setModel("fsrcnn", 4)
# sr.setModel("lapsrn", 4)

img = cv2.imread('butterfly.png')

start = time()
result = sr.upsample(img)
print(time() - start)

cv2.imwrite('edsr_output.png', result)
# cv2.imwrite('espcn_output.png', result)
# cv2.imwrite('fsrcnn_output.png', result)
# cv2.imwrite('lapsrn_output.png', result)

As an input image I use “butterfly” with resolution 232 px x 155 px (from this link https://miro.medium.com/max/464/1*A8yToxEh-f0_1Up8u51aHQ.png).

The output time is as follows:

  1. EDSR x2: CPU: did not finish, GPU: did not finish
  2. ESPCN x4: CPU: 0.17469215393066406 s, GPU: 10.169917821884155 s
  3. FSRCNN x4: CPU: 0.12776947021484375 s, GPU: 5.2502007484436035 s
  4. LapSRN x4: CPU: 8.098081111907959 s, GPU: 6.410776138305664 s

For some reason it is impossible to run the EDSR model: it takes too long and the program exits.
ESPCN and FSRCNN are MUCH faster on the CPU. I don’t understand why this happens.
Only LapSRN is faster on the GPU, but the improvement isn’t significant.

Are these results normal?
Why is the GPU performance so poor compared with the CPU (which isn’t powerful on the Nvidia Jetson Nano)?


In general, you will get some acceleration when running inference on a GPU.
But it still depends on the implementation.

Please monitor the GPU utilization with sudo tegrastats first.
If the GPU usage is low, there is still some room for optimization.

On Jetson, it’s recommended to use TensorRT for inference instead.

Hi @AastaLLL,

When I run the CPU example the average load is 100%, but when I run the GPU example the average load is about 20% (it is hard to measure exactly because it varies from 0 to 100%). It looks like the DNN super-resolution module can’t use the full power of the GPU.

Is it possible to use the EDSR, ESPCN, FSRCNN, and LapSRN models with TensorRT?


How about the GPU usage?
You can find it in the tegrastats output as well.

 ... GR3D_FREQ 0%@921 ...
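Since the load jumps around a lot, it can help to average the GR3D_FREQ field over many tegrastats lines instead of reading it by eye. Below is a minimal sketch, assuming the field looks like the sample above (a percentage, optionally followed by @ and a frequency). The function names gpu_load_percent and average_gpu_load are made up for illustration and are not part of any NVIDIA tool.

```python
import re

# Matches the GPU field of a tegrastats line, e.g. "GR3D_FREQ 0%@921".
# Assumption: the load is the integer right before the "%" sign.
GR3D_PATTERN = re.compile(r"GR3D_FREQ (\d+)%")


def gpu_load_percent(line):
    """Return the GR3D_FREQ load of one tegrastats line, or None if absent."""
    match = GR3D_PATTERN.search(line)
    return int(match.group(1)) if match else None


def average_gpu_load(lines):
    """Average the GPU load over all lines that contain a GR3D_FREQ field."""
    loads = [load for load in map(gpu_load_percent, lines) if load is not None]
    return sum(loads) / len(loads) if loads else None
```

The lines could come from tegrastats output captured to a text file while the inference is running, for example average_gpu_load(open("tegrastats.log")), where tegrastats.log is a hypothetical file name.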

Do you have your model in ONNX format?
If so, you can run it with TensorRT directly using the trtexec binary.

$ /usr/src/tensorrt/bin/trtexec --onnx=[your/model]


Some models need to “warm up”: the first inference run is slow, subsequent runs are fast(er). Have you tried multiple inferences on the same model instance yet?
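To check whether warm-up explains the numbers, the first call can be timed separately from the following ones. Here is a rough sketch using only the standard library. The benchmark helper is a hypothetical name, and runs=5 is an arbitrary choice, not a recommendation from OpenCV or NVIDIA.

```python
from statistics import mean
from time import perf_counter


def benchmark(fn, runs=5):
    """Time fn() once as the warm-up call, then average `runs` further calls.

    Returns (first_call_seconds, mean_seconds_of_later_calls).
    """
    start = perf_counter()
    fn()  # first call: may include one-time initialization cost
    first = perf_counter() - start

    timings = []
    for _ in range(runs):
        start = perf_counter()
        fn()
        timings.append(perf_counter() - start)
    return first, mean(timings)


# Usage with the script from the question (sr and img as defined there):
# first, steady = benchmark(lambda: sr.upsample(img))
# print("first run:", first, "s, later runs:", steady, "s on average")
```

If the later runs turn out much faster than the first one, the single-call GPU timings above mostly measure initialization rather than inference.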
