I’ve been trying to run a Python script that requires PyTorch and OpenCV with GPU modules. I based my image on one I found in the jetson-inference repo, namely dustynv/jetson-inference:r32.4.4. After running the code I noticed that I get far fewer FPS than outside the container. It turns out that all OpenCV parts of the code are painfully slow, about 10 times slower on average for CPU. I tried running just the OpenCV part of my code directly in the dustynv/jetson-inference:r32.4.4 container, but I got approximately the same results.
Information about my Jetson Nano:
L4T 32.4.4 [ JetPack 4.4.1 ]
Ubuntu 18.04.05 LTS
I am new to both Docker and Jetson Nano and I didn’t manage to find any clues on the Internet. Is there something I’m missing?
Would you mind mounting more memory/swap to the container to see if it helps?
Limited resources can sometimes cause this kind of slowness.
Thank you for your reply! I’ve tried mounting more memory and I still get the same results as before, for both containers. I don’t have any other memory-draining processes running while testing the containers.
We would like to reproduce this issue in our environment.
Would you mind sharing a simple example to demonstrate the OpenCV function?
Also, you can try our l4t-ml container, which has CUDA-based OpenCV installed.
Of course, here is what I use:
import cv2 as cv
import numpy as np
import non_max_suppression_fix  # my own helper module, not shown here

class MyClass1:
    def __init__(self, face_cascade_path):
        self.face_cascade = cv.cuda_CascadeClassifier.create(face_cascade_path)

    def detect_face(self, frame):
        # Upload the frame to the GPU, run the cascade, and download the detections
        gpu_frame = cv.cuda_GpuMat(frame)
        faces = self.face_cascade.detectMultiScale(gpu_frame).download()
        if faces is None:
            faces = np.empty(0)
        else:
            faces = faces[0]
        # non-max suppression is a function written by me and it works fine; it does not require OpenCV. It could be omitted
        faces = non_max_suppression_fix.non_max_suppression(faces, overlapThresh=0.3)
        return faces

class MyClass2:
    def __init__(self, history, varThreshold, threshHigh, threshLow):
        self.fgbg = cv.bgsegm.createBackgroundSubtractorMOG(history, varThreshold)
        self.threshHigh = threshHigh
        self.threshLow = threshLow

    def movement_detect(self, frame):
        # Percentage of pixels flagged as foreground by the background subtractor
        height, width, _ = frame.shape
        fgmask = self.fgbg.apply(frame)
        nonZero = cv.countNonZero(fgmask)
        percent = nonZero / (height * width) * 100
        return percent

    def edge_detect(self, frame):
        # Canny edges, binarized and then dilated with a 5x5 elliptical kernel
        frameCanny = cv.Canny(frame, self.threshLow, self.threshHigh)
        _, frameBin = cv.threshold(frameCanny, 100, 255, cv.THRESH_BINARY)
        frameDyl = cv.dilate(frameBin, cv.getStructuringElement(cv.MORPH_ELLIPSE, (5, 5)))
        return frameDyl
For the Haar cascade I use haarcascade_frontalface_default.xml, which can be found on GitHub.
I’ve built a container with OpenCV installed, but without the GPU module. There, for just the code above (without the Haar cascade), I got the same results as outside the container. Could it be an issue with CUDA?
The source above uses cuda_GpuMat,
so you will need an OpenCV build with GPU support.
Would you mind testing the sample with the l4t-ml container to see if the same issue occurs?
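For anyone comparing environments, a quick way to see which OpenCV build a given container actually provides is sketched below. It is only an illustration: the helper name opencv_cuda_status is made up for this post, and including the thread count is an assumption that it might matter for the CPU-side slowdown, not something confirmed in this thread.

```python
def opencv_cuda_status():
    """Report basic facts about the OpenCV build visible to this interpreter.

    Hypothetical helper for illustration; run it on the host and inside
    each container and compare the results.
    """
    try:
        import cv2 as cv
    except ImportError:
        # OpenCV is not installed in this environment at all
        return {"opencv": None, "cuda_devices": 0, "num_threads": None}

    try:
        # Returns 0 when OpenCV was built without CUDA support
        devices = cv.cuda.getCudaEnabledDeviceCount()
    except (AttributeError, cv.error):
        devices = 0

    return {
        "opencv": cv.__version__,
        "cuda_devices": devices,
        # Number of threads OpenCV may use for its parallel CPU code paths
        "num_threads": cv.getNumThreads(),
    }


if __name__ == "__main__":
    print(opencv_cuda_status())
```

If the values differ between the host and the container, that difference is a reasonable first thing to look into.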
I did and I still have the same issue.
Could you also try to reproduce this on our latest JetPack 4.6 (r32.6.1)?
Also, do you have any performance data for the inside/outside Docker use cases?
This can help us know if we can reproduce this issue locally.
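One possible way to collect such numbers is a small timing harness like the sketch below, run once on the host and once inside the container. The function name benchmark and the warmup/runs defaults are arbitrary choices for illustration, not part of any NVIDIA or OpenCV tooling.

```python
import statistics
import time


def benchmark(fn, *args, warmup=5, runs=50):
    """Time fn(*args) and return (median_ms, fps).

    Hypothetical helper: warmup calls are discarded so that one-time
    costs (CUDA context creation, cache fills) do not skew the result.
    """
    for _ in range(warmup):
        fn(*args)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)

    median_ms = statistics.median(samples)
    fps = 1000.0 / median_ms if median_ms > 0 else float("inf")
    return median_ms, fps
```

With the code posted above, a call such as benchmark(cv.Canny, frame, 100, 200) or benchmark(detector.edge_detect, frame), where frame is any test image and detector is a MyClass2 instance, gives a median time per call that can be compared directly between the two environments.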
