OpenCV runs slower in a Docker container

I’ve been trying to run a Python script that requires PyTorch and OpenCV with GPU modules. I based my image on one I found in the jetson-inference repo, namely dustynv/jetson-inference:r32.4.4. After running the code I noticed that I get far fewer FPS than I got outside the container. It turns out that all OpenCV parts of the code are painfully slow, about 10 times slower on average on the CPU. I also tried running just the OpenCV part of my code directly in the dustynv/jetson-inference:r32.4.4 container, but I got approximately the same results.
Information about my Jetson Nano:
L4T 32.4.4 [ JetPack 4.4.1 ]
Ubuntu 18.04.05 LTS
I am new to both Docker and Jetson Nano and I didn’t manage to find any clues on the Internet. Is there something I’m missing?

Hi,

Would you mind giving the container more memory/swap to see if it helps?
Limited resources can sometimes cause this kind of slowdown.
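
To confirm what the container actually has available, a small script like the one below can be run both inside and outside the container. This is only a sketch: it assumes a Linux /proc/meminfo and the cgroup v1 memory controller path (both are assumptions about the setup, not something confirmed in this thread), and the function names are made up for illustration. Note that /proc/meminfo usually reports device-wide values even inside a container, so the cgroup limit is printed separately when it can be read.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse 'Name:   value [kB]' lines into {name: int}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def read_cgroup_limit(path="/sys/fs/cgroup/memory/memory.limit_in_bytes"):
    """Return the cgroup v1 memory limit in bytes, or None if unavailable.

    The default path is an assumption (cgroup v1 layout).
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def report():
    """Print the memory and swap figures visible to this process."""
    try:
        info = parse_meminfo(Path("/proc/meminfo").read_text())
    except OSError:
        info = {}
    for key in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree"):
        if key in info:
            print(f"{key}: {info[key] / 1024:.0f} MiB")
    limit = read_cgroup_limit()
    if limit is not None:
        print(f"cgroup memory limit: {limit / 2**20:.0f} MiB")


if __name__ == "__main__":
    report()
```

If the numbers inside the container are much lower than on the host, the `--memory` and `--memory-swap` options of `docker run` are the usual place to look.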

Thanks.

Hello,
Thank you for your reply! I’ve tried giving the containers more memory and I still get the same results as before, for both containers. I don’t have any other memory-draining processes running while testing the containers.

Hi,

We would like to reproduce this issue in our environment.
Would you mind sharing a simple example to demonstrate the OpenCV function?

Also, you can try our l4t-ml container, which has CUDA-based OpenCV installed.

Thanks.

Hello,
Of course, here is what I use:

import cv2 as cv
import numpy as np
import non_max_suppression_fix  # my own helper module (not shown here)

class MyClass1:
    def __init__(self, face_cascade_path):
        self.face_cascade = cv.cuda_CascadeClassifier.create(face_cascade_path)
        self.face_cascade.setMinNeighbors(5)
        self.face_cascade.setMinObjectSize((30,30))
        self.face_cascade.setScaleFactor(1.3)

    def detect_face(self, frame):
        gpu_frame = cv.cuda_GpuMat(frame)
        faces = self.face_cascade.detectMultiScale(gpu_frame).download()
        if faces is None:
            faces = np.empty(0)
        else:
            faces = faces[0]
        # non_max_suppression is my own function; it works fine and does not require OpenCV, so it could be omitted
        faces = non_max_suppression_fix.non_max_suppression(faces, overlapThresh=0.3)
        return faces

class MyClass2:
    def __init__(self, history, varThreshold, threshHigh, threshLow):
        self.fgbg = cv.bgsegm.createBackgroundSubtractorMOG(history, varThreshold)
        self.threshHigh = threshHigh
        self.threshLow = threshLow

    def movement_detect(self, frame):
        height, width, _ = frame.shape
        fgmask = self.fgbg.apply(frame)
        nonZero = cv.countNonZero(fgmask)
        percent = nonZero / (height * width) * 100
        return percent

    def edge_detect(self, frame):
        frameCanny = cv.Canny(frame, self.threshLow, self.threshHigh)
        _, frameBin = cv.threshold(frameCanny, 100, 255, cv.THRESH_BINARY)
        frameDyl = cv.dilate(frameBin, cv.getStructuringElement(cv.MORPH_ELLIPSE, (5, 5)))
        return frameDyl

For the Haar cascade I use haarcascade_frontalface_default.xml, which can be found on GitHub.
I’ve built a container with OpenCV installed, but without the GPU module. There, just for the code above (without the Haar cascade), I got the same results as outside the container. Could it be an issue with CUDA?
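
Since the CPU-only build behaves the same inside and outside the container, one thing that might be worth comparing is how each OpenCV build was configured. The sketch below prints a few runtime facts and the relevant lines of the build summary; run it in each environment and compare the output. The helper names and the keyword list are my own choices for illustration, and this is a diagnostic aid, not a confirmed explanation of the slowdown.

```python
def pick_build_lines(build_info, keywords=("CUDA", "cuDNN", "Parallel framework",
                                           "CPU/HW features", "Configuration")):
    """Return the build-summary lines that mention any of the keywords."""
    return [line.strip() for line in build_info.splitlines()
            if any(k in line for k in keywords)]


def describe_opencv():
    """Print version, threading and CUDA facts for the OpenCV in this environment."""
    try:
        import cv2 as cv
    except ImportError:
        print("cv2 is not importable in this environment")
        return
    print("version:", cv.__version__)
    print("threads:", cv.getNumThreads())
    print("optimized code paths enabled:", cv.useOptimized())
    try:
        # Returns 0 when OpenCV was built without CUDA support.
        print("CUDA devices:", cv.cuda.getCudaEnabledDeviceCount())
    except (AttributeError, cv.error) as exc:
        print("CUDA query failed:", exc)
    for line in pick_build_lines(cv.getBuildInformation()):
        print(line)


if __name__ == "__main__":
    describe_opencv()
```

Differences in the thread count, the parallel framework, or the build configuration between the two environments would be a reasonable place to start looking.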

Hi,

The source above uses cuda_GpuMat, so you will need an OpenCV build with GPU support.

Would you mind testing the sample with the l4t-ml container to see if the same issue occurs?

Thanks.

I did and I still have the same issue.

There has been no update from you for a while, so we are assuming this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.
Thanks

Hi,

Could you also try to reproduce this on our latest JetPack 4.6 (r32.6.1)?

Also, do you have any performance data for the inside/outside Docker use case?
This would help us see whether we can reproduce the issue locally.
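
For collecting comparable numbers, a minimal timing helper along these lines could be run unchanged inside and outside the container. It is a sketch only: `time_call` and its parameters are hypothetical names, and the demo workload is a stand-in so the script runs without OpenCV.

```python
import statistics
import time


def time_call(func, *args, repeats=50, warmup=5):
    """Time func(*args) and return summary statistics in milliseconds."""
    # Warm-up runs are discarded so one-time setup costs do not skew the numbers.
    for _ in range(warmup):
        func(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": len(samples),
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with the OpenCV call being investigated.
    stats = time_call(sum, range(100_000))
    print({k: round(v, 3) for k, v in stats.items()})
```

With a test frame loaded, the same helper can wrap the functions from the code posted earlier, for example `time_call(cv.Canny, frame, 100, 200)` (the threshold values here are placeholders) or `time_call(detector.detect_face, frame)` for an instance of the cascade class.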

Thanks.
