OpenCV runs slower in a Docker container

I've been trying to run a Python script that requires PyTorch and OpenCV with GPU modules. I based my image on the one from the jetson-inference repo, dustynv/jetson-inference:r32.4.4. After running the code I noticed that I get much lower FPS than I did outside the container. It turns out all the OpenCV parts of the code are painfully slow, roughly 10 times slower on average on the CPU. I tried running just the OpenCV part of my code directly in the dustynv/jetson-inference:r32.4.4 container, but I got approximately the same results.
Information about my Jetson Nano:
L4T 32.4.4 [ JetPack 4.4.1 ]
Ubuntu 18.04.5 LTS
I am new to both Docker and the Jetson Nano, and I haven't managed to find any clues online. Is there something I'm missing?

Hi,

Would you mind giving the container more memory/swap to see if it helps?
Sometimes limited resources cause this kind of slowness.
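
As a quick check, a snippet like the one below (plain Python, no extra packages; the function name is just illustrative) shows how much memory and swap the container actually sees:

# Print the memory/swap visible inside the container by reading /proc/meminfo.
def print_memory_info():
    wanted = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")
    with open("/proc/meminfo") as f:
        for line in f:
            if line.split(":")[0] in wanted:
                print(line.strip())

print_memory_info()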

Thanks.

Hello,
Thank you for your reply! I've tried mounting more memory and I still get the same results as before, for both containers. I don't have any other memory-hungry processes running while testing the containers.

Hi,

We would like to reproduce this issue in our environment.
Would you mind sharing a simple example that demonstrates the OpenCV functions you are using?

You can also try our l4t-ml container, which has CUDA-enabled OpenCV installed.
https://ngc.nvidia.com/catalog/containers/nvidia:l4t-ml

Thanks.

Hello,
Of course, here is what I use:

import cv2 as cv
import numpy as np
import non_max_suppression_fix  # my own NMS helper module, not part of OpenCV

class MyClass1:
    def __init__(self, face_cascade_path):
        self.face_cascade = cv.cuda_CascadeClassifier.create(face_cascade_path)
        self.face_cascade.setMinNeighbors(5)
        self.face_cascade.setMinObjectSize((30,30))
        self.face_cascade.setScaleFactor(1.3)

    def detect_face(self, frame):
        gpu_frame = cv.cuda_GpuMat(frame)  # upload the frame to GPU memory
        faces = self.face_cascade.detectMultiScale(gpu_frame).download()  # copy detections back to the CPU
        if faces is None:
            faces = np.empty(0)
        else:
            faces = faces[0]
        # non_max_suppression is a function I wrote myself; it works fine and does not require OpenCV. It could be omitted.
        faces = non_max_suppression_fix.non_max_suppression(faces, overlapThresh=0.3)
        return faces

class MyClass2:
    def __init__(self, history, varThreshold, threshHigh, threshLow):
        self.fgbg = cv.bgsegm.createBackgroundSubtractorMOG(history, varThreshold)
        self.threshHigh = threshHigh
        self.threshLow = threshLow

    def movement_detect(self, frame):
        height, width, _ = frame.shape
        fgmask = self.fgbg.apply(frame)
        nonZero = cv.countNonZero(fgmask)
        percent = nonZero / (height * width) * 100
        return percent

    def edge_detect(self, frame):
        frameCanny = cv.Canny(frame, self.threshLow, self.threshHigh)
        _, frameBin = cv.threshold(frameCanny, 100, 255, cv.THRESH_BINARY)
        frameDyl = cv.dilate(frameBin, cv.getStructuringElement(cv.MORPH_ELLIPSE, (5, 5)))
        return frameDyl
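
A minimal driver that exercises both classes looks roughly like this (the camera index, cascade path, and parameter values are only placeholders):

face_detector = MyClass1("haarcascade_frontalface_default.xml")
motion_detector = MyClass2(history=200, varThreshold=5, threshHigh=200, threshLow=100)

cap = cv.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)  # the CUDA cascade expects a single-channel 8-bit image
    faces = face_detector.detect_face(gray)
    percent = motion_detector.movement_detect(frame)
    edges = motion_detector.edge_detect(frame)
cap.release()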

For the Haar cascade I use haarcascade_frontalface_default.xml, which can be found on GitHub.
I've built a container with OpenCV installed, but without the GPU module. There, just for the code above (without the Haar cascade), I got the same results as outside the container. Could it be an issue with CUDA?

Hi,

The above source uses cuda_GpuMat.
So you will need an OpenCV build with GPU support.
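
A quick way to confirm whether the OpenCV inside a container was built with CUDA support (just a sanity-check snippet):

import cv2 as cv

# If this prints 0, the cv2 build cannot see a CUDA device,
# and the cuda_* classes used above will not be usable.
print(cv.__version__)
print("CUDA-enabled devices:", cv.cuda.getCudaEnabledDeviceCount())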

Would you mind testing the sample with the l4t-ml container to see if the same issue occurs?

Thanks.

I did and I still have the same issue.

Hi,

Could you also try to reproduce this on our latest JetPack 4.6 (r32.6.1)?

Also, do you have any performance data for the inside/outside Docker use case?
This would help us know whether we can reproduce this issue locally.
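
For example, a simple timing sketch like the one below, run on the same input both inside and outside the container, would already be helpful (the frame here is just random data, so the numbers are only meaningful for comparison):

import time
import numpy as np
import cv2 as cv

# Time a couple of the CPU OpenCV calls from your example on a synthetic frame.
frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
fgbg = cv.bgsegm.createBackgroundSubtractorMOG()

for name, func in [("Canny", lambda: cv.Canny(frame, 100, 200)),
                   ("MOG apply", lambda: fgbg.apply(frame))]:
    start = time.perf_counter()
    for _ in range(100):
        func()
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed / 100 * 1000:.2f} ms per call")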

Thanks.