Haar cascade with cuda xml classifier doesn't work

I have installed OpenCV 4.5.1 and TensorFlow 2.4 in a Docker container. The JetPack version is 4.5.1. I am trying to use the OpenCV CUDA classifiers from the official repo [opencv/data/haarcascades_cuda at master · opencv/opencv · GitHub].
In the application I use Python to load the XML file:

self.open_eyes_detector = cv2.cuda_CascadeClassifier.create(BASE_DIR + '/models/cuda/haarcascade_eye_tree_eyeglasses.xml')

And then I run the multi-scale detector:

gpu_gray_face = cv2.cuda_GpuMat(gray_face)
open_eyes_glasses_result = self.open_eyes_detector.detectMultiScale(gpu_gray_face).download()

But only the smile detector works. Every other Haar classifier gives this error:

cv2.error: OpenCV(4.5.1) /tmp/build_opencv/opencv_contrib/modules/cudaobjdetect/src/cascadeclassifier.cpp:155: error: (-217:Gpu API call) NCV Assertion Failed: cudaError_t=702, file=/tmp/build_opencv/opencv_contrib/modules/cudalegacy/src/cuda/NCVHaarObjectDetection.cu, line=1157 in function 'NCVDebugOutputHandler'

The ./deviceQuery test passes, so I conclude that the CUDA drivers are working. What could be the reason for this error?

Are you able to try other CUDA filters, such as

filter = cv::cuda::createSobelFilter(CV_8UC4, CV_8UC4, 1, 0, 3, 1, cv::BORDER_DEFAULT);


filter = cv::cuda::createGaussianFilter(CV_8UC4, CV_8UC4, cv::Size(31,31), 0, 0, cv::BORDER_DEFAULT);

We would like to know whether the failure is specific to the cascade classifier.

Those filters were created successfully:

cv2.cuda.createGaussianFilter(cv2.CV_8UC4, cv2.CV_8UC4, (31, 31), 0, 0, cv2.BORDER_DEFAULT)
<cuda_Filter 0x7f962cbe90>
cv2.cuda.createSobelFilter(cv2.CV_8UC4, cv2.CV_8UC4, 1, 0, 3, 1, cv2.BORDER_DEFAULT)
<cuda_Filter 0x7f9cc06330>
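
Creating a filter object does not necessarily launch a GPU kernel, so a stricter check is to apply the filter to an image and download the result. The following is a minimal sketch of that check, not something posted in this thread: the function name `run_gaussian_on_gpu` and the synthetic 640x480 test image are assumptions made for illustration, and the function simply returns None when a CUDA-enabled OpenCV build is not available.

```python
def run_gaussian_on_gpu(width=640, height=480):
    """Apply a CUDA Gaussian filter to a synthetic image.

    Returns the shape of the filtered image, or None when NumPy or a
    CUDA-enabled OpenCV build is not available on this machine.
    """
    try:
        import numpy as np
        import cv2
    except ImportError:
        return None
    if not hasattr(cv2, "cuda") or cv2.cuda.getCudaEnabledDeviceCount() == 0:
        return None

    # Synthetic 4-channel 8-bit image (hypothetical test data).
    img = np.random.randint(0, 256, (height, width, 4), dtype=np.uint8)

    # Upload to the GPU, run the filter, and copy the result back.
    gpu_src = cv2.cuda_GpuMat()
    gpu_src.upload(img)
    gauss = cv2.cuda.createGaussianFilter(
        cv2.CV_8UC4, cv2.CV_8UC4, (31, 31), 0, 0, cv2.BORDER_DEFAULT
    )
    gpu_dst = gauss.apply(gpu_src)
    return gpu_dst.download().shape


if __name__ == "__main__":
    print(run_gaussian_on_gpu())
```

If this prints a shape, kernels from the CUDA filter module run to completion, which would point at the cascade classifier rather than at CUDA in general.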

Does the cascade classifier work outside Docker? We would like to know whether it works if you call the functions directly on the Jetson Nano, without Docker.

The same error occurs on the Jetson Nano without Docker. After the error, other calls to cv2 don't work either:

cv2.error: OpenCV(4.4.0) /tmp/build_opencv/opencv/modules/core/src/cuda/gpu_mat.cu:121: error: (-217:Gpu API call) the launch timed out and was terminated in function 'allocate'

Looks like it fails in cudaMalloc():
opencv/gpu_mat.cu at master · opencv/opencv · GitHub

The cascade classifier downscales the image multiple times, so memory is probably insufficient.
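
To get a feel for how much memory repeated downscaling could need, here is a back-of-the-envelope sketch. It assumes one 8-bit grayscale copy per pyramid level, a scale factor of 1.2, and a 24-pixel minimum window; all three are illustrative assumptions, and the real CUDA implementation also allocates integral images and other buffers, so treat the number as a rough order of magnitude only.

```python
def pyramid_bytes(width, height, scale=1.2, min_size=24):
    """Estimate the bytes needed for all levels of an 8-bit image pyramid.

    Assumes one byte per pixel and one full copy per level; levels are
    generated until either side drops below min_size.
    """
    total = 0
    level = 0
    while True:
        w = int(width / scale ** level)
        h = int(height / scale ** level)
        if w < min_size or h < min_size:
            break
        total += w * h  # one byte per pixel
        level += 1
    return total


if __name__ == "__main__":
    total = pyramid_bytes(640, 480)
    print(f"{total / 1e6:.2f} MB across all pyramid levels")
```

Because each level is smaller than the previous one by a factor of scale squared, the total is bounded by a geometric series: at most width * height / (1 - 1/scale^2) bytes under these assumptions.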

Not sure if it helps, but please try the lightweight LXDE desktop:
This should free up more memory, which may be enough to run the cascade classifier.

Good suggestion, but I have completely disabled the GUI, and there are 2.5 GB of free RAM and 4 GB of free swap. When I run the classification, neither of them starts to decrease.
It looks like this error is related to CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT.
Here it says that all existing device memory allocations from this context are invalid.
But how can I figure out why it takes so long to process a GPU frame? Are there more detailed logs available?
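
For reference, cudaError_t=702 is cudaErrorLaunchTimeout in the CUDA runtime, which matches the "launch timed out" message above. Whether the device enforces a run time limit on kernels can be queried through the runtime API. Below is a minimal sketch using ctypes; it assumes that libcudart can be found on the loader path (on a Jetson it may live under /usr/local/cuda/lib64 and need LD_LIBRARY_PATH), and the helper name `kernel_exec_timeout_enabled` is made up for this example.

```python
import ctypes
import ctypes.util

# Value of cudaDevAttrKernelExecTimeout in the CUDA runtime enum
# (same number as CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT in the driver API).
CUDA_DEV_ATTR_KERNEL_EXEC_TIMEOUT = 17


def kernel_exec_timeout_enabled(device=0):
    """Return True/False if the device has a kernel run time limit.

    Returns None when the CUDA runtime library cannot be loaded or the
    query fails.
    """
    found = ctypes.util.find_library("cudart")
    candidates = ([found] if found else []) + ["libcudart.so"]
    for lib_name in candidates:
        try:
            cudart = ctypes.CDLL(lib_name)
        except OSError:
            continue
        value = ctypes.c_int(-1)
        # cudaError_t cudaDeviceGetAttribute(int* value, cudaDeviceAttr attr, int device)
        status = cudart.cudaDeviceGetAttribute(
            ctypes.byref(value), CUDA_DEV_ATTR_KERNEL_EXEC_TIMEOUT, device
        )
        return bool(value.value) if status == 0 else None
    return None


if __name__ == "__main__":
    print("Kernel run time limit:", kernel_exec_timeout_enabled())
```

The same information appears in the deviceQuery output as "Run time limit on kernels". If it is enabled, any single kernel that runs too long is terminated, which would explain why later cv2 calls fail after the first error.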