Weird error: RuntimeError: Error while calling cudnnConvolutionForward dlib/cuda/cudnn_dlibapi.cpp:1007. code: 7, reason: A call to cuDNN failed

Hello everyone,

I’m writing a face recognition program that runs on a Jetson Nano. In the first prototype, I used dlib directly to detect the faces, then used its wrapper library, https://github.com/ageitgey/face_recognition, for the recognition part. Everything went smoothly.

But when I tried another method, called LFFD (https://github.com/YonghaoHe/A-Light-and-Fast-Face-Detector-for-Edge-Devices/tree/master/face_detection/deploy_tensorrt), for detection, and passed the bbox arguments to face_recognition.face_encodings() in the format the function expects, a weird error occurred:

Traceback (most recent call last):
  File "predict_tensorrt_video.py", line 673, in <module>
    main()
  File "predict_tensorrt_video.py", line 88, in inner
    retval = fnc(*args, **kwargs)
  File "predict_tensorrt_video.py", line 667, in main
    run_inference(args.video_in, args.video_out, candidate_id, current_time)
  File "predict_tensorrt_video.py", line 565, in run_inference
    face_encoding = get_face_encodings(frame, css_type_face_location, 0)[0]
  File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in face_encodings
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
  File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in <listcomp>
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
RuntimeError: Error while calling cudnnConvolutionForward( context(), &alpha, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &beta, descriptor(output), output.device()) in file /home/gate/dlib-19.17/dlib/cuda/cudnn_dlibapi.cpp:1007. code: 7, reason: A call to cuDNN failed
Segmentation fault (core dumped)

even though the detection results are very good (I drew the bboxes on the frames, wrote them to the output video, and checked visually).
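For context, face_recognition.face_encodings() takes known_face_locations as (top, right, bottom, left) tuples, so the detector's boxes have to be reordered first. A minimal sketch of that conversion, assuming LFFD outputs [x1, y1, x2, y2] corner boxes (the helper name is mine, not part of either library):

```python
def lffd_to_css(bboxes):
    """Convert [x1, y1, x2, y2] corner boxes (assumed LFFD output) to the
    (top, right, bottom, left) tuples that face_recognition expects."""
    css = []
    for x1, y1, x2, y2 in bboxes:
        css.append((int(y1), int(x2), int(y2), int(x1)))
    return css

# e.g. a 100x100 box at the origin:
# lffd_to_css([[0, 0, 100, 100]]) -> [(0, 100, 100, 0)]
```

These tuples can then be passed straight to face_recognition.face_encodings(frame, known_face_locations).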

That’s what I get when running detection on original-size video frames; when I crop the frames to a preset smaller region, the error is different:

Traceback (most recent call last):
  File "predict_tensorrt_video.py", line 672, in <module>
    main()
  File "predict_tensorrt_video.py", line 88, in inner
    retval = fnc(*args, **kwargs)
  File "predict_tensorrt_video.py", line 666, in main
    run_inference(args.video_in, args.video_out, candidate_id, current_time)
  File "predict_tensorrt_video.py", line 564, in run_inference
    face_encoding = get_face_encodings(frame, css_type_face_location, 0)[0]
  File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in face_encodings
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
  File "/home/gate/.virtualenvs/lffd/lib/python3.6/site-packages/face_recognition/api.py", line 210, in <listcomp>
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
RuntimeError: Error while calling cudnnConvolutionForward( context(), &alpha, descriptor(data), data.device(), (const cudnnFilterDescriptor_t)filter_handle, filters.device(), (const cudnnConvolutionDescriptor_t)conv_handle, (cudnnConvolutionFwdAlgo_t)forward_algo, forward_workspace, forward_workspace_size_in_bytes, &beta, descriptor(output), output.device()) in file /home/gate/dlib-19.17/dlib/cuda/cudnn_dlibapi.cpp:1007. code: 7, reason: A call to cuDNN failed
cudaStreamDestroy() failed. Reason: invalid device ordinal
cudaFree() failed. Reason: invalid device pointer
cudaFreeHost() failed. Reason: invalid argument
cudaStreamDestroy() failed. Reason: invalid device ordinal
cudaFree() failed. Reason: invalid device pointer
cudaFreeHost() failed. Reason: invalid argument
cudaFree() failed. Reason: invalid device pointer
Segmentation fault (core dumped)

(note the extra cuda* failure lines above “Segmentation fault (core dumped)”).

I monitored memory with jtop from the moment the program started until it ended. Memory never reached the maximum; it only consumed around 2.5 GB of 4 GB. GPU utilization periodically spiked to 99% and dropped back to 0%.

I’ve re-checked my code dozens of times (here’s my code: https://paste.ofcode.org/3avUP9WVJ4HsdT96ndxD25p), built the newest dlib 19.18 and tried again, but nothing works. Can anyone please help me with this …

Here are my specs:

I’ve recently narrowed this issue down. The problem is the line:

import pycuda.autoinit

Having this line causes dlib’s cudnnConvolutionForward() function in cudnn_dlibapi.cpp to fail.

What I did: I wrote a script that takes an image containing an obvious face and runs this workflow: perform detection, then get the face encoding and print it out. The script has two parts: Part #1 runs the workflow with dlib for detection and face_recognition (which is also dlib to some extent) for the face encoding; Part #2 runs the same workflow, the only difference being that LFFD is used for detection. On the first run, I commented out the second part, and that ugly error still showed up while executing the face_encodings function. Next, I uncommented the second part and commented out the first; the error still appeared at the same function. I then tried commenting out the import statements one by one, and it turned out that import pycuda.autoinit is the culprit: when I commented it out, Part #1 ran beautifully with no errors.

I think dlib and pycuda.autoinit have different CUDA memory-handling mechanisms that conflict with each other, or one of them has a silent bug. The easiest way out is to sacrifice one of them, but I want to use both LFFD (for detection), which needs pycuda.autoinit, and dlib (for recognition), which hates pycuda.autoinit, so I have to somehow “synchronize” them.
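One way to attempt that “synchronization”, assuming the conflict is that pycuda.autoinit leaves its CUDA context current for the whole process: create the pycuda context manually instead of importing pycuda.autoinit, pop it immediately, and only push it around the LFFD/TensorRT calls, so dlib sees its own context the rest of the time. A sketch using pycuda’s documented Context.push()/Context.pop() (the guard class and all names here are mine, not from either library):

```python
class CudaContextGuard:
    """Make a pre-created CUDA context current on entry and pop it on exit,
    so other CUDA users (e.g. dlib) get their previous context back afterwards.
    `ctx` is expected to behave like pycuda.driver.Context (push()/pop())."""

    def __init__(self, ctx):
        self.ctx = ctx

    def __enter__(self):
        self.ctx.push()   # make this context current for pycuda/TensorRT work
        return self.ctx

    def __exit__(self, exc_type, exc, tb):
        self.ctx.pop()    # restore whatever context was current before
        return False      # do not swallow exceptions

# Intended usage (needs a GPU + pycuda, so it is not run here):
#   import pycuda.driver as cuda
#   cuda.init()
#   ctx = cuda.Device(0).make_context()  # instead of `import pycuda.autoinit`
#   ctx.pop()                            # don't leave it current globally
#   ...
#   with CudaContextGuard(ctx):
#       bboxes = lffd_detect(frame)      # hypothetical TensorRT/pycuda call
#   encodings = face_recognition.face_encodings(frame, css_boxes)  # dlib side
```

Whether this cures the cuDNN failure depends on dlib tolerating context switches, but it at least keeps pycuda’s context from being current while dlib runs.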

Please give me some hints on this …

Could you please let us know if you are still facing this issue?

Thanks

This is a cuDNN/driver version mismatch problem.
The following setup solved it for me:
driver: NVIDIA-Linux-x86_64-440.118.02.run (Product Type: Data Center)
cuda: cuda_10.2.89_440.33.01_linux.run (uncheck default driver install)
cuda patch 1: cuda_10.2.1_linux.run
cuda patch 2: cuda_10.2.2_linux.run
libcudnn: (using 7.6.5 causes “Error while calling cudnnConvolutionBiasActivationForward”)
libcudnn8_8.0.5.39-1+cuda10.2_amd64.deb
libcudnn8-dev_8.0.5.39-1+cuda10.2_amd64.deb
libcudnn8-samples_8.0.5.39-1+cuda10.2_amd64.deb
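Before swapping drivers, it’s worth confirming which versions are actually installed. A rough probe script (the header paths are common defaults, not guaranteed on every install; each check degrades gracefully if a tool is missing, and it sticks to Python 3.6-compatible subprocess calls):

```python
import shutil
import subprocess
from pathlib import Path

def check_versions():
    """Return one line each for driver, CUDA toolkit, and cuDNN versions,
    falling back to a 'not found' message when a component is missing."""
    lines = []

    # Driver version via nvidia-smi, if present.
    if shutil.which("nvidia-smi"):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE, universal_newlines=True)
        lines.append("driver: " + out.stdout.strip())
    else:
        lines.append("driver: nvidia-smi not found")

    # CUDA toolkit version via nvcc, if present.
    if shutil.which("nvcc"):
        out = subprocess.run(["nvcc", "--version"],
                             stdout=subprocess.PIPE, universal_newlines=True)
        lines.append("cuda: " + out.stdout.strip().splitlines()[-1])
    else:
        lines.append("cuda: nvcc not found")

    # cuDNN records its version as macros in a header
    # (cudnn_version.h on cuDNN 8+, cudnn.h on cuDNN 7).
    headers = [Path("/usr/include/cudnn_version.h"),
               Path("/usr/include/cudnn.h"),
               Path("/usr/local/cuda/include/cudnn.h")]
    for h in headers:
        if h.is_file():
            macros = [l.strip() for l in h.read_text().splitlines()
                      if l.startswith(("#define CUDNN_MAJOR",
                                       "#define CUDNN_MINOR",
                                       "#define CUDNN_PATCHLEVEL"))]
            lines.append("cudnn: " + " ".join(macros))
            break
    else:
        lines.append("cudnn: header not found")
    return lines

if __name__ == "__main__":
    for line in check_versions():
        print(line)
```

The reported versions should then match a combination that the cuDNN support matrix lists as compatible.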