Error: CUDNN_STATUS_EXECUTION_FAILED

Hi,
I was following one of the online tutorials but I was getting this error:
Traceback (most recent call last):
  File "ssd_object_detection.py", line 20, in <module>
    detections = net.forward()
cv2.error: OpenCV(4.3.0) /home/blah/opencv/modules/dnn/src/layers/…/cuda4dnn/primitives/…/csl/cudnn/convolution.hpp:461: error: (-217:Gpu API call) CUDNN_STATUS_EXECUTION_FAILED in function 'convolve_with_bias_activation'

It is a Python script, and this is the command:
python ssd_object_detection.py --prototxt MobileNetSSD_deploy.prototxt \
    --model MobileNetSSD_deploy.caffemodel \
    --input guitar.mp4 --output output.avi \
    --display 0 --use-gpu 1
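
For reference, the flags in this command suggest an argument parser roughly like the sketch below. The flag names are taken from the command and from the args[...] lookups in the script; the types, defaults, and help strings are assumptions, not the tutorial's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags used in the command (defaults are assumed)."""
    ap = argparse.ArgumentParser(description="SSD object detection on a video")
    ap.add_argument("--prototxt", required=True, help="path to the Caffe deploy prototxt")
    ap.add_argument("--model", required=True, help="path to the Caffe model weights")
    ap.add_argument("--input", type=str, default="", help="input video (empty means webcam)")
    ap.add_argument("--output", type=str, default="", help="output video (empty means none)")
    ap.add_argument("--display", type=int, default=1, help="1 to show frames, 0 to hide")
    ap.add_argument("--confidence", type=float, default=0.2, help="minimum detection confidence")
    ap.add_argument("--use-gpu", type=int, default=0, help="1 to request the CUDA backend")
    return ap


def parse_args(argv=None) -> dict:
    """Return the parsed arguments as a dict, as the script's args[...] lookups expect."""
    return vars(build_parser().parse_args(argv))
```

Note that argparse stores --use-gpu under the key use_gpu, so a script would read it as args["use_gpu"].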

This is my configuration:

  • Jetson Nano device
  • Ubuntu 18.04
  • /usr/local/cuda/bin/nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Wed_Oct_23_21:14:42_PDT_2019
    Cuda compilation tools, release 10.2, V10.2.89
  • – NVIDIA CUDA: YES (ver 10.2, CUFFT CUBLAS FAST_MATH)
    – NVIDIA GPU arch: 53
    – NVIDIA PTX archs:
    – cuDNN: YES (ver 8.0)
  • OpenCV 4.3.0 built from source with OPENCV_DNN_CUDA=ON, CUDNN_VERSION='8.0', WITH_CUDA=ON, WITH_CUDNN=ON, and many other settings enabled
  • Python 3.7.7

This is the code I am trying to run (it completes successfully if I don’t use the GPU). It fails at the line detections = net.forward()

import cv2
import imutils
import numpy as np
from imutils.video import FPS

# NOTE: args is the dict of parsed command-line arguments (parsing omitted)
CLASSES = ["background", "aeroplane"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# load the Caffe model and request the CUDA backend and target
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])
print("[INFO] setting preferable backend and target to CUDA...")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

print("[INFO] accessing video stream...")
vs = cv2.VideoCapture(args["input"] if args["input"] else 0)
writer = None
fps = FPS().start()

while True:
    (grabbed, frame) = vs.read()
    # stop when no frame could be read (end of the stream)
    if not grabbed:
        break

    frame = imutils.resize(frame, width=400)
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()  # the error is raised here

    for i in np.arange(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > args["confidence"]:
            idx = int(detections[0, 0, i, 1])
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            label = "{}: {:.2f}%".format(CLASSES[idx],
                confidence * 100)
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                COLORS[idx], 2)
            y = startY - 15 if startY - 15 > 15 else startY + 15
            cv2.putText(frame, label, (startX, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

    if args["display"] > 0:
        cv2.imshow("Frame", frame)
        key = cv2.waitKey(1) & 0xFF

        if key == ord("q"):
            break

    # lazily create the video writer once the frame size is known
    if args["output"] != "" and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 30,
            (frame.shape[1], frame.shape[0]), True)

    if writer is not None:
        writer.write(frame)

    fps.update()

fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

Thanks!

Hi,

It looks like OpenCV is running inference on the model through the Caffe framework.

I assume you built Caffe from source in your environment first.
Please check that the library was built for the correct GPU architecture (sm_53) for the Nano.
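
One way to check which GPU architecture an OpenCV build targets is to read the CUDA lines from its build information. The sketch below is only an illustration: parse_cuda_build_info and built_for_arch are hypothetical helper names, and SAMPLE mirrors the configuration lines quoted in the first post rather than real output from this machine.

```python
import re

# Sample text modeled on the configuration lines quoted in the first post.
SAMPLE = """
  NVIDIA CUDA:                   YES (ver 10.2, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             53
    NVIDIA PTX archs:
    cuDNN:                       YES (ver 8.0)
"""

# [ \t]* (not \s*) keeps an empty value from swallowing the next line.
_PATTERNS = {
    "cuda": r"NVIDIA CUDA:[ \t]*(.*)",
    "gpu_arch": r"NVIDIA GPU arch:[ \t]*(.*)",
    "ptx_archs": r"NVIDIA PTX archs:[ \t]*(.*)",
    "cudnn": r"cuDNN:[ \t]*(.*)",
}


def parse_cuda_build_info(build_info: str) -> dict:
    """Extract the CUDA-related fields from OpenCV build information text."""
    fields = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, build_info)
        fields[key] = match.group(1).strip() if match else None
    return fields


def built_for_arch(build_info: str, arch: str = "53") -> bool:
    """Return True if `arch` appears in the 'NVIDIA GPU arch' list."""
    archs = parse_cuda_build_info(build_info)["gpu_arch"] or ""
    return arch in archs.split()


# On the device itself (requires OpenCV's Python bindings):
#   import cv2
#   print(parse_cuda_build_info(cv2.getBuildInformation()))
#   print(built_for_arch(cv2.getBuildInformation(), "53"))
```

For the Nano, the "NVIDIA GPU arch" entry would be expected to include 53, which corresponds to sm_53.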

Thanks.

I’m using OpenCV 4.2.0 and have two NVIDIA P5000 GPUs installed. getCudaEnabledDeviceCount() returns 2.
With the first GPU (setDevice(0)), the following runs without any errors:

inputNet.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
inputNet.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);

With the second GPU (setDevice(1)), I get:
(-217:Gpu API call) CUDNN_STATUS_EXECUTION_FAILED in function 'convolve'
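
To narrow down which device fails, the same forward pass can be tried on each GPU in turn. The following is a rough Python sketch, not a confirmed fix: probe_devices and make_cuda_check are hypothetical helpers, the 1x3x300x300 dummy input is an assumption borrowed from the SSD script earlier in this thread, and it assumes the DNN CUDA backend uses the device selected with cv2.cuda.setDevice before the network is created.

```python
def probe_devices(device_count, run_on_device):
    """Call run_on_device(i) for each device index and record the outcome."""
    results = {}
    for index in range(device_count):
        try:
            run_on_device(index)
            results[index] = "ok"
        except Exception as exc:  # cv2.error derives from Exception
            results[index] = "failed: {}".format(exc)
    return results


def make_cuda_check(prototxt, model):
    """Return a callable that runs one dummy forward pass on a given GPU.

    cv2 and numpy are imported lazily so that probe_devices can be used
    (and tested) without a CUDA-enabled OpenCV build.
    """
    import cv2
    import numpy as np

    def check(index):
        cv2.cuda.setDevice(index)  # select the GPU before creating the net
        net = cv2.dnn.readNetFromCaffe(prototxt, model)
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
        # Dummy input; the shape is an assumption for a 300x300 SSD model.
        net.setInput(np.zeros((1, 3, 300, 300), dtype=np.float32))
        net.forward()

    return check


# Example usage on a machine with a CUDA-enabled OpenCV build:
#   import cv2
#   count = cv2.cuda.getCudaEnabledDeviceCount()
#   check = make_cuda_check("MobileNetSSD_deploy.prototxt",
#                           "MobileNetSSD_deploy.caffemodel")
#   print(probe_devices(count, check))
```

If only one index reports a failure, that points at the device or its driver state rather than at the model or the script.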


Hi blowbuz,

This forum is for the Jetson platform. For dGPU questions, please open a topic at CUDA Programming and Performance - NVIDIA Developer Forums.