CuDNN error in docker

timeescapenow · March 24, 2021, 11:21am

I am trying to convert my TF 2 Object Detection model to TensorRT format using TrtGraphConverterV2, and I get a cuDNN error.
I am running the nvcr.io/nvidia/tensorrt:20.08-py3 container on my GeForce GTX 1650 (4 GB) with:
tensorflow==2.4.1
tensorflow-gpu==2.4.1

My code:
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
import cv2

input_saved_model_dir = '/ML/faster_export/saved_model/'
output_saved_model_dir = '/ML/tensorrt_faster'

num_runs = 100

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
print(1 << 32)
conversion_params = conversion_params._replace(max_workspace_size_bytes=(1 << 32))
conversion_params = conversion_params._replace(precision_mode="FP16")

conversion_params = conversion_params._replace(maximum_cached_engines=100)

converter = trt.TrtGraphConverterV2(input_saved_model_dir=input_saved_model_dir, conversion_params=conversion_params)
converter.convert()

def my_input_fn():
    # Yield one random 640x640 uint8 image per run so build() can create the engines.
    for _ in range(num_runs):
        inp1 = np.random.normal(size=(1, 640, 640, 3)).astype(np.uint8)
        # inp1 = np.expand_dims(np.random.normal(size=(1, 320, 320, 3)).astype(np.float32), axis=0)
        yield (inp1,)

converter.build(input_fn=my_input_fn)
converter.save(output_saved_model_dir)

The error I have:

		2021-03-23 09:14:26.821119: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
		2021-03-23 09:14:28.836255: I tensorflow/compiler/tf2tensorrt/common/utils.cc:58] Linked TensorRT version: 7.1.3
		2021-03-23 09:14:28.836434: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libnvinfer.so.7
		2021-03-23 09:14:28.836452: I tensorflow/compiler/tf2tensorrt/common/utils.cc:60] Loaded TensorRT version: 7.1.3
		2021-03-23 09:14:29.006500: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libnvinfer_plugin.so.7
		2021-03-23 09:15:34.764469: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger …/rtSafe/safeContext.cpp (105) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
		2021-03-23 09:15:37.105490: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger …/rtSafe/safeContext.cpp (105) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
		2021-03-23 09:15:37.237905: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:758] TF-TRT Warning: Engine creation for StatefulPartitionedCall/map/while/body/_410/map/while/Preprocessor/TRTEngineOp_0_4 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
		2021-03-23 09:15:37.246973: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:628] TF-TRT Warning: Engine retrieval for input shapes: [[1,640,640,3]] failed. Running native segment for StatefulPartitionedCall/map/while/body/_410/map/while/Preprocessor/TRTEngineOp_0_4
		2021-03-23 09:15:38.985931: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger …/rtSafe/safeContext.cpp (105) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
		2021-03-23 09:15:38.986137: E tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger …/rtSafe/safeContext.cpp (105) - Cudnn Error in initializeCommonContext: 4 (Could not initialize cudnn, please check cudnn installation.)
		2021-03-23 09:15:38.986684: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:758] TF-TRT Warning: Engine creation for StatefulPartitionedCall/TRTEngineOp_0_3 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
		2021-03-23 09:15:38.986714: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:628] TF-TRT Warning: Engine retrieval for input shapes: [[1,320,320,3]] failed. Running native segment for StatefulPartitionedCall/TRTEngineOp_0_3
		2021-03-23 09:15:40.287327: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
		2021-03-23 09:15:40.288272: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
		2021-03-23 09:15:40.289028: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
		2021-03-23 09:15:40.425242: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at trt_engine_op.cc:400 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
		[[{{node StatefulPartitionedCall/ssd_mobile_net_v2fpn_keras_feature_extractor/functional_1/Conv1/Conv2D}}]]
		Traceback (most recent call last):
		  File "model_to_tensor_rt.py", line 40, in <module>
		    converter.build(input_fn=my_input_fn)
		  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/compiler/tensorrt/trt_convert.py", line 1187, in build
		    func(*map(ops.convert_to_tensor, inp))
		  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1669, in __call__
		    return self._call_impl(args, kwargs)
		  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/wrap_function.py", line 247, in _call_impl
		    args, kwargs, cancellation_manager)
		  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1687, in _call_impl
		    return self._call_with_flat_signature(args, kwargs, cancellation_manager)
		  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1736, in _call_with_flat_signature
		    return self._call_flat(args, self.captured_inputs, cancellation_manager)
		  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
		    ctx, args, cancellation_manager=cancellation_manager))
		  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 560, in call
		    ctx=ctx)
		  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
		    inputs, attrs, num_outputs)
		tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
		(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
		[[{{node StatefulPartitionedCall/ssd_mobile_net_v2fpn_keras_feature_extractor/functional_1/Conv1/Conv2D}}]]
		[[StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack/_60]]
		(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
		[[{{node StatefulPartitionedCall/ssd_mobile_net_v2fpn_keras_feature_extractor/functional_1/Conv1/Conv2D}}]]
		[[StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_5/NonMaxSuppressionV5/_224]]
		0 successful operations.
		0 derived errors ignored. [Op:__inference_pruned_63910]

		Function call stack:
		pruned -> pruned
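
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the script passes `max_workspace_size_bytes=(1 << 32)`, which is 4 GiB, while the GTX 1650 has about 4 GB in total. If TensorFlow and TensorRT together cannot get enough memory, cuDNN handle creation can fail with errors like the ones in the log. The sketch below only does the arithmetic. The `workspace_fraction` helper, the 4 GiB total, and the 1 GiB alternative are hypothetical and used for illustration only.

```python
# Sketch: compare the requested TensorRT workspace with the GPU's total memory.
GIB = 1 << 30


def workspace_fraction(workspace_bytes, gpu_total_bytes):
    """Return the share of total GPU memory that the workspace would claim."""
    return workspace_bytes / gpu_total_bytes


requested = 1 << 32   # value passed as max_workspace_size_bytes in the script
gpu_total = 4 * GIB   # assumption: the GTX 1650 exposes roughly 4 GiB

print(requested)                                 # 4294967296
print(workspace_fraction(requested, gpu_total))  # 1.0

# Hypothetical smaller budget (an assumption, not a verified fix): 1 GiB.
smaller = 1 << 30
print(workspace_fraction(smaller, gpu_total))    # 0.25
```

If the fraction is close to 1.0, trying a smaller value in `conversion_params._replace(max_workspace_size_bytes=...)` is a cheap experiment before reinstalling cuDNN.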
