I have trained a classification model with the PyTorch backend in TAO Toolkit 5.0 and generated a TensorRT engine. When running inference with the engine in PyCUDA with the following code:
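# Imports assumed from the names used below (not shown in the original post)
import cv2
import numpy as np
import pycuda.autoinit  # assumed: creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt
# Load the TRT engine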
engine_file = '/home/nvidia/pycuda/FAN/classification_model_export_9.engine'
with open(engine_file, 'rb') as f, trt.Runtime(trt.Logger()) as runtime:
    engine_data = f.read()
    engine = runtime.deserialize_cuda_engine(engine_data)
# Create the context and allocate memory
context = engine.create_execution_context()
inputs = []
outputs = []
bindings = []
stream = cuda.Stream()
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = engine.get_binding_dtype(binding)
    # Allocate device memory for inputs/outputs
    device_mem = cuda.mem_alloc(size * trt.float32.itemsize)
    # Append to the appropriate list
    if engine.binding_is_input(binding):
        inputs.append(device_mem)
    else:
        outputs.append(device_mem)
    bindings.append(int(device_mem))
# Load the label file
label_file = '/home/nvidia/pycuda/FAN/labels.txt'
with open(label_file, 'r') as f:
    labels = f.read().splitlines()
print(labels)
def preprocess_image(image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    resized_image = cv2.resize(image, (224, 224)).astype(np.float32)
    resized_image -= np.array([103.939, 116.779, 123.68], dtype=np.float32)
    resized_image = np.transpose(resized_image, (2, 0, 1))
    resized_image = np.expand_dims(resized_image, axis=0)
    return resized_image
def infer_image(image):
    print("Inferencing started")
    # Copy input image to device
    cuda.memcpy_htod_async(inputs[0], image.ravel(), stream)
    # Run inference
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    # Synchronize the stream
    stream.synchronize()
    # Get the output label
    output = np.empty(trt.volume(engine.get_binding_shape(engine[engine.num_bindings - 1])), dtype=np.float32)  # Output shape
    cuda.memcpy_dtoh_async(output, outputs[0], stream)
    cuda.memcpy_dtoh(output, outputs[0])
    # Get the predicted label
    label_id = np.argmax(output)
    print(label_id)
    # Return the predicted label
    return labels[label_id]
def classify_image(input_image):
    image = cv2.imread(input_image)
    # Preprocess the image
    preprocessed_image = preprocess_image(image)
    # Run inference on the preprocessed image
    predicted_label = infer_image(preprocessed_image)
    print("Predicted = ", predicted_label)
    return predicted_label
I get the following error:
[10/05/2023-19:54:40] [TRT] [E] 1: [raiiMyelinGraph.h::RAIIMyelinGraph::24] Error Code 1: Myelin (Compiled against cuBLASLt 11.11.3.0 but running against cuBLASLt 11.10.3.0.)
/home/nvidia/pycuda/examples/infer_all_class_FAN.py:101: DeprecationWarning: Use get_tensor_shape instead.
  size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
/home/nvidia/pycuda/examples/infer_all_class_FAN.py:101: DeprecationWarning: Use network created with NetworkDefinitionCreationFlag::EXPLICIT_BATCH flag instead.
  size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
[10/05/2023-19:54:40] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
/home/nvidia/pycuda/examples/infer_all_class_FAN.py:102: DeprecationWarning: Use get_tensor_dtype instead.
  dtype = engine.get_binding_dtype(binding)
Traceback (most recent call last):
  File "/home/nvidia/pycuda/examples/infer_all_class_FAN.py", line 104, in <module>
    device_mem = cuda.mem_alloc(size * trt.float32.itemsize)
OverflowError: can't convert negative value to unsigned int
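One possible reading of the OverflowError: if the engine was exported with a dynamic batch axis, the binding shape reports -1 for that axis, so the computed volume (and therefore the allocation size) is negative. The sketch below is plain Python and does not call TensorRT; the helper name `binding_nbytes` and the example shape are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def binding_nbytes(shape, itemsize, batch_size=1):
    """Bytes needed for one binding; a leading -1 (dynamic batch) is replaced by batch_size."""
    dims = list(shape)
    if dims and dims[0] == -1:
        dims[0] = batch_size
    if any(d < 0 for d in dims):
        # Any other dynamic axis must be resolved by the caller first.
        raise ValueError(f"unresolved dynamic dimension in {tuple(shape)}")
    return math.prod(dims) * itemsize

# Hypothetical shape for illustration: a 224x224 RGB input with a dynamic batch axis.
dynamic_shape = (-1, 3, 224, 224)
print(math.prod(dynamic_shape) * 4)      # negative, which mem_alloc cannot accept
print(binding_nbytes(dynamic_shape, 4))  # 602112 bytes for batch size 1
```

If this is what is happening, printing the shape of each binding inside the allocation loop should show a -1 in the first position.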
I have upgraded TensorRT to 8.5.3.1 to match the TensorRT engine generated by TAO Toolkit. I have also upgraded to CUDA 12.0 and cuDNN 8.9.5.29.
The code previously ran fine with CUDA 11.8 and TensorRT 8.5.5.2 with a different model trained with TAO 4, so I assume the code is okay, but there might be some version incompatibility with the exported model. The error "Compiled against cuBLASLt 11.11.3.0 but running against cuBLASLt 11.10.3.0" gives a hint, but I'm not sure how to check the cuBLAS version or how to change it. Please help.
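For checking the version, one approach is to ask the cuBLASLt library that the dynamic loader actually resolves, using `cublasLtGetVersion` through `ctypes`. This is a minimal sketch under assumptions: the library name `libcublasLt.so` may need a version suffix (for example `.11` or `.12`) on systems without the development symlink, and the integer is assumed to follow the usual `major * 10000 + minor * 100 + patch` encoding. The helper names are hypothetical.

```python
import ctypes

def decode_cublas_version(value):
    """Split an integer such as 111003 into (major, minor, patch).

    Assumes the major*10000 + minor*100 + patch encoding.
    """
    major, rest = divmod(value, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch

def loaded_cublaslt_version(lib_name="libcublasLt.so"):
    """Return the version of the cuBLASLt library the loader finds, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    lib.cublasLtGetVersion.restype = ctypes.c_size_t
    return decode_cublas_version(lib.cublasLtGetVersion())

print(loaded_cublaslt_version())
```

Which copy gets loaded depends on the loader search path (for example `LD_LIBRARY_PATH`), so running this in the same environment as the inference script should show whether an older cuBLASLt is being picked up ahead of the expected one.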