Please provide the following information when requesting support.
• Hardware (RTX 2070)
• Network Type (Classification PyTorch)
• TAO Version (5.0.0)
I have trained a classification model with the PyTorch backend in TAO Toolkit 5.0 and generated a TensorRT engine. When I run inference with that engine in PyCUDA using the following code:
import cv2
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

# Load the TRT engine
engine_file = '/home/nvidia/pycuda/FAN/classification_model_export_9.engine'
with open(engine_file, 'rb') as f, trt.Runtime(trt.Logger()) as runtime:
    engine_data = f.read()
    engine = runtime.deserialize_cuda_engine(engine_data)

# Create the context and allocate memory
context = engine.create_execution_context()
inputs = []
outputs = []
bindings = []
stream = cuda.Stream()
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = engine.get_binding_dtype(binding)
    # Allocate device memory for inputs/outputs
    device_mem = cuda.mem_alloc(size * trt.float32.itemsize)
    # Append to the appropriate list
    if engine.binding_is_input(binding):
        inputs.append(device_mem)
    else:
        outputs.append(device_mem)
    bindings.append(int(device_mem))

# Load the label file
label_file = '/home/nvidia/pycuda/FAN/labels.txt'
with open(label_file, 'r') as f:
    labels = f.read().splitlines()
print(labels)

def preprocess_image(image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    resized_image = cv2.resize(image, (224, 224)).astype(np.float32)
    resized_image -= np.array([103.939, 116.779, 123.68], dtype=np.float32)
    resized_image = np.transpose(resized_image, (2, 0, 1))
    resized_image = np.expand_dims(resized_image, axis=0)
    return resized_image

def infer_image(image):
    print("Inferencing started")
    # Copy input image to device
    cuda.memcpy_htod_async(inputs[0], image.ravel(), stream)
    # Run inference
    context.execute_async(bindings=bindings, stream_handle=stream.handle)
    # Synchronize the stream
    stream.synchronize()
    # Get the output label
    output = np.empty(trt.volume(engine.get_binding_shape(engine[engine.num_bindings - 1])), dtype=np.float32)  # Output shape
    cuda.memcpy_dtoh_async(output, outputs[0], stream)
    cuda.memcpy_dtoh(output, outputs[0])
    # Get the predicted label
    label_id = np.argmax(output)
    print(label_id)
    # Return the predicted label
    return labels[label_id]

def classify_image(input_image):
    image = cv2.imread(input_image)
    # Preprocess the image
    preprocessed_image = preprocess_image(image)
    # Run inference on the preprocessed image
    predicted_label = infer_image(preprocessed_image)
    print("Predicted = ", predicted_label)
    return predicted_label
I get the following error:
cuInit
cuDeviceGetCount
cuDeviceGet
cuCtxCreate
cuCtxGetDevice
[10/07/2023-05:51:35] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
Loading with <tensorrt.tensorrt.IExecutionContext object at 0x7f95fbd365f0>
cuStreamCreate
Size of the binding is -150528
Binding is (-1, 3, 224, 224)
/home/nvidia/pycuda/examples/infer_all_class_FAN.py:111: DeprecationWarning: Use get_tensor_dtype instead.
dtype = engine.get_binding_dtype(binding)
Size of the binding is {size} with {binding} and {engine.max_batch_size}
Traceback (most recent call last):
File "/home/sigmind/pycuda/examples/infer_all_class_FAN.py", line 114, in <module>
device_mem = cuda.mem_alloc(size * trt.float32.itemsize)
OverflowError: can't convert negative value to unsigned int
cuCtxPopCurrent
cuCtxPushCurrent
cuStreamDestroy
cuCtxPopCurrent
cuCtxPushCurrent
cuCtxDetach
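If I read the traceback right, the negative size falls straight out of the dynamic batch dimension: trt.volume simply multiplies the dimensions, so the -1 in the binding shape makes the product negative, and mem_alloc then refuses the negative byte count. A minimal standalone sketch reproducing the arithmetic:

import tensorrt as trt

shape = (-1, 3, 224, 224)  # what get_binding_shape returns for input_1
print(trt.volume(shape))   # -150528, matching "Size of the binding is -150528"
print(trt.volume(shape) * trt.float32.itemsize)  # -602112 bytes, matching the engine dump below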
I have upgraded TensorRT to 8.5.3.1 to match the TensorRT engine generated from TAO Toolkit, and I have also upgraded to CUDA 12.0 and cuDNN 8.9.5.29.
The code previously ran fine with CUDA 11.8 and TensorRT 8.5.5.2 with a different model trained with TAO 4, so I assume the code itself is okay; there might instead be a version incompatibility with the exported model. I have also dumped the engine details (via tensor_dtype = engine.get_binding_dtype(binding_idx)), which report:
Binding Name: input_1
Tensor Shape: (-1, 3, 224, 224)
Data Type: DataType.FLOAT
Size (bytes): -602112
Binding Name: probs
Tensor Shape: (-1, 24)
Data Type: DataType.FLOAT
Size (bytes): -96
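For what it's worth, my current understanding (which may be wrong) is that the exported ONNX model is built as an explicit-batch engine, so the -1 batch dimension has to be pinned on the execution context before any sizes are computed. A sketch of how I believe the allocation loop would have to change, reusing engine, context, and cuda from my script above and assuming batch size 1:

import numpy as np
import tensorrt as trt

# Pin the dynamic batch dimension to a concrete value first
context.set_binding_shape(0, (1, 3, 224, 224))  # binding 0 = input_1

for idx in range(engine.num_bindings):
    shape = context.get_binding_shape(idx)  # fully specified now, no -1
    size = trt.volume(shape)                # positive element count
    dtype = trt.nptype(engine.get_binding_dtype(idx))
    device_mem = cuda.mem_alloc(size * np.dtype(dtype).itemsize)

# Explicit-batch engines are also supposed to be run with
# context.execute_async_v2(bindings=..., stream_handle=...) rather than execute_async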
I am confused about why the model is not being loaded in my program. I have also tried to deploy the classification PyTorch model directly in DeepStream as an SGIE classifier, with the following spec file:
[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
infer-dims=3;224;224
network-type=1
num-detected-classes=24
uff-input-blob-name=input_1
maintain-aspect-ratio=0
output-tensor-meta=0
onnx-file=classification_model_export_urstp_9.onnx
labelfile-path=labels.txt
#int8-calib-file=mawa_pruned_int8_cache.bin
model-engine-file=classification_model_export_urstp_9.onnx_b2_gpu0_fp32.engine
batch-size=2
is-classifier=1
process-mode=2 ##1 Primary 2 Secondary
network-mode=0 ## 0=FP32, 1=INT8, 2=FP16 mode
interval=0
gie-unique-id=6
operate-on-class-ids=0;1;2;3;5;7
classifier-threshold=0.51
#classifier-async-mode=1
It loads the ONNX model and converts it to an engine file, but classification doesn't work. In the original classification PyTorch notebook, I tested the engine and it works well.
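One thing I am unsure about is whether the spec file reproduces the preprocessing in my Python script. As far as I understand nvinfer, it computes y = net-scale-factor * (x - offsets) with channels in the order given by model-color-format (1 = BGR here), while my PyCUDA code converts to RGB before subtracting the same offsets. A sketch of the two paths as I understand them (preprocess_deepstream is just my reading of the config, not actual DeepStream code):

import cv2
import numpy as np

offsets = np.array([103.939, 116.779, 123.68], dtype=np.float32)

# My PyCUDA path: BGR -> RGB first, then subtract the offsets
def preprocess_pycuda(bgr):
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    return cv2.resize(rgb, (224, 224)).astype(np.float32) - offsets

# My reading of the spec file (model-color-format=1, net-scale-factor=1.0):
# the offsets are subtracted with the frame still in BGR channel order
def preprocess_deepstream(bgr):
    return cv2.resize(bgr, (224, 224)).astype(np.float32) - offsets

# The two differ by a channel swap; could that alone explain the bad classification?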
Is there any compatibility issue with PyTorch models trained with TAO 5? Any ideas for a solution?