TAO exported Classification PyTorch model not working :: engine binding size negative

Please provide the following information when requesting support.

• Hardware (RTX 2070)
• Network Type (Classification Pytorch)
• TLT Version (5.0.0)

I have trained a classification model with the PyTorch backend in TAO Toolkit 5.0 and generated a TensorRT engine. When I run inference with the engine through PyCUDA using the following code:

import cv2
import numpy as np
import pycuda.autoinit  # creates and activates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

# Load the TRT engine
engine_file = '/home/nvidia/pycuda/FAN/classification_model_export_9.engine'
with open(engine_file, 'rb') as f, trt.Runtime(trt.Logger()) as runtime:
    engine_data = f.read()
    engine = runtime.deserialize_cuda_engine(engine_data)

# Create the context and allocate memory
context = engine.create_execution_context()
inputs = []
outputs = []
bindings = []
stream = cuda.Stream()

for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = engine.get_binding_dtype(binding)
    # Allocate device memory for inputs/outputs
    device_mem = cuda.mem_alloc(size * trt.float32.itemsize)
    # Append to the appropriate list
    if engine.binding_is_input(binding):
        inputs.append(device_mem)
    else:
        outputs.append(device_mem)
    bindings.append(int(device_mem))

# Load the label file
label_file = '/home/nvidia/pycuda/FAN/labels.txt'
with open(label_file, 'r') as f:
    labels = f.read().splitlines()

print(labels)


def preprocess_image(image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    resized_image = cv2.resize(image, (224, 224)).astype(np.float32)
    resized_image -= np.array([103.939, 116.779, 123.68], dtype=np.float32)
    resized_image = np.transpose(resized_image, (2, 0, 1))
    resized_image = np.expand_dims(resized_image, axis=0)
    return resized_image


def infer_image(image):
    print("Inferencing started")
    # Copy input image to device
    cuda.memcpy_htod_async(inputs[0], image.ravel(), stream)

    # Run inference
    context.execute_async(bindings=bindings, stream_handle=stream.handle)

    # Synchronize the stream
    stream.synchronize()

    # Get the output label
    output = np.empty(trt.volume(engine.get_binding_shape(engine[engine.num_bindings - 1])), dtype=np.float32)  # Output shape
    cuda.memcpy_dtoh_async(output, outputs[0], stream)
    cuda.memcpy_dtoh(output, outputs[0])

    # Get the predicted label
    label_id = np.argmax(output)
    print(label_id)

    # Return the predicted label
    return labels[label_id]



def classify_image(input_image):

    image = cv2.imread(input_image)
    
    # Preprocess the image
    preprocessed_image = preprocess_image(image)
   
    # Run inference on the preprocessed image
    predicted_label = infer_image(preprocessed_image)
    print("Predicted = ", predicted_label)
    
    return predicted_label

I get the following error:

cuInit
cuDeviceGetCount
cuDeviceGet
cuCtxCreate
cuCtxGetDevice
[10/07/2023-05:51:35] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
Loading with <tensorrt.tensorrt.IExecutionContext object at 0x7f95fbd365f0>
cuStreamCreate
Size of the binding is -150528
Binding is  (-1, 3, 224, 224)
/home/nvidia/pycuda/examples/infer_all_class_FAN.py:111: DeprecationWarning: Use get_tensor_dtype instead.
  dtype = engine.get_binding_dtype(binding)
Size of the binding is {size} with {binding} and {engine.max_batch_size}
Traceback (most recent call last):
  File "/home/sigmind/pycuda/examples/infer_all_class_FAN.py", line 114, in <module>
    device_mem = cuda.mem_alloc(size * trt.float32.itemsize)
OverflowError: can't convert negative value to unsigned int
cuCtxPopCurrent
cuCtxPushCurrent
cuStreamDestroy
cuCtxPopCurrent
cuCtxPushCurrent
cuCtxDetach


I have upgraded TensorRT to 8.5.3.1 to match the TensorRT engine generated from TAO Toolkit. I have also upgraded to CUDA 12.0 and cuDNN 8.9.5.29.

The code previously ran fine with CUDA 11.8 and TensorRT 8.5.5.2 with a different model trained with TAO 4, so I assume the code itself is okay; there may be some version incompatibility with the exported model. I have also tried to get the engine details, which report:

  tensor_dtype = engine.get_binding_dtype(binding_idx)

Binding Name: input_1
Tensor Shape: (-1, 3, 224, 224)
Data Type: DataType.FLOAT
Size (bytes): -602112

Binding Name: probs
Tensor Shape: (-1, 24)
Data Type: DataType.FLOAT
Size (bytes): -96
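
For reference, the numbers are consistent with the dynamic batch dimension: the engine was exported with an explicit batch of -1, so trt.volume((-1, 3, 224, 224)) evaluates to -(3 * 224 * 224) = -150528 elements (-602112 bytes at 4 bytes per float), and engine.max_batch_size is always 1 for explicit-batch engines, so the product stays negative and overflows cuda.mem_alloc. Below is a minimal sketch of one way to size the buffers for such an engine, assuming a single dynamic batch dimension and the TensorRT 8.5 binding API (batch_size = 1 is an assumption):

batch_size = 1
context = engine.create_execution_context()

# Make the dynamic input shape concrete before sizing any buffers
for i in range(engine.num_bindings):
    if engine.binding_is_input(i):
        shape = list(engine.get_binding_shape(i))
        shape[0] = batch_size  # replace the -1 batch dimension
        context.set_binding_shape(i, tuple(shape))

# Query shapes from the context (now fully specified), not from the engine
for i in range(engine.num_bindings):
    shape = context.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    device_mem = cuda.mem_alloc(trt.volume(shape) * np.dtype(dtype).itemsize)

With explicit-batch engines, inference also goes through context.execute_async_v2(bindings=bindings, stream_handle=stream.handle) rather than the implicit-batch execute_async.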

I am still confused about why the model does not load in my program. I have also tried to deploy the classification PyTorch model directly to DeepStream as an SGIE classifier with the following spec file:

[property]
gpu-id=0
net-scale-factor=1.0
offsets=103.939;116.779;123.68
model-color-format=1
infer-dims=3;224;224
network-type=1
num-detected-classes=24
uff-input-blob-name=input_1
maintain-aspect-ratio=0
output-tensor-meta=0
onnx-file=classification_model_export_urstp_9.onnx
labelfile-path=labels.txt
#int8-calib-file=mawa_pruned_int8_cache.bin
model-engine-file=classification_model_export_urstp_9.onnx_b2_gpu0_fp32.engine
batch-size=2
is-classifier=1
process-mode=2 ##1 Primary 2 Secondary
network-mode=0 ## 0=FP32, 1=INT8, 2=FP16 mode
interval=0
gie-unique-id=6
operate-on-class-ids=0;1;2;3;5;7
classifier-threshold=0.51
#classifier-async-mode=1

It loads the ONNX model and converts it to an engine file, but classification does not work. In the original classification PyTorch notebook, I tested the engine and it works well.
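
A quick cross-check on the DeepStream side (a minimal sketch, assuming the engine file name from model-engine-file above) is to dump the binding names, shapes, and data types from the engine DeepStream generated and compare them against uff-input-blob-name, infer-dims, and num-detected-classes in the spec:

import tensorrt as trt

engine_path = 'classification_model_export_urstp_9.onnx_b2_gpu0_fp32.engine'
with open(engine_path, 'rb') as f, trt.Runtime(trt.Logger()) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    print(engine.get_binding_name(i),
          engine.get_binding_shape(i),
          engine.get_binding_dtype(i))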

Is there any compatibility issue with PyTorch models trained with TAO 5? Any ideas for a solution?

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Please run with tao deploy to check if your engine works. Then, you can leverage the inference code.
Source code:
https://github.com/NVIDIA/tao_deploy/tree/main/nvidia_tao_deploy/cv/classification_pyt/scripts.
docker: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-deploy
