Pycuda :: OverflowError: can't convert negative value to unsigned int

I have trained a classification model with the PyTorch backend in TAO Toolkit 5.0 and generated a TensorRT engine. When I run inference with the engine in PyCUDA using the following code:

import cv2
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates and activates a CUDA context

# Load the TRT engine
engine_file = '/home/nvidia/pycuda/FAN/classification_model_export_9.engine'
with open(engine_file, 'rb') as f, trt.Runtime(trt.Logger()) as runtime:
    engine_data = f.read()
    engine = runtime.deserialize_cuda_engine(engine_data)

# Create the context and allocate memory
context = engine.create_execution_context()
inputs = []
outputs = []
bindings = []
stream = cuda.Stream()

for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = engine.get_binding_dtype(binding)
    # Allocate device memory for inputs/outputs
    device_mem = cuda.mem_alloc(size * trt.float32.itemsize)
    # Append to the appropriate list
    if engine.binding_is_input(binding):
        inputs.append(device_mem)
    else:
        outputs.append(device_mem)
    bindings.append(int(device_mem))

# Load the label file
label_file = '/home/nvidia/pycuda/FAN/labels.txt'
with open(label_file, 'r') as f:
    labels = f.read().splitlines()

print(labels)


def preprocess_image(image):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    resized_image = cv2.resize(image, (224, 224)).astype(np.float32)
    resized_image -= np.array([103.939, 116.779, 123.68], dtype=np.float32)
    resized_image = np.transpose(resized_image, (2, 0, 1))
    resized_image = np.expand_dims(resized_image, axis=0)
    return resized_image


def infer_image(image):
    print("Inferencing started")
    # Copy input image to device
    cuda.memcpy_htod_async(inputs[0], image.ravel(), stream)

    # Run inference
    context.execute_async(bindings=bindings, stream_handle=stream.handle)

    # Synchronize the stream
    stream.synchronize()

    # Get the output label
    output = np.empty(trt.volume(engine.get_binding_shape(engine[engine.num_bindings - 1])), dtype=np.float32)  # Output shape
    cuda.memcpy_dtoh_async(output, outputs[0], stream)
    cuda.memcpy_dtoh(output, outputs[0])

    # Get the predicted label
    label_id = np.argmax(output)
    print(label_id)

    # Return the predicted label
    return labels[label_id]



def classify_image(input_image):
    image = cv2.imread(input_image)

    # Preprocess the image
    preprocessed_image = preprocess_image(image)

    # Run inference on the preprocessed image
    predicted_label = infer_image(preprocessed_image)
    print("Predicted =", predicted_label)

    return predicted_label

I get the following error:

[10/05/2023-19:54:40] [TRT] [E] 1: [raiiMyelinGraph.h::RAIIMyelinGraph::24] Error Code 1: Myelin (Compiled against cuBLASLt 11.11.3.0 but running against cuBLASLt 11.10.3.0.)
/home/nvidia/pycuda/examples/infer_all_class_FAN.py:101: DeprecationWarning: Use get_tensor_shape instead.
  size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
/home/nvidia/pycuda/examples/infer_all_class_FAN.py:101: DeprecationWarning: Use network created with NetworkDefinitionCreationFlag::EXPLICIT_BATCH flag instead.
  size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
[10/05/2023-19:54:40] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
/home/nvidia/pycuda/examples/infer_all_class_FAN.py:102: DeprecationWarning: Use get_tensor_dtype instead.
  dtype = engine.get_binding_dtype(binding)
Traceback (most recent call last):
  File "/home/nvidia/pycuda/examples/infer_all_class_FAN.py", line 104, in <module>
    device_mem = cuda.mem_alloc(size * trt.float32.itemsize)
OverflowError: can't convert negative value to unsigned int

I have upgraded TensorRT to 8.5.3.1 to match the TensorRT engine generated by TAO Toolkit, and I have also upgraded to CUDA 12.0 and cuDNN 8.9.5.29.
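In case it is relevant, here is a quick way to double-check which versions are actually picked up at runtime (a minimal snippet, nothing TAO-specific):

import tensorrt as trt
import pycuda.driver as cuda

cuda.init()
print("TensorRT:", trt.__version__)                  # version of the loaded TensorRT bindings
print("CUDA (PyCUDA built against):", cuda.get_version())
print("CUDA driver:", cuda.get_driver_version())     # e.g. 12000 for CUDA 12.0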

The code previously ran fine with CUDA 11.8 and TensorRT 8.5.5.2 and a different model trained with TAO 4, so I assume the code itself is okay, but there might be a version incompatibility with the exported model. The error "Compiled against cuBLASLt 11.11.3.0 but running against cuBLASLt 11.10.3.0" gives a hint, but I'm not sure how to check which cuBLASLt version is loaded or how to change it. Please help.
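(For context: cuBLASLt exposes a version query, cublasLtGetVersion(), so the loaded version can presumably be checked with a small ctypes call like the sketch below; the library soname is an assumption for a Linux install.)

import ctypes

# Minimal sketch: ask the loaded cuBLASLt library for its version.
# "libcublasLt.so" assumes a Linux install with the dev symlink present;
# it may be e.g. libcublasLt.so.11 or .12 depending on the toolkit.
lib = ctypes.CDLL("libcublasLt.so")
lib.cublasLtGetVersion.restype = ctypes.c_size_t
print("cuBLASLt version:", lib.cublasLtGetVersion())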

Please refer to the following similar issue, which may help you.
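The OverflowError itself suggests that size went negative before reaching cuda.mem_alloc: with an explicit-batch engine, getMaxBatchSize() always returns 1 (as the warning in your log says), and get_binding_shape() returns -1 for any dynamic dimension, which makes trt.volume(...) negative. Here is a minimal sketch of the allocation loop using the non-deprecated tensor API that the deprecation warnings point to (TensorRT 8.5+; untested against your engine, so treat it as a starting point):

import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

# engine: the deserialized ICudaEngine from your script
inputs, outputs, bindings = [], [], []
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = engine.get_tensor_shape(name)  # may contain -1 for dynamic dims
    if any(d < 0 for d in shape):
        # A dynamic dimension makes trt.volume() negative; set a concrete
        # shape on the execution context before sizing buffers.
        raise RuntimeError(f"{name} has a dynamic shape {tuple(shape)}; "
                           "call context.set_input_shape() first")
    # Size the buffer from the tensor's real dtype instead of hard-coded float32
    dtype = np.dtype(trt.nptype(engine.get_tensor_dtype(name)))
    device_mem = cuda.mem_alloc(trt.volume(shape) * dtype.itemsize)
    if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
        inputs.append(device_mem)
    else:
        outputs.append(device_mem)
    bindings.append(int(device_mem))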

I ended up giving up on PyCUDA and am successfully using NVIDIA tao_deploy for PyTorch-based models.
