PyTorch CUDA tensors as TRT engine bindings

rneven · April 20, 2021, 12:14pm

Description

I want to run inference with a TensorRT engine on PyTorch GPU tensors. However, with the code below, if I create the tensors after creating the execution context, I get the error shown after the snippet:

import tensorrt as trt
import torch
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime, runtime.deserialize_cuda_engine(f.read()) as engine, engine.create_execution_context() as context:

    output_buffer = cuda.mem_alloc(4*288*768*4)
    stream = cuda.Stream()

    for i in range(1):
        tensor = torch.randn((4, 288, 768, 4), dtype=float, device=torch.device('cuda'))        
        context.execute_async_v2(bindings=[int(tensor.data_ptr()), int(output_buffer)], 
                      stream_handle=stream.handle)
        stream.synchronize()

[TensorRT] ERROR: ../rtExt/cuda/cudaGatherRunner.cpp (111) - Cuda Error in execute: 400 (invalid resource handle)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception

If I make the tensor before I create the execution context, there are no errors.

import tensorrt as trt
import torch
import pycuda.driver as cuda
import pycuda.autoinit

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

tensor = torch.randn((4, 288, 768, 4), dtype=float, device=torch.device('cuda'))

with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime, runtime.deserialize_cuda_engine(f.read()) as engine, engine.create_execution_context() as context:
    
    output_buffer = cuda.mem_alloc(4*288*768*4)
    stream = cuda.Stream()

    for i in range(1):                
        context.execute_async_v2(bindings=[int(tensor.data_ptr()), int(output_buffer)], stream_handle=stream.handle)
        stream.synchronize()

Is there any way to create a TRT engine and then perform inference on PyTorch tensors that are created after the execution context? I assume it has to do with CUDA contexts?

Environment

TensorRT Version: 7.2
GPU Type: Quadro RTX 3000
Nvidia Driver Version: 460.56
CUDA Version: 11.1
CUDNN Version:
Operating System + Version:
Python Version: 3.6
TensorFlow Version (if applicable):
PyTorch Version: 1.8

NVES · April 20, 2021, 6:50pm

Hi,
Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:

1) Validate your model with the snippet below.

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # replace with the path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid

2) Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

rneven · April 20, 2021, 7:19pm

Thanks for replying. I noticed that it works if I push and pop a CUDA context. I found two different scenarios that work, but I don't know why they do, given my limited experience with CUDA contexts.

Scenario one: pushing/popping the CUDA context before/after tensor creation.

import tensorrt as trt
import torch
import pycuda.autoinit
import pycuda.driver as cuda

#create cuda context
ctx = cuda.Device(0).make_context()

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime, runtime.deserialize_cuda_engine(f.read()) as engine, engine.create_execution_context() as context:

    output_buffer = cuda.mem_alloc(8*288*768*4)
    stream = cuda.Stream()

    for i in range(10):
        ctx.push()
        tensor = torch.ones((8, 288, 768, 4), dtype=torch.float,
                        device=torch.device('cuda'))
        ctx.pop()

        context.execute_async_v2(bindings=[int(tensor.data_ptr()), int(
                  output_buffer)], stream_handle=stream.handle)
        stream.synchronize()


ctx.pop()
exit()

Scenario two: pushing/popping the CUDA context before/after TRT engine inference.

import tensorrt as trt
import torch
import pycuda.autoinit
import pycuda.driver as cuda

#create cuda context
ctx = cuda.Device(0).make_context()

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime, runtime.deserialize_cuda_engine(f.read()) as engine, engine.create_execution_context() as context:

    output_buffer = cuda.mem_alloc(8*288*768*4)
    stream = cuda.Stream()

    for i in range(10):
    
        tensor = torch.ones((8, 288, 768, 4), dtype=torch.float,
                        device=torch.device('cuda'))
    

        ctx.push()
        context.execute_async_v2(bindings=[int(tensor.data_ptr()), int(
                   output_buffer)], stream_handle=stream.handle)
        stream.synchronize()
        ctx.pop()


ctx.pop()
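
Both scenarios follow the same pattern: make the context that owns the TensorRT engine and stream current, do the work, then undo the push. One plausible reading, not confirmed anywhere in this thread, is that PyTorch switches the thread to the device's primary context when it allocates a tensor, so a stream created under the PyCUDA context is no longer valid when TensorRT launches its kernels. The sketch below only wraps the push/pop pair from scenario two in a context manager so that the pop also happens if inference raises. `active_context` is a hypothetical helper name, and `ctx` is assumed to be the object returned by `cuda.Device(0).make_context()`.

```python
from contextlib import contextmanager


@contextmanager
def active_context(ctx):
    """Make `ctx` current for the duration of the with-block.

    Assumption: `ctx` exposes push() and pop(), as the PyCUDA context
    returned by cuda.Device(0).make_context() does in the scenarios above.
    """
    ctx.push()      # ctx becomes the current CUDA context on this thread
    try:
        yield ctx
    finally:
        ctx.pop()   # always undo the push, even if the body raised


# Usage inside the inference loop of scenario two (names from that snippet):
#
#     with active_context(ctx):
#         context.execute_async_v2(
#             bindings=[int(tensor.data_ptr()), int(output_buffer)],
#             stream_handle=stream.handle)
#         stream.synchronize()
```

This keeps the push and pop balanced, which matters because an unbalanced context stack can leave later CUDA calls running under an unexpected context.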

spolisetty · April 21, 2021, 6:29pm

Hi @rneven,

It looks like you're using both PyTorch and PyCUDA. We recommend using PyTorch device tensors directly and dropping PyCUDA completely. PyTorch also provides various CUDA APIs of its own.

Thank you.
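
The recommendation above could look roughly like the following sketch, which allocates both input and output as PyTorch tensors and passes PyTorch's current CUDA stream to TensorRT, with no PyCUDA import. Several things here are assumptions rather than facts from this thread: binding 0 is the input and binding 1 the output (as in the snippets above), the engine has static shapes and uses float32 for both bindings, and the TensorRT 7.x/8.x Python API (`execute_async_v2`, `get_binding_shape`) is available. Note that `dtype=float` in the first post resolves to `torch.float64` in PyTorch, so the sketch uses `torch.float32` explicitly. `binding_addresses`, `infer`, and `demo` are hypothetical helper names.

```python
def binding_addresses(tensors):
    """Return the device addresses of `tensors`, in the order given."""
    addresses = []
    for t in tensors:
        if not t.is_cuda:
            raise ValueError("every binding must be a CUDA tensor")
        if not t.is_contiguous():
            raise ValueError("every binding must be contiguous in memory")
        addresses.append(int(t.data_ptr()))
    return addresses


def infer(context, inputs, outputs, stream):
    """Run one inference on `stream` (a torch.cuda.Stream) and wait for it."""
    bindings = binding_addresses(list(inputs) + list(outputs))
    ok = context.execute_async_v2(bindings=bindings,
                                  stream_handle=stream.cuda_stream)
    stream.synchronize()
    return ok


def demo(engine_path="model.engine"):
    """Needs a GPU, TensorRT 7.x/8.x, and a serialized engine on disk."""
    import tensorrt as trt
    import torch

    logger = trt.Logger(trt.Logger.INFO)
    with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime, \
            runtime.deserialize_cuda_engine(f.read()) as engine, \
            engine.create_execution_context() as context:
        # Use the stream PyTorch itself is working on, so tensor creation
        # and TensorRT execution are ordered on the same stream.
        stream = torch.cuda.current_stream()
        # Assumption: binding 1 is the (static-shape, float32) output.
        out_shape = tuple(engine.get_binding_shape(1))
        output = torch.empty(out_shape, dtype=torch.float32, device="cuda")
        for _ in range(10):
            tensor = torch.ones((8, 288, 768, 4), dtype=torch.float32,
                                device="cuda")
            infer(context, [tensor], [output], stream)
        return output.cpu()
```

Calling `demo("model.engine")` on a machine with the engine file would run the same ten iterations as the scenarios above. Because everything goes through PyTorch's CUDA state, there is no second context to push or pop.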
