Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.2
• TensorRT Version 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only) Driver Version: 525.78.01 CUDA Version: 12.0
• Issue Type( questions, new requirements, bugs) bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
When converting a PyTorch tensor to DLPack in order to send it to the next model (Python backend, ensemble configuration), I use the following sequence:
import torch
from torch.utils.dlpack import from_dlpack, to_dlpack
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    def initialize(self, args):
        """ Some initialization"""
        pass

    def execute(self, requests):
        ## Some handling before
        init_vector_tensor = torch.zeros((batch_size, 1, 1, 180), dtype=self.data_type, device=self.device).contiguous()
        previous_state_tensor = torch.zeros((batch_size, 128, 128, 180), dtype=self.data_type, device=self.device).contiguous()
        out_InitVector = pb_utils.Tensor.from_dlpack("InitVector", to_dlpack(init_vector_tensor))
        out_PreviousState = pb_utils.Tensor.from_dlpack("PreviousState", to_dlpack(previous_state_tensor))
        inference_response = pb_utils.InferenceResponse(
            output_tensors=[out_Image, out_InitVector, out_PreviousState]
        )
        # Some handling after
        return responses
I get the following error:
ERROR: infer_trtis_server.cpp:268 Triton: TritonServer response error received., triton_err_str:Internal, err_msg:in ensemble 'ensemble_python_smoke_16', Failed to process the request(s) for model instance 'preprocess_16_0', message: TritonModelException: DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported.
If I use the NumPy-to-DLPack path it works fine, but I want the full GPU optimization without copying…
What am I missing? Looking through the PyTorch documentation, I could not find a built-in C-order memory format.
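For reference, my understanding is that "C-order" simply means row-major: the last dimension has stride 1 and each earlier stride is the product of the later dimension sizes. Below is a minimal sketch of how I would check that by hand. The helper names c_order_strides and is_c_contiguous are my own (hypothetical, not part of PyTorch or Triton), strides are counted in elements rather than bytes, and empty tensors are not handled. The ignore_unit_dims flag is there because the stride of a size-1 dimension does not affect the memory layout; I have not confirmed whether the Python backend's check treats such dimensions leniently or strictly.

```python
def c_order_strides(shape):
    """Return the element strides of a row-major (C-order) layout for `shape`."""
    strides = []
    running = 1
    # Walk from the innermost dimension outwards, accumulating the product
    # of the dimension sizes seen so far.
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def is_c_contiguous(shape, strides, ignore_unit_dims=True):
    """Compare actual strides against the row-major strides for `shape`.

    With ignore_unit_dims=True, dimensions of size 1 are skipped, since any
    stride value there describes the same memory layout. With False, every
    stride must match exactly (a stricter check).
    """
    expected = c_order_strides(shape)
    for dim, actual, want in zip(shape, strides, expected):
        if ignore_unit_dims and dim == 1:
            continue
        if actual != want:
            return False
    return True
```

Inside execute this could be called as is_c_contiguous(tuple(t.shape), t.stride()) for each tensor before it is handed to to_dlpack, once with ignore_unit_dims=False and once with the default, to see whether the (batch_size, 1, 1, 180) tensor only fails the strict variant.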
Thanks ahead!
Full log:
test.log (41.6 KB)