Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.2
• TensorRT Version 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only) Driver Version: 525.78.01 CUDA Version: 12.0
• Issue Type( questions, new requirements, bugs) bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
When trying to convert a PyTorch tensor to DLPack in order to send it to the next model (using the Python backend in an ensemble configuration), I use the following sequence:
```python
import torch
from torch.utils.dlpack import from_dlpack, to_dlpack
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name."""

    def initialize(self, args):
        """Some initialization"""
        pass

    def execute(self, requests):
        # Some handling before
        init_vector_tensor = torch.zeros(
            (batch_size, 1, 1, 180), dtype=self.data_type, device=self.device
        ).contiguous()
        previous_state_tensor = torch.zeros(
            (batch_size, 128, 128, 180), dtype=self.data_type, device=self.device
        ).contiguous()
        out_InitVector = pb_utils.Tensor.from_dlpack(
            "InitVector", to_dlpack(init_vector_tensor)
        )
        out_PreviousState = pb_utils.Tensor.from_dlpack(
            "PreviousState", to_dlpack(previous_state_tensor)
        )
        inference_response = pb_utils.InferenceResponse(
            output_tensors=[out_Image, out_InitVector, out_PreviousState]
        )
        # Some handling after
        return responses
```
I get the following error:
ERROR: infer_trtis_server.cpp:268 Triton: TritonServer response error received., triton_err_str:Internal, err_msg:in ensemble 'ensemble_python_smoke_16', Failed to process the request(s) for model instance 'preprocess_16_0', message: TritonModelException: DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported.
If I pass a NumPy array instead of a DLPack tensor, it works fine, but I want to keep everything on the GPU without copying…
What am I missing? Looking through the PyTorch documentation, I can't find any built-in "C-Order" memory format.
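For what it's worth, PyTorch doesn't use the name "C-Order", but its default strided layout is row-major, which is exactly C-order; `Tensor.is_contiguous()` is the built-in check. A minimal illustration, independent of Triton:

```python
import torch

# PyTorch's default layout is row-major, i.e. C-order:
# a freshly allocated tensor is already C-contiguous.
t = torch.zeros((2, 1, 1, 180))
assert t.is_contiguous()

# Views such as transpose/permute change strides without moving data,
# which breaks C-order; .contiguous() copies back into C-order.
m = torch.arange(6).reshape(2, 3)
v = m.t()  # shape (3, 2), a non-contiguous view
assert not v.is_contiguous()
assert v.contiguous().is_contiguous()
```

So in my snippet above both `torch.zeros(...)` tensors should already be C-contiguous even before the explicit `.contiguous()` call, which is why the error is confusing.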
test.log (41.6 KB)