"DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported"

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.2
• TensorRT Version 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only) Driver Version: 525.78.01 CUDA Version: 12.0
• Issue Type( questions, new requirements, bugs) bugs
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
When trying to convert a PyTorch tensor to DLPack in order to send it to the next model (using the Python backend in an ensemble configuration), I use the following sequence:

import torch
from torch.utils.dlpack import from_dlpack, to_dlpack
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    def initialize(self, args):
        """Some initialization"""

    def execute(self, requests):
        # Some handling before (batch_size, out_Image and responses are set up here)
        init_vector_tensor = torch.zeros((batch_size, 1, 1, 180), dtype=self.data_type, device=self.device).contiguous()
        previous_state_tensor = torch.zeros((batch_size, 128, 128, 180), dtype=self.data_type, device=self.device).contiguous()
        # Wrap the GPU tensors as Triton tensors via DLPack, without a host copy
        out_InitVector = pb_utils.Tensor.from_dlpack("InitVector", to_dlpack(init_vector_tensor))
        out_PreviousState = pb_utils.Tensor.from_dlpack("PreviousState", to_dlpack(previous_state_tensor))
        inference_response = pb_utils.InferenceResponse(
            output_tensors=[out_Image, out_InitVector, out_PreviousState]
        )

        # Some handling after
        return responses

I get the following error:

ERROR: infer_trtis_server.cpp:268 Triton: TritonServer response error received., triton_err_str:Internal, err_msg:in ensemble 'ensemble_python_smoke_16', Failed to process the request(s) for model instance 'preprocess_16_0', message: TritonModelException: DLPack tensor is not contiguous. Only contiguous DLPack tensors that are stored in C-Order are supported.

If I go from NumPy to DLPack instead it works fine, but I want the full GPU optimization without copying…
What am I missing? Looking through the PyTorch documentation, I can't find any built-in "C-order" memory format.

Thanks ahead!

Full log:
test.log (41.6 KB)

For numpy we would make the copy by ourselves if the tensor is not contiguous. However, for DLPack you need to make sure that the tensor you’re providing is stored in a contiguous memory buffer. I think calling the .contiguous should’ve solved the issue. Do you see the error even after you’ve added .contiguous before converting the tensor to dlpack?
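
On the "C-order" question: C-order just means row-major layout, which is what .contiguous() produces in PyTorch by default. Here is a minimal sketch in plain Python of the element strides such a layout implies for the shapes in the original post. The helper name c_order_strides and the batch size of 4 are assumptions for illustration only. In PyTorch the values can be compared against tensor.stride(), which also reports strides in elements.

```python
def c_order_strides(shape):
    """Return the element strides of a dense row-major (C-order) array."""
    strides = []
    running = 1
    # Walk from the innermost dimension outward; each stride is the
    # number of elements spanned by all dimensions to its right.
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


# Shapes from the original post, with a hypothetical batch_size of 4.
init_vector_strides = c_order_strides((4, 1, 1, 180))
previous_state_strides = c_order_strides((4, 128, 128, 180))
print(init_vector_strides)
print(previous_state_strides)
```

If tensor.stride() matches these values, the tensor is stored in C-order as far as PyTorch is concerned.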

Hi @Fiona.Chen
It actually didn’t help. The code snippet I attached already calls .contiguous(), and the error still occurs…

Curious about this as well. Ran into the same issue.

I was previously using a 1x1xNxHxD tensor, which resulted in the non-contiguous error despite using .contiguous(). Getting rid of the leading dimension of 1 made the error go away.
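
The observation above would be consistent with a contiguity check that compares strides strictly, including on dimensions of size 1, where the stride value has no effect on the memory layout. I have not confirmed that this is what Triton or PyTorch actually do internally. The sketch below is plain Python with hypothetical helper names, and the "normalized" strides are an assumption made purely to illustrate how a strict and a lenient check can disagree.

```python
def expected_c_strides(shape):
    """Element strides of a dense row-major array with this shape."""
    strides, running = [], 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def is_contiguous_strict(shape, strides):
    """Require every stride to match the dense row-major value exactly."""
    return tuple(strides) == expected_c_strides(shape)


def is_contiguous_lenient(shape, strides):
    """Ignore strides on size-1 dimensions, since they never get stepped."""
    expected = expected_c_strides(shape)
    return all(dim == 1 or s == e for dim, s, e in zip(shape, strides, expected))


# Hypothetical 1x1xNxHxD tensor with N=8, H=4, D=2.
shape = (1, 1, 8, 4, 2)
dense = expected_c_strides(shape)
# Assumption: an exporter reports stride 1 for every size-1 dimension.
normalized = tuple(1 if dim == 1 else s for dim, s in zip(shape, dense))

print(dense, is_contiguous_strict(shape, dense))
print(normalized, is_contiguous_strict(shape, normalized))
print(normalized, is_contiguous_lenient(shape, normalized))
```

Both stride tuples describe the same bytes in memory. Only the strict check rejects the second one, which would match the error disappearing once the size-1 dimensions are dropped.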

So how did you use batches?

This occurs only in PyTorch versions >1.12. I installed Torch v1.12 and it works as intended. Nvidia? Thoughts?

I need to run batches, so getting rid of the batch dimension isn’t a workable solution for me. Just thought it would help diagnose the issue.

I’m facing the same error when converting a PyTorch tensor to DLPack (Triton v22.12). I used .contiguous() but it didn’t fix the issue.

Try using CuPy like below:

import cupy

# tensor is a torch.Tensor on CUDA
cupy.array(tensor.detach(), order="C").toDlpack()

It works for me