• Hardware Platform (Jetson / GPU): RTX 4000
• DeepStream Version: DS 6.2
• TensorRT Version: 8.5.0
• NVIDIA GPU Driver Version (valid for GPU only): 525.x.x.x
• Issue Type (questions, new requirements, bugs): question
I have built a C++ BLS backend. BLS works fine when I run inference from a Python application over gRPC, but when I run inference from a DeepStream app, it crashes.

The input tensor's memory type differs between the two scenarios. With the Python app, the input tensor has memory type 0, which indicates CPU memory; with the DeepStream app, it has memory type 2, which indicates GPU memory.
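For readers decoding those numbers: the values 0 and 2 match the ordering of the `TRITONSERVER_MemoryType` enum in `tritonserver.h` (CPU, pinned CPU, GPU). Below is a minimal sketch of that mapping. The enum is redeclared locally so the snippet compiles without Triton headers, and `MemoryTypeName` and `NeedsDeviceToHostCopy` are hypothetical helper names, not part of the Triton API.

```cpp
#include <cstdint>
#include <string>

// Mirrors the ordering of TRITONSERVER_MemoryType in tritonserver.h.
// Redeclared here only so the sketch builds without Triton headers.
enum class MemoryType : int32_t { Cpu = 0, CpuPinned = 1, Gpu = 2 };

// Hypothetical helper: turn the logged integer into a readable name.
inline std::string MemoryTypeName(int32_t raw) {
  switch (static_cast<MemoryType>(raw)) {
    case MemoryType::Cpu:
      return "CPU";
    case MemoryType::CpuPinned:
      return "CPU_PINNED";
    case MemoryType::Gpu:
      return "GPU";
  }
  return "UNKNOWN";
}

// Hypothetical helper: true only when the buffer lives in device memory
// and therefore cannot be dereferenced directly from host code.
inline bool NeedsDeviceToHostCopy(int32_t raw) {
  return static_cast<MemoryType>(raw) == MemoryType::Gpu;
}
```

In a real backend the memory type would typically come from the `memory_type` output argument of `TRITONBACKEND_InputBuffer` rather than from a raw integer.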

How can I access the input image (tensor) in GPU memory from within the BLS model? Below is my pipeline structure in Triton server.

ensemble_model → DALI → TensorRT MODEL → BLS

The DeepStream app can only work as a Triton client. How and where did you get the memory type?

Hi @Fiona.Chen ,

I have created a C/C++ BLS model and am trying to use it as part of an ensemble model called from DeepStream, as described above.

The BLS model expects the input tensor in CPU memory, but I am getting it in GPU memory.

I have figured out a way to copy the tensor from GPU to CPU memory, and it is now working fine.
I used a CUDA stream to copy the data from GPU memory to CPU memory.
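For anyone hitting the same crash, here is a rough sketch of what such a copy could look like. It is not the exact code from my backend. `CopyInputToHost` is a hypothetical helper name, the `SKETCH_WITH_CUDA` macro is an assumption that lets the snippet build on a machine without the CUDA toolkit, and the buffer pointer, byte size, and memory type are assumed to come from something like `TRITONBACKEND_InputBuffer`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

#ifdef SKETCH_WITH_CUDA
#include <cuda_runtime.h>
#endif

// Same numeric values as TRITONSERVER_MemoryType (0 = CPU, 1 = pinned CPU, 2 = GPU).
constexpr int32_t kMemCpu = 0;
constexpr int32_t kMemCpuPinned = 1;
constexpr int32_t kMemGpu = 2;

// Hypothetical helper: return a host-side copy of an input buffer,
// wherever Triton placed it.
inline std::vector<uint8_t> CopyInputToHost(const void* buffer,
                                            size_t byte_size,
                                            int32_t memory_type) {
  std::vector<uint8_t> host(byte_size);
  if (byte_size == 0) {
    return host;  // nothing to copy
  }
  if (memory_type == kMemCpu || memory_type == kMemCpuPinned) {
    // Host-resident buffer: a plain memcpy is enough.
    std::memcpy(host.data(), buffer, byte_size);
    return host;
  }
  if (memory_type != kMemGpu) {
    throw std::invalid_argument("unknown memory type");
  }
#ifdef SKETCH_WITH_CUDA
  // Device-resident buffer: enqueue an async copy on a stream, then wait
  // for it to finish before the host reads the data. A real backend would
  // likely select the device first (cudaSetDevice with the memory_type_id)
  // and reuse one stream instead of creating one per call.
  cudaStream_t stream = nullptr;
  cudaError_t err = cudaStreamCreate(&stream);
  if (err != cudaSuccess) {
    throw std::runtime_error(cudaGetErrorString(err));
  }
  err = cudaMemcpyAsync(host.data(), buffer, byte_size,
                        cudaMemcpyDeviceToHost, stream);
  if (err == cudaSuccess) {
    err = cudaStreamSynchronize(stream);
  }
  cudaStreamDestroy(stream);
  if (err != cudaSuccess) {
    throw std::runtime_error(cudaGetErrorString(err));
  }
  return host;
#else
  throw std::runtime_error("built without CUDA support");
#endif
}
```

Build with `-DSKETCH_WITH_CUDA` and link against the CUDA runtime to enable the GPU path. The synchronize call matters: without it the host may read the staging buffer before the copy has completed.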

What is your nvinferserver configuration?

