Does the Triton Inference Server Python backend with decoupled mode work with nvinferserver?

This is my configuration

• x86 Machine with dGPU

• DeepStream Version: deepstream-6.3

• TensorRT Version: 8.6.1

• NVIDIA GPU Driver Version (valid for GPU only): 535.86.10, CUDA Version: 12.2

• Issue Type: questions

• USE_NEW_NVSTREAMMUX=yes

I am using a Python backend as the Triton Inference Server model. For performance reasons, I need to see whether decoupled mode and async mode will work in DeepStream using Gst-nvinferserver connected to a Triton pb model. (Gst-nvinferserver — DeepStream 6.4 documentation)

Please advise.

Decoupled model inference is a feature of Triton; please refer to this doc.
A pb model is a feature of Triton; please refer to the sample /opt/nvidia/deepstream/deepstream/samples/triton_model_repo/inception_graphdef/config.pbtxt.
About async mode, nvinferserver can support async_mode for SGIE models; please find async_mode in the doc (a minimal example follows).
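
For reference, async_mode lives in the input_control group of the Gst-nvinferserver configuration for a secondary GIE. A minimal sketch with placeholder values (the GIE IDs here are illustrative, not taken from this thread):

    input_control {
      process_mode: PROCESS_MODE_CLIP_OBJECTS   # run secondary inference on detected objects
      operate_on_gie_id: 1                      # ID of the upstream primary GIE
      async_mode: true                          # do not block the pipeline waiting for this SGIE
    }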

Thanks @fanzh for the quick reply.

I was trying to implement this using the Triton Python backend with DeepStream nvinferserver.

I changed part of the code so that, instead of random numbers, the request carries a batch of frames as tensors from my DeepStream pipeline.
I am getting errors.

The error says:

    ERROR: infer_trtis_server.cpp:268 Triton: TritonServer response error received., triton_err_str:Internal, err_msg:Python model 'centerface_0' is using the decoupled mode and the execute function must return None.
    ERROR: infer_trtis_backend.cpp:629 Triton server failed to parse response with request-id:0 model:
    ERROR: infer_trtis_backend.cpp:372 failed to specify dims after running inference failed on model:centerface, nvinfer error:NVDSINFER_TRITON_ERROR
    0:00:12.139632808    29      0x39a4ca0 ERROR          nvinferserver gstnvinferserver.cpp:408:gst_nvinfer_server_logger:<primary-inference> nvinferserver[UID 1]: Error in specifyBackendDims() <infer_trtis_context.cpp:204> [UID = 1]: failed to specify input dims triton backend for model:centerface, nvinfer error:NVDSINFER_TRITON_ERROR

This is the content of model.py in the Python backend:

    # Excerpt from model.py: the execute() method of the TritonPythonModel class.
    # It assumes the usual imports: asyncio, numpy as np, and triton_python_backend_utils as pb_utils.
    async def execute(self, requests):
        processed_requests = []
        async_tasks = []
        for request in requests:
            frame_tensors = pb_utils.get_input_tensor_by_name(
                request, "INPUT0"
            ).as_numpy()
            for frame_tensor in frame_tensors:
                frame = frame_tensor[0]
                # if frame < 0:
                #     self.raise_value_error(requests)
                async_tasks.append(asyncio.create_task(asyncio.sleep(1)))
            processed_requests.append(
                {
                    "response_sender": request.get_response_sender(),
                    "batch_size": frame_tensors.shape[0],
                }
            )

        # This decoupled execute should be scheduled to run in the background
        # concurrently with other instances of decoupled execute, as long as the event
        # loop is not blocked.
        await asyncio.gather(*async_tasks)

        for p_req in processed_requests:
            response_sender = p_req["response_sender"]
            batch_size = p_req["batch_size"]

            # Dummy per-frame output values, tiled to the request batch size.
            stats = np.array([[10,10,100,100,3]])

            stats = np.tile(stats, (batch_size, 1,1))
            # logger.log_warn(f"{stats.shape}")
            stats = stats.astype(np.float32)
            shape = np.array([int(stats.shape[1]),int(stats.shape[2])])
            shape = np.tile(shape, (batch_size, 1,1)) # batch size
            shape = shape.astype(np.float32)
            out_tensor = pb_utils.Tensor("OUTPUT0", stats)
            out_tensor2 = pb_utils.Tensor("OUTPUT1", shape)
            # responses.append(pb_utils.InferenceResponse([out_tensor,out_tensor2]))

            # output_tensors = pb_utils.Tensor(
            #     "OUTPUT0", np.array([0 for i in range(batch_size)], np.float32)
            # )
            response = pb_utils.InferenceResponse(output_tensors=[out_tensor,out_tensor2])
            response_sender.send(
                response, flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL
            )
        print('B Here I AM')

        return None

    def raise_value_error(self, requests):
        # TODO: Model may raise exception without sending complete final
        for request in requests:
            response_sender = request.get_response_sender()
            response_sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        raise ValueError("wait_secs cannot be negative") 

And this is the config.pbtxt:

name: "centerface"
backend: "python"
max_batch_size: 4

input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 480, 640,3 ]
  }
]

output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT1"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]


instance_group [
  {
    count: 1
    kind : KIND_CPU
  }
]

parameters: {
  key: "FORCE_CPU_ONLY_INPUT_TENSORS"
  value: {
    string_value:"yes"
  }
}
model_transaction_policy { decoupled: true }

Also, by pb I meant Python backend, not protobuf.

Please let me know how to fix the issue.

This error is from Triton code. Please correct the “return None” code you shared.

I already return None

  1. About the python backend code for a decoupled model, please refer to this triton sample; it seems response_sender.send is called in another thread (see the sketch after this list).
  2. Please refer to the code comment “TODO Decoupled Streaming support later.” in TrtISServer::InferComplete in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinferserver/infer_trtis_server.cpp. Currently, decoupled streaming is not supported in nvinferserver.
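
For reference, here is a minimal sketch of that thread-based pattern, reusing the tensor names from the config above; the _respond helper name and the worker logic are illustrative only, not taken from the Triton sample:

    import threading

    import numpy as np
    import triton_python_backend_utils as pb_utils


    class TritonPythonModel:
        def execute(self, requests):
            # Decoupled mode: hand each request off to a worker thread and
            # return None immediately; responses are sent later via the sender.
            for request in requests:
                sender = request.get_response_sender()
                frames = pb_utils.get_input_tensor_by_name(request, "INPUT0").as_numpy()
                threading.Thread(
                    target=self._respond, args=(sender, frames.shape[0]), daemon=True
                ).start()
            return None

        def _respond(self, sender, batch_size):
            # Placeholder outputs, tiled to the batch size, as in the snippet above.
            stats = np.tile(np.array([[10, 10, 100, 100, 3]], dtype=np.float32),
                            (batch_size, 1, 1))
            shape = np.tile(np.array([stats.shape[1], stats.shape[2]], dtype=np.float32),
                            (batch_size, 1, 1))
            response = pb_utils.InferenceResponse(output_tensors=[
                pb_utils.Tensor("OUTPUT0", stats),
                pb_utils.Tensor("OUTPUT1", shape),
            ])
            # Send the response and mark it as the final one for this request.
            sender.send(response, flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)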

Thank you @fanzh
I used the thread-based async code, and it is working.
It seems nvinferserver doesn't work with the asyncio module.

What is your decoupled model used for? Could you share the link if it is a public model? Thanks!

It is a Triton Inference Server model written in Python, where a computationally expensive closed-source algorithm is running. The decoupled model gives results only once every ~200 ms, but we need to draw bounding boxes on every frame, without using nvtracker. So, to sync the overlays to the correct frames, the assumed solution was to use a decoupled model and then, at a later stage of the pipeline, use a probe to do some overlay manipulation (a sketch of such a probe is below).
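
To make that concrete, here is a minimal sketch of such a buffer probe; the overlay_probe function, the last_boxes cache, and the osd_sink_pad attachment point are hypothetical names used only for illustration, not code from our pipeline:

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    import pyds

    last_boxes = []  # (left, top, width, height) tuples, updated whenever the slow model returns


    def overlay_probe(pad, info, u_data):
        gst_buffer = info.get_buffer()
        if not gst_buffer:
            return Gst.PadProbeReturn.OK

        batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
        l_frame = batch_meta.frame_meta_list
        while l_frame is not None:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)

            # Re-draw the most recent known boxes on every frame via display meta.
            display_meta = pyds.nvds_acquire_display_meta_from_pool(batch_meta)
            display_meta.num_rects = min(len(last_boxes), 16)
            for i, (left, top, width, height) in enumerate(last_boxes[:16]):
                rect = display_meta.rect_params[i]
                rect.left, rect.top = left, top
                rect.width, rect.height = width, height
                rect.border_width = 2
                rect.border_color.set(0.0, 1.0, 0.0, 1.0)
            pyds.nvds_add_display_meta_to_frame(frame_meta, display_meta)

            l_frame = l_frame.next
        return Gst.PadProbeReturn.OK


    # Attached e.g. on the nvdsosd sink pad:
    # osd_sink_pad.add_probe(Gst.PadProbeType.BUFFER, overlay_probe, 0)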
