Description
I am trying to deploy Python backend models using Triton Inference Server. The server launches successfully and exposes the HTTP, gRPC, and Metrics ports. However, when running inference, only the HTTP endpoint works.
When trying to use the gRPC endpoint, it throws a status.UNAVAILABLE error.
What I have tried:
Checked the HTTP endpoint:
httpclient.InferenceServerClient(url="httpendpoint_address:port").is_server_live()
This returns True.
But when I tried the same using
grpcclient.InferenceServerClient(url="grpcendpoint_address:port").is_server_live()
the call fails with the status.UNAVAILABLE error described above.
The Triton server is running from a 23.05 NGC image, and the model uses the Python backend.
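A minimal, self-contained version of the two checks I ran, with placeholder host/port values (8000 and 8001 are Triton's default HTTP and gRPC ports; substitute the actual endpoint addresses):

import tritonclient.http as httpclient
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

HTTP_URL = "triton-host:8000"  # placeholder HTTP endpoint
GRPC_URL = "triton-host:8001"  # placeholder gRPC endpoint

# HTTP liveness check -- this returns True in my setup
http_client = httpclient.InferenceServerClient(url=HTTP_URL)
print("HTTP is_server_live:", http_client.is_server_live())

# gRPC liveness check -- this is the call that fails with UNAVAILABLE
try:
    grpc_client = grpcclient.InferenceServerClient(url=GRPC_URL)
    print("gRPC is_server_live:", grpc_client.is_server_live())
except InferenceServerException as e:
    print("gRPC check failed:", e)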
Environment
GPU Type: A100
Nvidia Driver Version: 560.35.03
CUDA Version: 12.6
CUDNN Version:
Operating System + Version: Ubuntu 22.04
Baremetal or Container (if container which image + tag): dtr.f1.local:5030/nvidia/tritonserver:23.05-py3
Relevant Files
Took motivation from: tutorials/HuggingFace at main · triton-inference-server/tutorials
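For context, the model follows the usual Python backend structure shown in that tutorial. A rough sketch of the model.py layout (tensor names and the passthrough logic here are placeholders, not the actual model code):

import json
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Model configuration is passed in by Triton as a JSON string
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        responses = []
        for request in requests:
            # Placeholder tensor names; the real model defines its own I/O
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            data = input_tensor.as_numpy()
            output_tensor = pb_utils.Tensor("OUTPUT0", data.astype(np.float32))
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output_tensor])
            )
        return responses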