Triton does not update the model after a new model is added to model_repository | Nothing is returned from localhost:8000/api/status

This issue was already posted on the GitHub TRTIS repo, but it has not been solved: here

Description

Triton Inference Server version 21.03 does not update the model list after a new model is added to model_repository.

I used the old Triton version (20.03) for a long time, and everything worked well. However, since old versions of TRTIS do not support GPUs with the Ampere architecture, I switched to a newer version of TRTIS (21.03).

I could successfully launch TRTIS, and the startup output showed the model status as "READY":

I0415 02:43:44.019434 1 server.cc:570] 
+-------------------------------+---------+--------+
| Model                         | Version | Status |
+-------------------------------+---------+--------+
| densenet_onnx                 | 1       | READY  |
+-------------------------------+---------+--------+

I verified the server health as well:

$ curl -v localhost:8000/v2/health/ready
*   Trying 127.0.0.1:8000...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host localhost left intact

The weird thing is that the command below returned nothing:

$ curl localhost:8000/api/status

I think it should normally print the model information, no?
However, I cannot see any information.
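
If I understand correctly, /api/status belongs to the v1 HTTP API that older TRTIS releases (such as 20.03) exposed; the newer releases only serve the KFServing-style v2 endpoints, which would explain why the old path returns nothing. The rough v2 equivalents (the model name densenet_onnx is from my setup) would be:

$ curl localhost:8000/v2                                  # server metadata
$ curl localhost:8000/v2/models/densenet_onnx             # model metadata
$ curl localhost:8000/v2/models/densenet_onnx/versions/1  # metadata for one version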

In addition, to test whether Triton updates the model immediately, I moved the model folder out of model_repository (and then moved it back in), but nothing happened. Normally, it should print something like "unload" or "load" along with the model information on the terminal. The test I ran is sketched below.
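
For reference, the test looked roughly like this (the host path is from my own docker mount, so adjust it to yours):

$ mv /home/model_repository/densenet_onnx /tmp/    # expected Triton to log an unload
$ mv /tmp/densenet_onnx /home/model_repository/    # expected Triton to log a load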

Although I cannot check the model information via localhost:8000/api/status, I can run the example client and get results, which proves that the model loaded correctly at startup and the server works.

Example output:

$ /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
    15.349566 (504) = COFFEE MUG
    13.227467 (968) = CUP
    10.424896 (505) = COFFEEPOT

Triton Information

My Triton version is 21.03, pulled from NGC.

To Reproduce

docker run --rm --gpus all \
    --shm-size=1g \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    --name trt_serving2103_server \
    -v /home/model_repository:/models \
    nvcr.io/nvidia/tritonserver:21.03-py3 \
    tritonserver --model-repository=/models

The model is densenet_onnx, downloaded from the Triton examples.

Expected behavior

  1. I can check the model information via localhost:8000/api/status
  2. It can change or update the model without closing the server. (This is a very important feature for TRTIS.)

Update: I resolved the problem of updating the model without closing the server by adding this flag:

--model-control-mode="poll"
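
For completeness, here is roughly what my launch command looks like with polling enabled; the poll interval and its value are my own choice, not a required setting:

docker run --rm --gpus all \
    --shm-size=1g \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    --name trt_serving2103_server \
    -v /home/model_repository:/models \
    nvcr.io/nvidia/tritonserver:21.03-py3 \
    tritonserver --model-repository=/models \
        --model-control-mode=poll \
        --repository-poll-secs=5   # check model_repository for changes every 5 seconds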

However, localhost:8000/api/status still does not work.

Example:

curl localhost:8000/v2/models/mnist/versions/1/stats

Here, mnist is the model folder name.

But this endpoint cannot preview the status of all models directly.
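
If you want a single call that lists every model, the v2 repository extension may help; as far as I know this endpoint exists in 21.03, but please verify it against your build:

$ curl -X POST localhost:8000/v2/repository/index

It should return a JSON array with each model's name, version, and state (e.g. READY).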
