This issue was already posted on the GitHub TRTIS repository (here), but it has not been resolved.
Description
Triton Inference Server 21.03 does not update the models after a new model is added to the model_repository.
I used the old Triton version (20.03) for a long time and everything worked well. However, old versions of TRTIS do not support Ampere-architecture GPUs, so I switched to a newer version of TRTIS (21.03).
I could successfully launch TRTIS, and the startup log showed the model status as “READY”:
I0415 02:43:44.019434 1 server.cc:570]
+-------------------------------+---------+--------+
| Model | Version | Status |
+-------------------------------+---------+--------+
| densenet_onnx | 1 | READY |
+-------------------------------+---------+--------+
I also checked the readiness endpoint:
$ curl -v localhost:8000/v2/health/ready
* Trying 127.0.0.1:8000...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
The weird thing is that the following command returned nothing:
$ curl localhost:8000/api/status
Normally it should print the model information, shouldn’t it? However, I cannot see any information.
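My assumption is that the old v1 /api/status route was removed in the 2.x releases, and model information moved to the KServe-style v2 HTTP routes that Triton 2.x implements. If that is right, the equivalent checks for the densenet_onnx model above would be something like:

```shell
# Sketch, assuming the v2 HTTP/REST API replaced the v1 /api/status route.
curl localhost:8000/v2/models/densenet_onnx          # model metadata
curl localhost:8000/v2/models/densenet_onnx/config   # full model configuration
curl localhost:8000/v2/models/densenet_onnx/ready    # per-model readiness
```

These commands require the server from the reproduction step below to be running.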
In addition, to test whether Triton updates models immediately, I moved the model folder out of the model_repository (and then moved it back in), but nothing happened. Normally, the server should log something like “load” or “unload” for the model in the terminal.
Although I cannot check the model information via localhost:8000/api/status, I can run the example client and get correct results, which proves that the model loaded correctly at startup and that the server works.
Example output:
$ /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
15.349566 (504) = COFFEE MUG
13.227467 (968) = CUP
10.424896 (505) = COFFEEPOT
Triton Information
I am using Triton version 21.03, pulled from NGC.
To Reproduce
docker run --rm --gpus all \
--shm-size=1g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
--name trt_serving2103_server \
-v /home/model_repository:/models \
nvcr.io/nvidia/tritonserver:21.03-py3 \
tritonserver --model-repository=/models
The model is densenet_onnx, downloaded from the example.
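A possible cause (this is my assumption, not confirmed): in the 2.x releases the default --model-control-mode is none, so changes in the model repository are ignored, whereas older TRTIS releases polled the repository by default. Restarting the container with polling enabled might restore the old behavior; the command below is the same docker run as above with two extra tritonserver flags (the 30-second interval is an arbitrary example):

```shell
# Sketch: same reproduction command, with repository polling enabled.
docker run --rm --gpus all \
  --shm-size=1g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  --name trt_serving2103_server \
  -v /home/model_repository:/models \
  nvcr.io/nvidia/tritonserver:21.03-py3 \
  tritonserver --model-repository=/models \
               --model-control-mode=poll \
               --repository-poll-secs=30
```

With polling on, adding or removing a model directory under /models should produce the expected “load”/“unload” messages in the server log.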
Expected behavior
- I can check the model information via
localhost:8000/api/status
- The server can load, update, or unload models without being restarted. (This is a very important feature of TRTIS.)
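If polling is not desired, my understanding is that the v2 model-repository extension allows explicit load/unload over HTTP when the server is started with --model-control-mode=explicit; a sketch of those calls (assuming the extension is available in 21.03):

```shell
# Requires tritonserver started with --model-control-mode=explicit.
curl -X POST localhost:8000/v2/repository/index                        # list repository contents
curl -X POST localhost:8000/v2/repository/models/densenet_onnx/load    # load the model
curl -X POST localhost:8000/v2/repository/models/densenet_onnx/unload  # unload the model
```

This would also cover the second expected behavior above: models can be swapped without closing the server.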