We are trying to use the Clara Deploy SDK to run our own model written in PyTorch. Similar to reference workflows such as liver_segmentation, we will first try placing the input DICOM images in the input folder and generating the output in the clara-reference-app/output folder. My question is about importing the model (PyTorch code). I understand that for the reference workflow the model comes from the NVIDIA TLT toolkit; is it possible to configure our own model? Can you please guide/walk us through how a model written in PyTorch can be used in Clara Deploy?
Currently, I have a Jupyter notebook with PyTorch code. Should I upload the model to a registry? Just trying to figure this out.
Can anyone guide us on how this can be done? It would be helpful.
Hello, can anyone walk us through this? Any guides/documentation on how to use our own model in Clara Deploy would be helpful. This might be an elementary question, but I am trying to learn through this task.
Clara Deploy SDK uses TRT inference server in the background to serve models.
TRT Inference Server supports saved models, graph definitions, caffe netdefs and TRT plans (https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/index.html?highlight=netdef#features). For PyTorch, you will need to convert the model to a caffe netdef, or convert it to ONNX then one of the aforementioned formats.
Alternatively, if you don’t want to use TRTIS, you can build the model serving function inside the application container.
Thank you Alvin for the response. I am currently going through this and will reach out to you if I hit any issues. I guess the link above is the only resource. Is there any other simple example that lays out the end-to-end process?
I tried starting the inference server with the example model repository but encountered the error below. Can you please let me know what the issue is? I guess it is something to do with the model format? Please find the logs below and look at the last highlighted line.
I executed the below command to start the TensorRT server (19.05):
nvidia-docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v /home/selva/claradeploy/trtserver/tensorrt-inference-server/docs/examples/model_repository:/models $docker_image trtserver --model-store=/models
I have created a variable: docker_image="nvcr.io/nvidia/tensorrtserver:19.05-py3"
Once executed, it printed the startup messages fine but threw an error on the last line (by error I mean it didn't drop me into the container, and I also couldn't see the container running).
I0611 02:47:05.672819 1 main.cc:267] Starting endpoints, 'inference:0' listening on
I0611 02:47:05.672905 1 main.cc:271] localhost:8001 for gRPC requests
I0611 02:47:05.673079 1 grpc_server.cc:265] Building nvrpc server
I0611 02:47:05.673097 1 grpc_server.cc:272] Register TensorRT GRPCService
I0611 02:47:05.673118 1 grpc_server.cc:275] Register Infer RPC
I0611 02:47:05.673125 1 grpc_server.cc:279] Register StreamInfer RPC
I0611 02:47:05.673131 1 grpc_server.cc:284] Register Status RPC
I0611 02:47:05.673136 1 grpc_server.cc:288] Register Profile RPC
I0611 02:47:05.673142 1 grpc_server.cc:292] Register Health RPC
I0611 02:47:05.673148 1 grpc_server.cc:304] Register Executor
I0611 02:47:05.680751 1 main.cc:282] localhost:8000 for HTTP requests
I0611 02:47:05.722755 1 main.cc:294] localhost:8002 for metric reporting
I0611 02:47:05.725667 1 metrics.cc:149] found 1 GPUs supporting NVML metrics
I0611 02:47:05.731615 1 metrics.cc:158] GPU 0: TITAN Xp
I0611 02:47:05.732386 1 server.cc:243] Initializing TensorRT Inference Server
<b>E0611 02:47:05.743645 1 server.cc:296] unexpected platform type onnxruntime_onnx for densenet_onnx</b> # this is the line which caught my attention
Is this expected behavior? Because of this, the curl command to localhost:8000 fails.
I was expecting the command to take me to the container prompt (a line starting with #).
- When I tried 19.06 instead of 19.05, I got the error
“manifest for nvcr.io/nvidia/tensorrtserver:19.06-py3 not found”
Does 19.06 exist in NGC?
Can you please let me know what would be the issue?
Can anyone help us with this issue? That would be helpful.
Your error indicates that the TensorRT Inference Server does not support the model type you are trying to deploy.
This link https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#framework-model-definition lists the supported platforms as:
- model.plan for TensorRT models
- model.graphdef for TensorFlow GraphDef models
- model.savedmodel for TensorFlow SavedModel models
- model.netdef and init_model.netdef for Caffe2 Netdef models
ONNX models aren’t supported (https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_repository.html#onnx-models).
Did you put onnxruntime_onnx in your config.pbtxt under platform? That would be incorrect; the allowed platform values are the ones listed above.
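For reference, a minimal model repository entry for, say, a TensorFlow GraphDef model looks roughly like the sketch below; the model name, tensor names, and shapes are illustrative, not from any shipped example:

```
# Directory layout:
#   models/
#   └── my_segmenter/
#       ├── config.pbtxt          <- the file below
#       └── 1/
#           └── model.graphdef
name: "my_segmenter"
platform: "tensorflow_graphdef"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]
```

Note that the data_type values come from the ModelConfig proto (TYPE_FP32 and friends), not from TensorFlow's DT_* enums.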
I am only running the example models provided in the docs, i.e. the models from docs/examples/ (the example repository mentioned in the doc). Aren't they supported as well? I assumed those would be supported by the TensorRT server.
Don’t use the master branch as it may contain features that are not available in the container version.
You have two choices:
- You can continue with the master branch, but in that case you would have to build the docker image locally (see https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/build.html) after checking out the code from the master branch. This would allow you to access newer features such as onnx model deployments.
- Use the TRT Inference Server version in NGC (19.05). In that case, base yourself only on the examples in the r19.05 branch (https://github.com/NVIDIA/tensorrt-inference-server/tree/r19.05/docs/examples).
You cannot combine the images in NGC with the master-branch examples; you have to look at the GitHub branch that matches the image tag in NGC.
Does that help?
Thanks for the response. I was already following only the r19.05 branch. But I believe I shouldn't have run the fetch_models.sh file, because r19.05 by default doesn't contain an ONNX model; it's that script that pulls all the models from the repository.
After removing that model from the repository, I was able to start the server.
I ran a few example models.
Am I right to understand that the models in trtserver have already been trained on certain images, and that they weren't trained on medical images? What was the training data for those models?
Where/how can I see the code for these models? Are they public? I was able to see the code for the clients, but couldn't find this model info.
Let's say I have my own model written in PyTorch. Can you confirm my steps below?
3.1) I have to convert it to one of the formats we discussed.
3.2) To use our model in Clara Deploy, does it have to be uploaded to NGC? Is there a process for that? Can anyone upload?
3.3) We can use the same clients that you have in the repository to make calls to our model. Am I right?
3.4) To understand it better, an example: we use a PACS server like Orthanc, which has an option to select the modality, either clara-ctseg or clara-liverseg. So in this case Orthanc acts as a client and makes calls to the clara-ctseg/clara-liverseg modalities, which in turn request the corresponding models in trtserver. Am I right?
3.5) Given that we would like to use existing viewers, do we only have to configure our image viewer for Clara Deploy, without building any clients?
Can you correct me if my understanding isn't right? Thanks for your time.
If you use the fetch_models.sh from r19.05, it should download only netdef and graphdef models (supported formats). Later versions of the script also download ONNX models. That is, if you check out the r19.05 branch you will get only models supported by that branch.
Yes, they are pre-trained, but not on medical images. TRTIS is used for any inference work. You can check what type and size of images you can send to a model by looking at the config.pbtxt file in that model's folder. These are only example models to test your setup and aren't meant for actual inference work. Pre-trained models for medical imaging can be obtained from the Transfer Learning Toolkit included with the Clara Train SDK.
I'm not sure what the ask is here. Can you elaborate? The models are just examples of models you can deploy in TRTIS. Again, they are not meant for you to perform actual inference work.
3.2) No, you don't upload the model to NGC. You put it in the models folder that the TRT Inference Server can access. NGC is not a cloud platform; it is a repository for NVIDIA-optimized containers.
3.3) Not sure I understand what you mean here.
3.4) Yes, Orthanc will perform a push request to the Clara Application with the AE Title you have assigned to, say, liver segmentation. In turn, the Clara Application Container will make a call to the TRTIS instance where you have deployed your model.
3.5) If your DICOM viewer works as a DICOM server/client (most do), then you can configure the viewer to send a push request to the AE Title configured in the Clara DICOM adapter, and configure the Clara Deploy SDK destination endpoints to add your DICOM viewer so it pushes the secondary DICOMs (segmentations) back to the viewer. Alternatively, you can configure the Clara DICOM adapter destinations to push to a predetermined PACS from which you can retrieve the secondary DICOMs using the DICOM viewer.
To clarify my questions above better
- This might be more of a DL question. I understand that the models are pre-trained. So the model you have in trtserver right now is kind of a zipped file (everything put together as a package). Am I right? If I would like to see the code for these models, where can I find it? And when we save a model in any of the formats listed above, does it retain just the configuration and weights? Is my understanding right?
3.3) a) Am I right to understand that Dockerfile.client builds the C++ and Python clients used to send requests to the trtserver models?
b) I see that you have provided the code for image_client (C++) and image_client.py (Python). Am I right to understand that these files show how the clients were built?
c) Let's say I now put my segmentation model, written in tf.keras, in an accepted format in trtserver, and I would like to send requests to my model through a client. Am I right to understand that the same clients can be used to send requests to my model?
d) I guess there is no constraint w.r.t. TF 2.0 vs. older TF. We are trying to follow this link to convert the model (https://www.tensorflow.org/beta/guide/saved_model). Is there any simpler way to convert the model, or any tutorial you can share where I can learn this?
e) When we try to run the liver segmentation workflow through Orthanc, can I know which model is being called in trtserver? Is it inception_net or netdef?
You have been patient in answering my questions, and your responses are helping us figure things out.
In addition to the questions above, please find the issue I encountered when I tried to start the TensorRT server with our seg_model for skin lesions. It is a binary segmentation model. Please find the config.pbtxt below.
We got the model in the required format by following this link: https://www.tensorflow.org/beta/guide/saved_model
It's a TensorFlow implementation using the Keras API.
The error I encounter when I try to start the TensorRT server is given below:
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:307] Error parsing text-format nvidia.inferenceserver.ModelConfig: 8:5: Unknown enumeration value of "DT_FLOAT" for field "data_type".
E0618 10:06:23.625857 1 server.cc:296] Can't parse /models/seg_model/config.pbtxt as text proto
Hope this helps.
- We didn't create these models; they are fetched via fetch_models.sh, which pulls them from public repos. You can look at the URLs in that script and see if you can find the source code and training data on the appropriate website.
3.3) a) Dockerfile.client builds the clients but does not run them; you will need to start the container you built and fire up those clients. See https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-master-branch-guide/docs/client.html#build-using-dockerfile
b) Yes these are examples of C++ and Python-based clients
c) You would have to modify the example clients to fit your problem, but in general yes you can use them if they fit your exact specifications.
d) As long as you have a saved model (or any of the other supported formats) it should work. Are you asking for examples of how to convert models? The easiest way to convert models lies within the Clara Train SDK, where from the command line you can convert a model checkpoint to a graphdef using tlt-export.
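Outside of Clara Train, the plain TF2 route from that guide can be sketched as below; the module, shapes, and export path are illustrative (in TF 2.x a tf.keras model can be passed to tf.saved_model.save the same way):

```python
import tensorflow as tf

# Stand-in for your real segmentation network (illustrative only).
class SegModel(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None, 64, 64, 3], tf.float32)])
    def __call__(self, x):
        # Collapse channels to a single "mask" channel as a dummy computation.
        return tf.reduce_mean(x, axis=-1, keepdims=True)

# Export in SavedModel format; TRTIS expects it under <model>/<version>/model.savedmodel.
tf.saved_model.save(SegModel(), "seg_model/1/model.savedmodel")
```

You would then add a matching config.pbtxt next to the version directory.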
e) This question goes beyond the TRT Inference Server; it depends on whether you are using the Clara Deploy SDK or not. If you are, then you would configure an Application Entity Title (AET) and a destination endpoint joined together by a workflow. Your containerized application in that workflow/pipeline decides which model to execute via the ctx.run call; the model run is selected by its model name (identifier).
Alternatively, if you are using this outside of the Clara Deploy SDK (meaning only TRTIS with a custom client), then the same applies; just be sure you're calling the correct model.
If you want to use the liver segmentation model, either use the one that comes with the Clara Deploy SDK, or use tlt-pull to get the pre-trained liver segmentation model, use tlt-export to export it to a graphdef, and put it in the TRTIS model repo along with the appropriate config.pbtxt. That model will be available when you start TRTIS.
Thank you Alvin. Does the config.pbtxt file which I shared look fine?
But it throws an error for the data type, even though I have a valid data type?
It throws an error when you send a request? Make sure the type of the data you're sending matches it. It should be one of the TYPE_* values from the ModelConfig proto (e.g. TYPE_FP32).
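For what it's worth, that particular parse error usually means the config uses TensorFlow's enum names; the ModelConfig proto has its own TYPE_* values. A hedged sketch of the input section (the tensor name and dims below are illustrative, yours come from your model):

```
input [
  {
    name: "input_1"          # illustrative tensor name
    data_type: TYPE_FP32     # ModelConfig enum; DT_FLOAT will not parse
    dims: [ 256, 256, 3 ]
  }
]
```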