Failed to fetch NGC model - "Failed to export model to TRTIS"

Hi, I'm trying to run the clara-train notebook. The notebook is running inside its Docker container on an Ubuntu server. The two default ports are in use and working (the clara_train portal and AIAA through the python-client). When I try to fetch a model, the download and the following steps run fine until I get this error:


Full log at "…/logs/?lines=100"

Hi
Thanks for your interest in Clara AIAA and for trying the notebooks.

The command you are running connects to NGC, downloads the model, and then uploads it to the AIAA server. The error you are seeing could be caused by a couple of things:

  • Not being able to connect to NGC. Are you able to run the cell that lists the models?

!ngc registry model list nvidia/med/clara_*

If you get an error, you just need to log in to NGC from the command line.
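If you have not set up the NGC CLI inside the container yet, a minimal sketch of the login step looks like this (assuming the standard ngc config set prompt flow and that you already have an NGC API key from the website):

# run once from the command line inside the container
ngc config set                                 # paste your NGC API key, then choose org/team/format when prompted
ngc registry model list nvidia/med/clara_*     # re-run the listing to confirm access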
If this still gives you trouble, you can manually download the deepgrow model to your disk from the NGC web interface (click on the … button on the top right, then Download).
Once you have the zip file, you can run:

dataArg="data=@"+AIAA_ROOT+"/deepgrow_nifti.zip"
!curl -X PUT "http://127.0.0.1/admin/model/clara_deepgrow_v2" -F $dataArg
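If you prefer to run this outside the notebook, the equivalent plain-shell command is roughly the following; AIAA_ROOT here is just a placeholder for wherever you saved the zip, and the URL assumes the AIAA server is listening on the same default port as in the notebook:

AIAA_ROOT=/path/to/your/workspace    # placeholder: directory that contains deepgrow_nifti.zip
curl -X PUT "http://127.0.0.1/admin/model/clara_deepgrow_v2" \
     -F "data=@${AIAA_ROOT}/deepgrow_nifti.zip"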

If you extract it and only have the model and the JSON config file, then you should run the command below to load the model:

curl -X PUT "http://127.0.0.1/admin/model/modelNameInAIAA" \
     -F "config=@clara_ct_seg_spleen_amp_v1/config/config_aiaa.json;type=application/json" \
     -F "data=@clara_ct_seg_spleen_amp_v1/models/model.trt.pb"

Hope that helps

Hi Aharouni,
Actually, I am also getting the same issue.
After running the command !ngc registry model list nvidia/med/clara_*,
I get the list of models, so there is no issue with NGC access.

When I run the above curl command, the model is downloaded into the temp folder, but after that it fails to load into TRTIS.
So there seems to be a bug in loading the downloaded model onto the TRTIS server.

{"error":{"message":["5","Failed to export model to TRTIS"],"type":"AIAAException"},"success":false}

  1. After you start the AIAA server,
    please run "nvidia-smi" to see if there is a process called "trtserver".

    1a. If nothing shows up, that means your GPU is not compatible with the TRTIS backend.
    Please start your AIAA server with --engine AIAA to use pure TensorFlow (see the sketch after these steps).
    In that case the deepgrow model on NGC will not work, since it is in TorchScript format for TRTIS.

    1b. If the process is there, please go to step 2.

  2. Use the curl command to upload a model (we can try clara_ct_seg_spleen_amp).
    2a. If no error shows up, then you are good to go.
    2b. If there is an error, run "nvidia-smi" to see if trtserver is still there. If it is not, it means your GPU does not have enough memory to load this model, so the process got killed by the OS.
    2c. If there is an error and "nvidia-smi" still shows trtserver in the process list, please try starting the AIAA server with a larger TRTIS model timeout by passing the flag "--trtis_model_timeout 120" (see the sketch after these steps).

  3. If you increase trtis_model_timeout and still reach 2c, please post your GPU and driver information along with your CPU and system memory, so we can try to reproduce the issue.
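To put the steps above together, here is a rough command-line sketch. The start_aas.sh script name, workspace path, and port are assumptions based on a typical Clara Train container setup, so adjust them to however you normally launch your AIAA server; the --engine AIAA and --trtis_model_timeout flags are the ones referred to in steps 1a and 2c.

# Step 1: after starting AIAA, check whether the TRTIS backend process is running
nvidia-smi | grep trtserver

# Step 1a: no trtserver process -> restart AIAA with the pure-TF engine
# (script name, workspace and port below are assumptions; use your usual start command)
start_aas.sh --workspace /aiaa_workspace --port 80 --engine AIAA

# Step 2: try uploading a model, e.g. clara_ct_seg_spleen_amp, as in the curl example above
curl -X PUT "http://127.0.0.1/admin/model/clara_ct_seg_spleen_amp" \
     -F "config=@clara_ct_seg_spleen_amp_v1/config/config_aiaa.json;type=application/json" \
     -F "data=@clara_ct_seg_spleen_amp_v1/models/model.trt.pb"

# Step 2c: trtserver is still alive but the upload keeps failing -> restart with a larger timeout
start_aas.sh --workspace /aiaa_workspace --port 80 --trtis_model_timeout 120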