Can I deploy a multi-label segmentation model on the Deploy SDK?

Hello there,

I am now trying to deploy my own model, which has multiple labels (5 labels).

I was able to create the pipeline, model, and job by referring to the Lung-Tumor Segmentation pipeline.
However, it does not run without errors.

Is there a mistake in my description of "config.pbtxt"?

First of all, is it possible to deploy a segmentation model with multiple labels on the Deploy SDK?

Thank you.

@yasu18, thanks again for your question.

Please clarify what the "multi labels" are in your context, i.e. classification classes (most likely) or segments (inferred from your use of the segmentation pipeline, though it might not be the right pipeline). Of course, the DL model and its output would be very different between segmentation and classification.

Clara Deploy makes use of the Triton Inference Server for remote inference on the provided and loaded models, and as you have found out, the config.pbtxt file for each model is slightly different, mostly in the output dims and data type. For the Triton API that we use, there are two separate client inferers in Clara Deploy: one for segmentation, named the sliding window inferer, and the other for classification, named the simple inferer.
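
Conceptually, a sliding window inferer tiles the input volume into model-sized crops, runs inference on each crop, and stitches the per-crop outputs back into a full-size result. Here is a rough numpy sketch of the idea only (not the Clara Deploy implementation, which also handles overlap and batching):

import numpy as np

def sliding_window_infer(volume, roi, predict):
    # Walk the (D, H, W) volume in ROI-sized steps, run `predict` on each
    # crop, and stitch the per-crop predictions into a full-size output.
    # `predict` is assumed to be shape-preserving, even for edge crops.
    out = np.zeros_like(volume)
    depth, height, width = volume.shape
    for z in range(0, depth, roi[0]):
        for y in range(0, height, roi[1]):
            for x in range(0, width, roi[2]):
                crop = volume[z:z + roi[0], y:y + roi[1], x:x + roi[2]]
                out[z:z + roi[0], y:y + roi[1], x:x + roi[2]] = predict(crop)
    return out

The simple inferer, by contrast, essentially sends the whole preprocessed image in a single request and reads back the class scores.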

In the Clara Deploy reference pipelines, there are classification examples, e.g. COVID-19 classification as part of the COVID-19 pipeline (in addition to two segmentation models). There is also a simple standalone classification example, the chest X-ray one, and its config.pbtxt looks like the following (please note that the semantic meaning of the label values is provided in a labels text file for the Triton Inference Server):

name: "classification_chestxray_v1"
platform: "tensorflow_graphdef"
max_batch_size: 32
input [
  {
    name: "NV_MODEL_INPUT"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [256, 256, 3]
  }
]
output [
  {
    name: "NV_MODEL_OUTPUT"
    data_type: TYPE_FP32
    dims: [15]
    label_filename: "chestxray_labels.txt"
  }
]
instance_group [
  {
    kind: KIND_GPU
    count: 2
  }
]

Please also see below the config.pbtxt file for the COVID-19 Lesion model (a two-segment segmentation model, where the output dims match the input while having two channels, one for lung and one for lesion):

name: "segmentation_ct_covid_lesion_v1"
platform: "tensorflow_graphdef"
max_batch_size: 32
input [
  {
    name: "NV_MODEL_INPUT"
    data_type: TYPE_FP32
    dims: [ 1, 224, 224, 32]
  }
]
output [
  {
    name: "NV_MODEL_OUTPUT"
    data_type: TYPE_FP32
    dims: [ 2, 224, 224, 32]
  }
]
instance_group [
  {
    kind: KIND_GPU
    count: 2
  }
]

Please take a look, and let us know if there are more questions. We’ll respond ASAP and work with you to achieve success.

Best Regards,
Ming

@Ming_Q ,

Thank you for your kind response.

By "multi labels" I meant "segments", not "classification classes", and the image type is chest X-ray, not CT.
So I would like to run multi-label semantic segmentation on chest X-ray images.

I wrote my config.pbtxt like the COVID-19 segmentation example (the second example you showed me).

The dims in the config.pbtxt I first posted here are equal to the shape of my model's input/output when I run the model locally with a Python script.

But now I get an error like the one below:

RuntimeWarning: divide by zero encountered in true_divide
np.asarray(output_shape, dtype=float))
ERROR:app.App:Log and propogate exception: cannot convert float infinity to integer

Do I have to edit other files as well?

— additional info —
Input data-format: DICOM

Thank you.

@yasu18 Thank you for the confirmation.

I have been trying to infer the possible root cause of the error from your model config and the error message. I cannot be certain, but I do have a couple of areas to look into further.

  1. Based on your model config pbtxt file, it is channels last, and the tensor shape is DHWC (batch size N = 1 and is not in the dims, per the pbtxt file schema). In your case, depth D is 1 for X-ray, and the input channel is 1 while the output channel is 5. I suspect the code is somehow confused by the shape. Is it possible to make it channels first, CHWD? (See the numpy sketch after this list.)
  2. Since this is segmentation, the TRTISScanWindowInferer needs to be used.
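
To illustrate the reordering suggested in item 1, here is a minimal numpy sketch; the shapes are assumptions based on your description (single slice, 256x256, 5 output channels):

import numpy as np

# Hypothetical channels-last output tensor in DHWC order:
# depth 1 (a single X-ray slice), 256x256, 5 label channels
dhwc = np.zeros((1, 256, 256, 5), dtype=np.float32)

# Reorder to channels-first CHWD: channels to the front, depth to the back
chwd = np.transpose(dhwc, (3, 1, 2, 0))
print(chwd.shape)  # (5, 256, 256, 1)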

Thanks

@Ming_Q ,

Thank you for your reply.

As you know, my model needs data whose shape is DHWC, so I edited "config_inference.json".
I changed "name": "ConvertToChannelsFirst" in the "pre_transforms" item to "ConvertToChannelsLast".
Then my model could take the input data and output a result, but there are some problems, like below:

And I found a Failed status in the chestxray-segmentation log. I don't know why this problem occurred.

Thank you.

Sorry, the change from "ConvertToChannelsFirst" to "ConvertToChannelsLast" that I described in the post above was not related to the error.
I got the same error either way.

Is there no way to deploy a channels-last model?

What should I do about the WARNING below?

Unsupported or empty metaData item ITK_FileNotes of type Ssfound, won’t be written to image file

@yasu18 Sorry for the late response. That warning can be ignored.
The channels-last layout in the model likely caused the error. The model config.pbtxt file supports a format attribute, though it may not be interpreted by the inference operator in the pipeline (which retrieves the model config from the Triton server).
Is it possible to change the model to be channels first? If yes, then the ConvertToChannelsFirst transform will be needed in the inference config.
Thanks

@Ming_Q

Thank you for your reply.
As you said, I created a channels-first model and ran the job with it.
However, it did not run without problems.
The error is like below:
[dicom-writer]

image_orientation = (direction[0], direction[3], direction[6],
IndexError: tuple index out of range

My model has 5 classes (0 is background); is my description in "config_inference.json" correct?
Please check the excerpt below:

{
    "name": "SplitBasedOnLabel",
    "args": {
        "field": "model",
        "label_names": [
            "pred_class0",
            "pred_class1",
            "pred_class2",
            "pred_class3",
            "pred_class4"
        ]
    }
},
{
    "name": "CopyProperties",
    "args": {
        "fields": ["pred_class0", "pred_class1", "pred_class2", "pred_class3", "pred_class4", "model"],

Thank you.

Now I cannot find a TensorFlow pretrained model in the NGC CATALOG.
I found a PyTorch multi-label segmentation model (clara_pt_liver_and_tumor_ct_segmentation).

Though what I want to do is chest X-ray segmentation, can I refer to the config_inference.json of a PyTorch model?

Please let me know if there is a difference in the description method between TensorFlow models and PyTorch ones.

@yasu18 A couple things,

  • For the multi-label segmentation result in the deploy pipeline, there is no need to split the labels from the inference results. The image written by the inference operator would then have voxel values of 0 and the label integers (see the numpy sketch after the snippet below). This is fine because, in any case, the result will be written into one DICOM instance. So, SplitBasedOnLabel can be removed completely.
  • The "CopyProperties" section can then be simplified to something like the following:
{
    "name": "CopyProperties",
    "args": {
        "fields": ["model"],
        "from_field": "image",
        "properties": ["affine"]
    }
}
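
To make the first point concrete, here is a minimal numpy sketch of what an argmax across channels (cf. the ArgmaxAcrossChannels transform) effectively produces; the shapes are illustrative:

import numpy as np

# Hypothetical 5-channel segmentation output, channels first: (C, H, W)
scores = np.random.rand(5, 256, 256).astype(np.float32)

# Collapsing the channel dim yields a single label map whose voxels hold
# 0 (background) or the winning class index, so no per-label split is needed
label_map = np.argmax(scores, axis=0).astype(np.uint8)
print(np.unique(label_map))  # some subset of [0 1 2 3 4]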

As for the current PyTorch models available on NGC, a bit of explanation is required.

  • The TensorFlow models on NGC were trained with Clara Train v3.x, whereas the PyTorch models are from Clara Train v4.x.
  • Clara Train v3.x uses closed-source transform libraries created by the NVIDIA Clara Train team, whereas Clara Train v4.x uses the open source MONAI libraries, to which the Clara Train team contributed a large portion of the design and code. So, you can see the similarities in the inference configs in the way they refer to the dictionary-based transforms, but the underlying libraries are different (see the MONAI sketch after this list).
  • Clara Deploy only supports Clara Train v3.x as of now. This means that only the models and their inference configs from Clara Train v3.x can be used in Clara Deploy.
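
For reference, here is a minimal sketch of what a dictionary-based MONAI pre-transform chain looks like in Python; the intensity values are copied from your inference config, and the rest is illustrative rather than taken from any shipped model:

from monai.transforms import Compose, LoadImaged, ScaleIntensityRanged

# Dictionary-based transforms: each transform names the dict keys it acts on,
# which is why v4.x inference configs look superficially similar to v3.x ones
pre_transforms = Compose([
    LoadImaged(keys="image"),
    ScaleIntensityRanged(keys="image", a_min=1600, a_max=3000,
                         b_min=0.0, b_max=1.0, clip=True),
])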

A new version of the Clara Deploy inference operator that supports Clara Train v4.x has been completed, but it might not get released.

Since NVIDIA is a member of Project MONAI, some of the AI deployment work is being contributed to the MONAI GitHub repo.

@Ming_Q

Thank you for your prompt and kind reply.

I removed "SplitBasedOnLabel" from my config_inference.json and modified "CopyProperties".
So now, my config_inference has the contents described below:

INFO:app.App:Config details: {'name': 'LoadNifti', 'args': {'fields': 'image'}}
INFO:app.App:Config details: {'name': 'ScaleToShape', 'args': {'fields': 'image', 'target_shape': [1, 256, 256]}}
INFO:app.App:Config details: {'name': 'ConvertToChannelsFirst', 'args': {'fields': 'image'}}
INFO:app.App:Config details: {'name': 'ScaleBySpacing', 'args': {'fields': 'image', 'target_spacing': [0.143, 0.143, 1.0]}}
INFO:app.App:Config details: {'name': 'ScaleIntensityRange', 'args': {'fields': 'image', 'a_min': 1600, 'a_max': 3000, 'b_min': 0.0, 'b_max': 1.0, 'clip': True}}
INFO:app.App:Config details: {'name': 'ArgmaxAcrossChannels', 'args': {'fields': 'model'}}
INFO:app.App:Config details: {'name': 'RestoreOriginalShape', 'args': {'field': 'model', 'src_field': 'image', 'is_label': True}}
INFO:app.App:Config details: {'name': 'CopyProperties', 'args': {'fields': ['model'], 'from_field': 'image', 'properties': ['affine']}}

After running a job with the above composition, my app got a green check mark, like this.

But I couldn't get a result (mask image) from the download link. All of the pixel values in the result image I got from the link were zero…

By the way, the statuses of the other apps were like this.

I now understand the following:

  • The reason why I couldn't find a TensorFlow pretrained model on NGC.

  • I can refer to the config_inference.json of "clara_pt_*" models, but only for the similar points.

  • The underlying libraries are different.

After solving my problem, I am going to check the MONAI GitHub repo.

Thank you.

From the pipeline job status screenshot, I cannot see the inference operator, the core piece of the segmentation pipeline, in between dicom-reader and dicom-writer.
It would be helpful if you attached the whole inference config JSON file, as well as the pipeline YAML file.
Thanks.

ctrxray-pipeline.yaml (4.1 KB)
config_inference.json (2.1 KB)

@Ming_Q

Sorry.
I have attached the pipeline YAML and config JSON files.
Please take a look at them.

Thank you.

@yasu18 Thanks for the config files.
The config_inference looks fine. One nit is that you have both ScaleToShape and ScaleBySpacing, with the latter actually overriding the former. So, ScaleToShape can be removed while only keeping ScaleBySpacing (this assumes the model has been trained with images whose pixel spacing matches the target spacing; it is also good that you have given a spacing of 1 for the 3rd dim).

Looking again at the pipeline job status, I don't see the expected status of the inference operator. Without seeing the log of this particular operator (docker), I can only speak generally about what could have gone wrong in its execution:

  • In the pipeline YAML, the requested system memory for this docker is 4 GiB, which looks reasonable given that this is only a 2D (3D with a single slice) image. But please ensure the system memory usage is indeed below this value, otherwise the docker would be terminated for exceeding the memory available to it.
  • There could be some failure in the inference operator: in the pre- and post-transforms, the inference itself, or even loading the input file. Even though the Dashboard does not show this operator's log, the logs for this job can be retrieved on the server using a Kubernetes command, e.g. kubectl logs <job pod> <container>. You can also view the output from each operator/docker in the pipeline job: with the clara CLI, list the jobs and the payload ID, then on the server locate the Clara root, find the folder named after the payload ID, and then find the folder for each operator.

Thank you.

@Ming_Q ,

Thank you for your confirmation and reply.

My input data are chest X-ray images, and the size differs for each image, so I removed "ScaleBySpacing", not "ScaleToShape".

There was no problem with memory usage.

I have attached text files containing the logs of my segmentation app and the dicom-writer.
I got them with the "kubectl logs" command.

Could you please check them?
When I looked through the logs, I started to think that my model might have some problems.

Thank you.
dicomwriter_logs.txt (5.3 KB)
segmentation_app_logs.txt (11.3 KB)

@yasu18 Thanks for extracting and sending over the log files; they do help with debugging the issue.

As you suspected, and as confirmed by the inference operator log, the inference failed, and the most likely culprit is the order of the dims, as seen in the following section of the inference operator log:

INFO:TRTISScanWindowInferer:Scan Num: [1, 1, 1]
INFO:TRTISScanWindowInferer:Scan Interval: [1, 256, 256]
INFO:TRTISScanWindowInferer:Image Size: [1, 256, 256]
INFO:TRTISScanWindowInferer:Number of slices: 1
INFO:TRTISScanWindowInferer:++ Before scan run time is 0.00011724792420864105
INFO:TRTISScanWindowInferer:Output Key: model => NumClasses: 1 => Shape: (1, 1, 1, 256, 256)
INFO:TRTISScanWindowInferer:Number of slices: 1

The input to the transforms pipeline is either an MHD or a NIfTI file, and after LoadNifti loads such files into a numpy array, the convention is that the axial dim is the last one. However, in the log, once the tensor gets to the actual inference stage, the first dim appears to be the axial dim. This could well be caused by the use of ScaleToShape, which treats the first dim as the axial one.

From my understanding, and in all the models that I have worked on, ScaleBySpacing is always used instead of ScaleToShape, because spacing is the image property that can be made consistent for input of any shape. Even though CT and MR tend to have standard per-instance dims of 512x512 or 1024x1024, X-ray images may have different pixel dims, and forcing them to 256x256 will likely make the spacing non-deterministic. In any case, this would only affect the inference quality; it is the reversing of the axial dim that caused the issues.
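
To illustrate why spacing-based scaling behaves better for variably sized X-rays, here is a rough numpy/scipy sketch; the spacing values are made up, and this shows only the idea, not how ScaleBySpacing is actually implemented:

import numpy as np
from scipy.ndimage import zoom

# Hypothetical chest X-ray as (H, W), with its own per-image pixel spacing (mm)
image = np.random.rand(2048, 2500).astype(np.float32)
src_spacing = (0.171, 0.171)     # varies from image to image
target_spacing = (0.143, 0.143)  # fixed, e.g. matching the training data

# Resample so the spacing becomes deterministic; the output shape then varies,
# which is the opposite trade-off to forcing a fixed 256x256 shape
factors = [s / t for s, t in zip(src_spacing, target_spacing)]
resampled = zoom(image, factors, order=1)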

Just to share a bit of my experience dealing with a few medical imaging packages: PyDicom serves the pixel data with the axial dim first and row first; ITK, however, provides a nice API to convert raw DICOM voxels to a numpy array with the correct shape, i.e. column first, then row, with axial being the last. In the Clara DICOM Reader, SimpleITK is used to convert the DICOM pixels to MHD, or, if you choose to use the newer and more advanced DICOM Parser, to MHD, NIfTI, or PNG (for X-ray), so the axial dim would be the last.
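
As a quick SimpleITK illustration of how easily the axis order flips between APIs (the file name here is hypothetical):

import SimpleITK as sitk

# SimpleITK indexes images as (x, y, z), i.e. column, row, axial last...
img = sitk.ReadImage("converted_series.mhd")
print(img.GetSize())  # e.g. (512, 512, 120)

# ...but the numpy view of the same image reverses the order to (z, y, x)
arr = sitk.GetArrayFromImage(img)
print(arr.shape)  # e.g. (120, 512, 512)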

Another thing I found that does not match your expectation is the output shape, (1, 1, 1, 256, 256), which should be (1, 5, 256, 256, 1) for 5 channels and after the spatial dims are corrected.

Hope this helps.

@Ming_Q ,

Thank you for your reply.

I understand about the dims and "ScaleBySpacing" now; thank you for sharing your experience.
When I changed to ScaleBySpacing instead of ScaleToShape, I got an error in my segmentation app, like below:

ZeroDivisionError: float division by zero

Also, I got the same error when I used ScaleToShape with [256, 256, 1].

I have attached a file containing all the logs, including the above error.
Could you please check it?
I'm sorry for the inconvenience.
segmentation_app_logs.txt (10.0 KB)

Thank you.