Spleen Demo: ScaleByResolution and Crop

In the spleen demo’s pre_transforms in trn_base.json, there is a ScaleByResolution transform (config below):

  • It looks like this transform effectively normalizes the coordinate space of the image.
  • So I’d guess that this makes the voxel coordinate space [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]

  {
    "name": "ScaleByResolution#ScaleImg",
    "args": {
      "fields": ["image"],
      "target_resolution": [1.0, 1.0, 1.0]
    }
  }

From the documentation, it looks like this crops the image and label to a box [64,64,64] in dimensions, but I’m a bit confused about the coordinate space involved in this.

  • Didn’t we just normalize the image? Wouldn’t 64.0 be far outside the bounds of the image ([1.0, 1.0, 1.0])?
  • What is the 3D point of origin of this crop? (e.g., the center of the rectangular prism region)
  • Does this transform always crop from (0,0,0)? (meaning, just crop the image in the range [(0, 64), (0, 64), (0, 64)]?)

  {
    "name": "FastPosNegRatioCropROI",
    "args": {
      "size": [64, 64, 64],
      "image_field": "image",
      "label_field": "label",
      "deform": false,
      "rotation": false,
      "rotation_degree": 20,
      "scale": false,
      "scale_factor": 0.1,
      "pos": 1,
      "neg": 1,
      "fast_crop": true
    }
  }


It would be nice to have a few more details on the meaning of these parameters

  • output_crop_size: Why are we cropping again?
    This same setting is in the validation too but it doesn’t seem like the inferred images are [64,64,64] in dimensions.
  • output_batch_size: What is this? The documentation doesn’t say much. It sounds like it has something to do with Smart Cache.
  • batched_by_transforms: What does it mean to “batch” something / what is being batched? Does a transform do the batching, or is it something like “group batches by transforms”?
  • num_workers: This is mostly just a curiosity, but since most of this happens on the GPU, what are the worker “data transformation” threads used for? Do some transforms execute on the CPU? How do I know what to set this to?
  • prefetch_size: What is being prefetched? What does it mean to prefetch something? Is this loading images into the GPU’s memory? I’d guess I should set this to some number N, where <Image Size> * N ~= <amount of GPU memory I have>, is that correct?

"image_pipeline": {
  "name": "SegmentationImagePipeline",
  "args": {
    "data_list_file_path": "{DATASET_JSON}",
    "data_file_base_dir": "{DATA_ROOT}",
    "data_list_key": "training",
    "output_crop_size": [64, 64, 64],
    "output_batch_size": 4,
    "batched_by_transforms": false,
    "num_workers": 4,
    "prefetch_size": 8
  }
}

Also one more semi-related thing, is the Python source available for some of these transforms? It would be interesting to review it and I think it would help with debugging.

Thanks for your interest in Clara Train. I think some concepts are unclear and causing confusion. Sorry about that; we are working on improving our documentation.
I hope you are following the notebooks from https://github.com/NVIDIA/clara-train-examples/tree/master/NoteBooks

To your questions:

  • ScaleByResolution changes the resolution of the image; it is not normalizing. In the example it resamples to 1x1x1 mm^3 voxels. This is unrelated to the maximum size along each dimension.
  • FastPosNegRatioCropROI crops 64x64x64 voxels around a sampled foreground point, and another crop around a background point, according to the pos:neg ratio.
  • SegmentationImagePipeline specifies the pipeline, so “crop” is a bad name, I agree with you. output_crop_size means the input size to the model. For batch size, batch by transform, and the other parameters, please refer to this notebook
  • Clara Train V3.1 is closed source. We have listened to this feedback, and we are currently moving the back end to MONAI, which is open source.
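
To make the first two points concrete, here is a rough sketch of the arithmetic behind resampling to a fixed voxel size. The function name `resampled_shape` and the example volume are made up for illustration; this is not the Clara Train implementation, only what resampling to a fixed spacing implies for the voxel grid.

```python
def resampled_shape(shape, spacing_mm, target_mm=(1.0, 1.0, 1.0)):
    """Approximate voxel grid size after resampling to target_mm spacing.

    The physical extent (shape * spacing) stays the same; only the number
    of voxels covering it changes.
    """
    return tuple(
        max(1, round(n * s / t))
        for n, s, t in zip(shape, spacing_mm, target_mm)
    )


# Hypothetical CT volume: 512 x 512 x 100 voxels at 0.8 x 0.8 x 5.0 mm.
shape = (512, 512, 100)
spacing = (0.8, 0.8, 5.0)

new_shape = resampled_shape(shape, spacing)
print(new_shape)  # (410, 410, 500)
```

In this made-up case the in-plane axes are downsampled (0.8 mm voxels become 1 mm) and the slice axis is upsampled (5 mm becomes 1 mm). The image is still hundreds of voxels wide, so a 64x64x64 crop, which covers 64 mm per axis at this resolution, fits well inside it.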

Hope that helps.

Hi “aharouni”,
thanks for taking note of the feedback. I’d also humbly request not just notebooks but direct examples. I feel that for a serious pipeline, notebooks somehow just don’t make the cut (could be my personal opinion). Of course, one can always strip out the notebook parts.

Thanks for your input. The notebooks are provided simply as an easy way for users to run examples and see results. They also let us explain concepts in a self-guided workshop manner, as in clara-train-examples/Performance.ipynb at master · NVIDIA/clara-train-examples · GitHub,
and show advanced features such as AutoML, as in clara-train-examples/AutoML.ipynb at master · NVIDIA/clara-train-examples · GitHub

If you already know enough about the Clara Train SDK, then all you need are the train JSON files in the config folder of the MMAR, as in this folder: clara-train-examples/NoteBooks/GettingStarted/config at master · NVIDIA/clara-train-examples · GitHub

Thanks, that helps,

I hope you are following notebooks from clara-train-examples/NoteBooks at master · NVIDIA/clara-train-examples · GitHub

I ran through one of the clara-train-examples repository notebooks, but not the two you mentioned, oops.

It looks like the notebook uses a screenshot of a slide for “batch by transform”, maybe that explains why grep and Google didn’t find this.

What slide deck is this from, can I download the PDF of that? It’s slide 11 of… something.

To quote the screenshot (converted to MD):


Copy data to memory, crop, discard

Batch by transform

  • Take multiple crops from same data volume as your batch
  • Sets Batched_by_transform=true, ignores output_batch_size
  • Must have one of the batching transformations:
    • CropByPosNegRatio
    • CropByPosNegRatioLabelOnly
    • FastCropbyPosNegRatio
  • Use batches_to_gen_at_once

Just to make sure I understand: it looks like FastPosNegRatioCropROI does not crop the entire 3D image to [64,64,64]. Instead, it just changes how the data is processed, i.e., “process the image as a set of 64x64x64 chunks”.

  • Are these overlapping chunks? If so, what is the overlap “increment”? 1x1x1 voxel?

EDIT: Actually this slide seems to mention loading to memory too? Do Smart Cache and Batch by Transform both concern themselves with loading data from disk?

Whereas for Smart Cache, it sounds like it

  • Stores the result of deterministically transformed data
    • In the GPU’s memory? Is this an abstraction over the GPU’s limited memory?
    • Or is this in the PC’s RAM, and this is a sort of “paging”-like setup where the API shifts data from PC RAM to the GPU’s memory (the “cache”)?

Scale by resolution changes the resolution of the image; it is not normalizing. So in the example it resamples to 1x1x1 mm^3. This is unrelated to the max dim in each dimension.

So to make sure I understand this, if I had an image with voxels of size 0.5x0.5x0.5 this would do some kind of nearest neighbor interpolation, doubling the size of each voxel, thus possibly reducing the resolution of the image?

What if the input image pixels are larger than 1 mm^3 in size? Will it interpolate in the other direction too?

thanks for taking note of the feedback. I’d also humbly request not just notebooks but direct examples. I feel that for a serious pipeline, notebooks somehow just don’t make the cut (could be my personal opinion). Of course, one can always strip out the notebook parts.

I am inclined to agree with v.srikrishnan

  • It’s great to have slide decks but they aren’t particularly searchable.
  • It would be nice if the SDK docs were a more comprehensive description of the transforms, or if they could link to supporting materials as needed.
  • Personally, I would have preferred a Markdown / docs tutorial instead of Jupyter notebook, I am okay with copy/pasting commands into the terminal, and I prefer that (it’s convenient to have a terminal open to examine the state of the system).
    • I think Jupyter is best for cases where the examples are self-contained, since Clara hits the filesystem and writes to env vars I found it a bit harder to follow.
    • But to be fair I am aware that Jupyter is popular these days, and maybe I am biased / used to MD because I am a software developer, it’s not a big deal.

The performance notebook should explain the acceleration built into Clara: clara-train-examples/Performance.ipynb at master · NVIDIA/clara-train-examples · GitHub
As shown in its resources section, there are 2 GTC talks explaining this in detail.

You can find more talks about Clara at Search | NVIDIA On-Demand

We value all feedback, so please let us know what is missing or unclear in the notebooks so we can improve or fix it. Our goal is to have users train their first model in < 1 hour.

To your questions:

  • Batch by transform doesn’t use a scanning window. Instead, it randomly finds a foreground pixel and crops the specified size around it, then does the same for a background pixel, depending on the ratio you set. There is no scanning window or overlap concept while training. The scanning happens in inference and, if you choose, in validation.

  • Smart cache uses system memory to cache the results of all deterministic transforms, such as opening the NIfTI file, changing resolution, adjusting contrast, etc.

  • Yes, scale by resolution resamples both ways to reach a fixed resolution. You need this since human organs are almost the same size. You can then do random scaling/zooming.
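
As a rough illustration of the first point, the sketch below picks a random foreground or background voxel from a label volume and crops a fixed-size box around it. It uses NumPy on a synthetic volume; the helper names `sample_center` and `crop_around` are invented here, and shifting the box to stay inside the volume is my assumption (the SDK may handle borders differently, for example by padding).

```python
import numpy as np


def sample_center(label, want_foreground, rng):
    """Pick a random voxel index that is foreground (label > 0) or background."""
    mask = (label > 0) if want_foreground else (label == 0)
    candidates = np.argwhere(mask)
    return candidates[rng.integers(len(candidates))]


def crop_around(volume, center, size):
    """Crop a box of `size` voxels around `center`, shifted to stay in bounds."""
    slices = []
    for c, s, dim in zip(center, size, volume.shape):
        start = int(np.clip(c - s // 2, 0, dim - s))  # assumes dim >= s
        slices.append(slice(start, start + s))
    return volume[tuple(slices)]


rng = np.random.default_rng(0)
image = rng.normal(size=(96, 96, 96)).astype(np.float32)
label = np.zeros((96, 96, 96), dtype=np.uint8)
label[40:56, 40:56, 40:56] = 1  # toy "organ"

pos, neg = 1, 1  # one foreground-centered crop per background-centered crop
size = (64, 64, 64)
crops = []
for want_fg in [True] * pos + [False] * neg:
    center = sample_center(label, want_fg, rng)
    # The same box is cut from image and label so they stay aligned.
    crops.append((crop_around(image, center, size),
                  crop_around(label, center, size)))
```

Each call yields a new random crop, so there is no fixed grid of chunks and no overlap increment to configure; two crops may overlap simply by chance.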

In regards to notebooks, the goal is to have an easy setup so users can get started and train with minimal time and effort. Once you are comfortable with the SDK, all you really need is the train_config.json to train. Also, FYI, JupyterLab can give you access to a terminal to run train.sh.

Thanks, that helps, I think I’m getting close here.

Can you give me more details on what you mean by foreground and background pixels?

I am guessing that a “foreground pixel” is a pixel that is in the labeled area, and a “background pixel” is a pixel that isn’t in the labeled area, is that what you mean?

EDIT: Also

  • Is the FastPosNegRatioCropROI transform executed once, producing a set of cropped images from a single large image?
  • Or does it produce only one random cropped image every time it’s run, overwriting “image” with that cropped part?

Or is the flow of image_pipeline something like:

for imageFile in data_list_key:
    pre_transforms_results = (run pre_transforms)
    for croppedImage in pre_transforms_results:
        (run through the model trainer? (where is that?))

It looks like the model trainer is UNet but it’s not super clear to me how the cropped images get to the model training part of the setup.

It looks like it’s hard coded to pass whatever “image” and “label” are directly to the model (and after the crop, “image” and “label” would be [64,64,64] sized images), is that correct?

I strongly advise you to watch the GTC talks, as I think they answer some of your questions.

That is correct.

It is executed multiple times to find center pixels, according to the batches_to_gen_at_once parameter. In the example below, the SDK will find 15 centers at once with a ratio of 2:1, so 10 for foreground and 5 for background. They are then used 3 at a time, since the batch size is set to 3. Other things that can influence this are the image pipeline arguments.

  {
    "name": "FastCropByPosNegRatio",
    "path": "myTransformation9.myFastCropByPosNegRatio",
    "args": {
      "size": [192, 192, 48],
      "fields": "image",
      "label_field": "label",
      "pos": 2,
      "neg": 1,
      "batches_to_gen_at_once": 15,
      "batch_size": 3
    }
  }

Correct. Transforms run sequentially, one after the other, and the result is then passed to the training loop.
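
Putting the numbers from the example together with that flow, a toy sketch might look like the following. Everything here is a stand-in: `load`, `crop_transform`, `batches`, and the file names are hypothetical, the rounding rule for the pos:neg split is an assumption, and the real SDK may order or shuffle crops differently.

```python
def split_centers(batches_to_gen_at_once, pos, neg):
    """Divide the centers to sample between foreground and background by pos:neg."""
    n_fg = round(batches_to_gen_at_once * pos / (pos + neg))
    return n_fg, batches_to_gen_at_once - n_fg


def load(name):
    """Stand-in for the earlier pre-transforms (load, resample, ...)."""
    return {"name": name}


def crop_transform(sample, batches_to_gen_at_once=15, pos=2, neg=1):
    """Stand-in for the crop: returns one small record per sampled center."""
    n_fg, n_bg = split_centers(batches_to_gen_at_once, pos, neg)
    kinds = ["foreground"] * n_fg + ["background"] * n_bg
    return [{"source": sample["name"], "center": kind} for kind in kinds]


def batches(crops, batch_size=3):
    """Hand the crops out batch_size at a time."""
    for i in range(0, len(crops), batch_size):
        yield crops[i:i + batch_size]


seen = []
for name in ["spleen_01", "spleen_02"]:  # hypothetical data list entries
    sample = load(name)                   # pre-transforms run in order...
    crops = crop_transform(sample)        # ...ending with the crop
    for batch in batches(crops):
        seen.append(batch)                # a training step would go here

print(split_centers(15, 2, 1))  # (10, 5)
print(len(seen))                # 10 batches: 5 per volume
```

So with batches_to_gen_at_once set to 15 and a batch size of 3, each loaded volume feeds 5 training steps before the next volume is needed.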

It is not quite right; there is more to it.

Hope that helps.
