Clara Train - Split input volume along one axis

I have a bunch of volumes that I need to split along the z-axis for my segmentation model. Is it possible to create multiple 5-slice subvolumes from each input volume as part of the pipeline, or do I need to create a separate file for every training example?
As an extra wrinkle, the volumes have the same X and Y dimensions, but the Z-dimension is variable (one of the reasons I’m working with subvolumes).

Thanks!

Hi
Thanks for your interest in the Clara Train SDK.
To crop your volumes, you can use any of the cropping transforms provided by the SDK, as shown here.
For example, you can use:

  • CropFixedSizeRandomCenter to crop randomly
  • CropByPosNegRatio to crop around foreground and background
  • CropSubVolumeCenter to crop around the center

There are also other special cropping transformations.
For a more specific case, you can simply write your own cropping logic and add it as a BYO transformation, as shown in this notebook: clara-train-examples/BYOC.ipynb at master · NVIDIA/clara-train-examples · GitHub
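
For the 5-slice case in the question, custom cropping logic mostly comes down to choosing index ranges along z. Here is a minimal sketch in plain Python; it is not tied to the SDK's transform interface. The function name `z_windows` and its parameters are hypothetical, and shifting the last window back to cover leftover slices is just one assumed way to handle a variable Z dimension.

```python
from typing import List, Tuple


def z_windows(depth: int, size: int = 5, keep_tail: bool = True) -> List[Tuple[int, int]]:
    """Return (start, stop) index pairs for consecutive subvolumes along z.

    Hypothetical helper, not part of the Clara Train SDK.
    If depth is not a multiple of size and keep_tail is True, one extra
    window is added at the end; it is shifted back so it still holds
    `size` slices, which means it overlaps the previous window.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if depth < size:
        return []  # volume too shallow for even one window
    windows = [(start, start + size) for start in range(0, depth - size + 1, size)]
    if keep_tail and depth % size != 0:
        windows.append((depth - size, depth))
    return windows


# Volumes with different depths yield different numbers of subvolumes.
print(len(z_windows(100)))  # 20
print(z_windows(12))        # [(0, 5), (5, 10), (7, 12)]
```

With a NumPy-style array whose last axis is z (an assumption about the data layout), each pair would be applied as `volume[:, :, start:stop]`.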

Hope that helps

Doesn’t that require re-reading the entire volume every time a crop is taken? Most of my volumes are over 100 slices, so that would be at least 20 separate reads per volume if the model were to train on each 5-slice example once.
One potential workaround is to just split the volume file into multiple files to take that preprocessing step completely off of the pipeline, but I was hoping to minimize the number of extra files and file reads I need to have.

Hi
You are correct. However, the main goal of the SDK is to optimize and train as fast as possible while remaining very flexible. As a data scientist, you should not have to worry about speed; the SDK will take care of it for you. You should also avoid doing things offline, such as resampling or adjusting contrast, as this limits your ability to find the best parameters.

To have the SDK do that, you should use the smart cache pipeline, as explained here: clara-train-examples/Performance.ipynb at master · NVIDIA/clara-train-examples · GitHub

You should look at the parameter “batches_to_gen_at_once” in the FastCropByPosNegRatio transform, which generates the center pixel locations to crop around (minimal size). Since the data is cached in memory, the crop then happens instantly on demand. Of course, this uses a number of workers doing prefetch so that the GPUs stay utilized.
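
The idea of reading each volume once and then cropping from memory can be illustrated without the SDK. The sketch below is not the smart cache implementation. `CachedCropSampler`, `load_volume`, and the file names are hypothetical, and a volume is represented as a plain sequence of slices with z as the first axis.

```python
import random
from typing import Callable, Dict, Sequence


class CachedCropSampler:
    """Keep each loaded volume in memory and crop z-windows on demand.

    Illustration only: `load_volume` is a hypothetical callable that reads
    one file and returns the volume as a sequence of 2D slices (z first).
    """

    def __init__(self, paths: Sequence[str], load_volume: Callable[[str], Sequence],
                 size: int = 5, seed: int = 0):
        self.paths = list(paths)
        self.load_volume = load_volume
        self.size = size
        self.rng = random.Random(seed)
        self.cache: Dict[str, Sequence] = {}

    def _get(self, path: str) -> Sequence:
        # The file is read only the first time this path is requested.
        if path not in self.cache:
            self.cache[path] = self.load_volume(path)
        return self.cache[path]

    def sample(self) -> Sequence:
        """Return one random crop of `size` consecutive slices."""
        volume = self._get(self.rng.choice(self.paths))
        depth = len(volume)
        if depth < self.size:
            raise ValueError("volume has fewer slices than the crop size")
        start = self.rng.randint(0, depth - self.size)  # inclusive bounds
        return volume[start:start + self.size]


# Stand-in loader that records every read; each "slice" is just its z index.
reads = []


def fake_load(path: str) -> list:
    reads.append(path)
    depth = {"a.nii": 100, "b.nii": 137}[path]
    return list(range(depth))


sampler = CachedCropSampler(["a.nii", "b.nii"], fake_load)
crops = [sampler.sample() for _ in range(200)]
print(len(crops), "crops from", len(reads), "file reads")
```

However many crops are drawn, the loader runs at most once per volume, which is the property that makes on-demand cropping cheap once the data is cached.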

In summary, please take a look at the performance notebooks. You can also go through the GTC talks explaining it:

  • S22563 Clara train Getting started: cover basics, BYOC, AIAA, AutoML
  • S22717 Clara train Performance: Different aspects of acceleration in train V3

Okay, it sounds like I was overthinking things. I think the performance notebook you shared is exactly what I was looking for; I’ll just have to crop to slices.

Thanks for the help!