Problem with the "Performance" example from the Intro to Clara Train SDK notebook


I ran the example code from the “Performance” Jupyter notebook and hit an exception when executing this cell:
! $MMAR_ROOT/commands/ trn1_BT.json 0
The error says that resources were exhausted:

Resource exhausted: OOM when allocating tensor

I understand this means the model plus its data is too large to fit in GPU memory or system memory.
I am running the example on a Tesla V100 with 16 GB of GPU memory and 64 GB of system RAM.

Could you please suggest a configuration that would let me run this example in the notebook?

Thank you.

You should be able to decrease the memory used by reducing the batch size from 4 to 2 or even 1. You could also change the output crop shape from [128,128,128] to [64,64,64], which cuts the number of voxels per crop by a factor of eight. Please note that the shape has to be changed in multiple locations in the train config, not only in the image pipeline:
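Because the shape appears in several places, it may be easier to apply the change with a small script than by hand. The following is only a sketch: it assumes the train config is plain JSON, that every occurrence of the old shape is written as the list `[128, 128, 128]`, and that the batch size is stored under the key `output_batch_size` as in the image pipeline. The helper name `shrink_config`, the sample dictionary, and the `HypotheticalCropTransform` entry are made up for illustration and are not part of the Clara Train SDK.

```python
import json


def shrink_config(node, old_shape, new_shape, batch_size):
    """Return a copy of `node` with every list equal to `old_shape` replaced
    by `new_shape` and every "output_batch_size" value set to `batch_size`."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key == "output_batch_size":
                result[key] = batch_size
            else:
                result[key] = shrink_config(value, old_shape, new_shape, batch_size)
        return result
    if isinstance(node, list):
        if node == old_shape:
            return list(new_shape)
        return [shrink_config(item, old_shape, new_shape, batch_size) for item in node]
    # Strings, numbers, booleans and None are returned unchanged.
    return node


# Illustrative stand-in for a train config. Only "image_pipeline" mirrors the
# real excerpt; the transform entry is hypothetical.
sample_config = {
    "pre_transforms": [
        {"name": "HypotheticalCropTransform", "args": {"size": [128, 128, 128]}}
    ],
    "image_pipeline": {
        "name": "SegmentationImagePipeline",
        "args": {
            "output_crop_size": [128, 128, 128],
            "output_batch_size": 4,
            "num_workers": 4,
        },
    },
}

smaller_config = shrink_config(sample_config, [128, 128, 128], [64, 64, 64], 2)
print(json.dumps(smaller_config, indent=2))
```

To use it on a real config you would load the file with `json.load`, pass the result through the helper, and write it back with `json.dump`. Review the output before training, since a list that happens to equal the old shape but means something else would be rewritten too.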

    "image_pipeline": {
      "name": "SegmentationImagePipeline",
      "args": {
        "data_list_file_path": "{DATASET_JSON}",
        "data_file_base_dir": "{DATA_ROOT}",
        "data_list_key": "training",
        "output_crop_size": [128, 128, 128],                       <-----
        "output_batch_size": 4,                                            <---- 
        "batched_by_transforms": true,
        "num_workers": 4,
        "prefetch_size": 8