Building own model with Clara Train SDK: only event files, no checkpoint files are created

Hi there,

I am building my own model according to https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v2.0/ with a self-made data loader+transformer and model.
When I run train.sh, only event files are created in the folder “models” but no checkpoint files.

The error message in the console is

Exception: <class 'ValueError'>: Cannot feed value of shape (1, 384, 384) for Tensor 'NV_MODEL_INPUT:0', which has shape '(?, 1, 384, 384, 1)'

  • My input data are numpy arrays of shape (384, 384), so I used shape “HW” in config_train.json.
    I checked the shape right after loading with data = np.load(file_name, allow_pickle=True).astype(self._dtype), where the dtype is ‘f4’, and it shows (384, 384).
  • To get (1, 384, 384), I used output_batch_size=1 in the image pipeline.

Do you have any advice on which step I am missing? As far as I understand, the data format should be ‘NCDHW’ (3D) or ‘NCHW’ (2D).

For now, I have worked around the issue by adding a dimension during data import with
data = np.expand_dims(data, axis=-1)
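To make the shapes involved easier to follow, here is a minimal NumPy-only sketch. It assumes a zero-filled array as a stand-in for one loaded sample and uses np.stack to imitate batching with a batch size of 1; it does not call the Clara Train pipeline, so it only illustrates what the extra axis does to the shape, not what the SDK does internally.

```python
import numpy as np

# Hypothetical stand-in for one sample; the real data comes from np.load.
sample = np.zeros((384, 384), dtype="f4")

# Batching one sample only prepends a batch axis.
batch = np.stack([sample])
print(batch.shape)  # (1, 384, 384)

# The workaround: append a trailing axis of length 1 to each sample.
with_channel = np.expand_dims(sample, axis=-1)
print(with_channel.shape)  # (384, 384, 1)

# Indexing with np.newaxis produces the same shape.
same = sample[..., np.newaxis]

# Batching the expanded sample keeps the trailing axis.
batch_with_channel = np.stack([with_channel])
print(batch_with_channel.shape)  # (1, 384, 384, 1)
```

Using axis=0 instead of axis=-1 would put the new axis in front, giving (1, 384, 384) per sample, which is the channels-first layout for a single-channel 2D image.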

Please let me know if the dimension should be added in a different way, using the packages intended for this. Thanks!

Hi

Thanks for your interest in Clara Train. Glad you figured out one way to do it.

It seems your input is 2D, or you are missing a dimension. It would help if you could add more details about the problem (model architecture, input size).
Another question: do you apply a channel-first transformation in the preTransforms?

Thank you. In the preTransforms I have now added a dimension and a placeholder for the ? dimension.

I also realized, based on https://devtalk.nvidia.com/default/topic/1038461/container-nvcaffe/error-detected-nvidia-tesla-k80-gpu-which-is-not-supported-by-this-container/, that I don’t have access to a GPU, so I changed the code to channels_last to make it run on the CPU.
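For anyone reading later, the layout change between channels_first and channels_last can be sketched in plain NumPy as below. The helper name to_channels_last is hypothetical and not part of the Clara Train SDK; it only shows how the axes move, assuming the channel axis sits at position 1 after the batch axis.

```python
import numpy as np


def to_channels_last(batch):
    """Move the channel axis from position 1 to the end (e.g. NCHW -> NHWC)."""
    return np.moveaxis(batch, 1, -1)


# Hypothetical 2D batch: 1 sample, 1 channel, 384 x 384 pixels.
nchw = np.zeros((1, 1, 384, 384), dtype="f4")
nhwc = to_channels_last(nchw)
print(nhwc.shape)  # (1, 384, 384, 1)

# The same helper works for a 3D batch (NCDHW -> NDHWC).
ncdhw = np.zeros((1, 1, 8, 384, 384), dtype="f4")
ndhwc = to_channels_last(ncdhw)
print(ndhwc.shape)  # (1, 8, 384, 384, 1)
```

np.moveaxis returns a view rather than a copy, so the conversion itself is cheap; the model definition still has to be configured for the matching data format.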

I have solved all the issues. Thank you for your support.