TLT: training a MaskRCNN model on the Mapillary Vistas dataset fails with CUDA_ERROR_OUT_OF_MEMORY: out of memory

Sorry, my mistake. Changing num_shards did not fix training on my 1000 images. What actually made training succeed was setting training_file_pattern in the spec to val*.tfrecord, which matches the TFRecords generated from the 500 val images with num_shards 256.
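One way to compare what train*.tfrecord and val*.tfrecord actually feed the trainer is to count the records in every shard each pattern matches. The sketch below does this in plain Python, without TensorFlow, assuming the files use the standard TFRecord framing (8-byte length, 4-byte length CRC, payload, 4-byte payload CRC). It skips the CRCs rather than verifying them. `count_tfrecord_records`, `summarize_pattern`, and the example path are hypothetical helpers for illustration, not part of TLT.

```python
import glob
import struct


def count_tfrecord_records(path):
    """Count records in one TFRecord file by walking its framing.

    Assumed layout per record: uint64 length (little-endian),
    uint32 masked CRC of the length, `length` payload bytes,
    uint32 masked CRC of the payload. CRCs are skipped, not checked.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated length field in {path}")
            (length,) = struct.unpack("<Q", header)
            # Skip length CRC (4 bytes) + payload + payload CRC (4 bytes).
            f.seek(4 + length + 4, 1)
            count += 1
    return count


def summarize_pattern(pattern):
    """Map each file matching a glob pattern to its record count."""
    return {p: count_tfrecord_records(p) for p in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # Hypothetical location; point this at your own tfrecord directory.
    for pattern in ("/path/to/tfrecords/train*.tfrecord",
                    "/path/to/tfrecords/val*.tfrecord"):
        shards = summarize_pattern(pattern)
        print(pattern, len(shards), "shards,",
              sum(shards.values()), "records")
```

Running it on both patterns shows how many shards and records each one resolves to, which makes it easier to see whether the two datasets differ only in size or also in how they were sharded.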

Here are the 1000 images, resized to 1/8:
https://drive.google.com/file/d/1ymqOKKFN3u8qmHTYlqyBumIAVqXOeMck/view?usp=sharing

JSON:
https://drive.google.com/file/d/16WpE_Pi0M_dnPtp_UmDvMfR4-l3fNtZH/view?usp=sharing

Generated tfrecords:
https://drive.google.com/file/d/1ocz7NADPwkXQPaAqECCcirv8OLFvQ8x2/view?usp=sharing