Error when executing CPU operator RandomBBoxCrop encountered

Dear Team,

I am trying to train a resnet50 model on custom data using the PyTorch framework. The dataset follows the expected format, but I get an error as soon as training starts.

I am referring to the following GitHub repo.

I am using the NGC PyTorch container:
nvcr.io/nvidia/pytorch:23.11-py3

Please advise. The error log is below:

admin1@rptech-server:~$ sudo docker exec -it pytorch /bin/bash
root@72026e95d53f:/workspace# cd /workspace/application_lab/pytorch_ssd_resnet_50/DeepLearningExamples/PyTorch/Detection/SSD
root@72026e95d53f:/workspace/application_lab/pytorch_ssd_resnet_50/DeepLearningExamples/PyTorch/Detection/SSD# python main.py --backbone resnet50 --mode training --epochs 65 --ebs 4 --data /workspace/application_lab/pytorch_ssd_resnet/DeepLearningExamples/PyTorch/Detection/SSD/data/ --save /workspace/application_lab/pytorch_ssd_resnet/DeepLearningExamples/PyTorch/Detection/SSD/savemodel/
NOTE! Installing ujson may make loading annotations faster.
DLL 2024-03-13 05:59:11.745286 - PARAMETER dataset path : /workspace/application_lab/pytorch_ssd_resnet/DeepLearningExamples/PyTorch/Detection/SSD/data/ epochs : 65 batch size : 32 eval batch size : 4 no cuda : False seed : None checkpoint path : None mode : training eval on epochs : [21, 31, 37, 42, 48, 53, 59, 64] lr decay epochs : [43, 54] learning rate : 0.0026 momentum : 0.9 weight decay : 0.0005 lr warmup : None backbone : resnet50 backbone path : None num workers : 8 AMP : True precision : amp
Using seed = 7429
Loading annotations into memory…
Done (t=0.00s)
Creating index…
Done (t=0.00s)
Traceback (most recent call last):
  File "/workspace/application_lab/pytorch_ssd_resnet_50/DeepLearningExamples/PyTorch/Detection/SSD/main.py", line 304, in <module>
    train(train_loop_func, logger, args)
  File "/workspace/application_lab/pytorch_ssd_resnet_50/DeepLearningExamples/PyTorch/Detection/SSD/main.py", line 161, in train
    train_loader = get_train_loader(args, args.seed - 2**31)
  File "/workspace/application_lab/pytorch_ssd_resnet_50/DeepLearningExamples/PyTorch/Detection/SSD/ssd/data.py", line 41, in get_train_loader
    test_run = train_pipe.schedule_run(), train_pipe.share_outputs(), train_pipe.release_outputs()
  File "/usr/local/lib/python3.10/dist-packages/nvidia/dali/pipeline.py", line 1036, in share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline:
Error when executing CPU operator RandomBBoxCrop encountered:
Error in thread 0: [/opt/dali/dali/pipeline/util/bounding_box_utils.h:164] Assert on "limits.contains(boxes[i])" failed: box {(-0.00333333, 0.356667), (0.403333, 0.406667)} is out of bounds {(0, 0), (1, 1)}
Stacktrace (7 entries):
[frame 0]: /usr/local/lib/python3.10/dist-packages/nvidia/dali/libdali_operators.so(+0x69e80e) [0x7bdb402f880e]
[frame 1]: /usr/local/lib/python3.10/dist-packages/nvidia/dali/libdali_operators.so(+0x1667d0f) [0x7bdb412c1d0f]
[frame 2]: /usr/local/lib/python3.10/dist-packages/nvidia/dali/libdali_operators.so(+0x166c03d) [0x7bdb412c603d]
[frame 3]: /usr/local/lib/python3.10/dist-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool, std::string const&)+0x1e6) [0x7bdc0e335c36]
[frame 4]: /usr/local/lib/python3.10/dist-packages/nvidia/dali/libdali.so(+0x7b34f0) [0x7bdc0e9074f0]
[frame 5]: /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7bdd82f65ac3]
[frame 6]: /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44) [0x7bdd82ff6a04]

Current pipeline object is no longer valid.

Hi team,

I am still waiting for a reply on this issue.
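
The assertion in the log reports a box whose left edge is at x = -0.00333333 in normalized coordinates, that is, slightly outside the image bounds {(0, 0), (1, 1)}. One possible cause, assuming the custom dataset uses COCO-style annotations (`bbox` as `[x, y, width, height]` in pixels, with `width` and `height` on each image entry), is an annotation that extends past the image border. Below is a minimal sketch for finding and clipping such boxes. The function names and the file paths in the usage comment are hypothetical and not part of the repo.

```python
import json


def _image_sizes(coco):
    # Map image id -> (width, height) taken from the COCO "images" list.
    return {img["id"]: (img["width"], img["height"]) for img in coco["images"]}


def find_out_of_bounds(coco):
    """Return (annotation id, image id, bbox) for boxes that leave the image."""
    sizes = _image_sizes(coco)
    bad = []
    for ann in coco["annotations"]:
        w, h = sizes[ann["image_id"]]
        x, y, bw, bh = ann["bbox"]
        if x < 0 or y < 0 or x + bw > w or y + bh > h:
            bad.append((ann["id"], ann["image_id"], ann["bbox"]))
    return bad


def clamp_boxes(coco):
    """Clip every bbox to its image in place and drop boxes that become empty."""
    sizes = _image_sizes(coco)
    kept = []
    for ann in coco["annotations"]:
        w, h = sizes[ann["image_id"]]
        x, y, bw, bh = ann["bbox"]
        x1, y1 = max(0.0, x), max(0.0, y)
        x2, y2 = min(float(w), x + bw), min(float(h), y + bh)
        if x2 > x1 and y2 > y1:
            ann["bbox"] = [x1, y1, x2 - x1, y2 - y1]
            kept.append(ann)
    coco["annotations"] = kept
    return coco


def fix_file(src_path, dst_path):
    """Load a COCO JSON file, clip its boxes, and write the result to dst_path."""
    with open(src_path) as f:
        coco = json.load(f)
    clamp_boxes(coco)
    with open(dst_path, "w") as f:
        json.dump(coco, f)


# Hypothetical usage; replace the paths with the real annotation files:
# fix_file("annotations/instances_train.json", "annotations/instances_train_fixed.json")
```

Running `find_out_of_bounds` first shows which annotations are affected, so they can be checked against the images before anything is overwritten. The annotation `area` field is not recomputed in this sketch.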