Please provide the following information when requesting support.
• Network Type (PointPillars)
• TLT Version (4.0.1)
• Training spec file (pointpillars.yaml from getting_started_v4.0.1)
• How to reproduce the issue?
Command from the Jupyter notebook (getting_started_v4.0.1) used to run TAO training:
!tao pointpillars train -e $SPECS_DIR/pointpillars.yaml \
                        -r $USER_EXPERIMENT_DIR \
                        -k $KEY
Error messages:
2023-09-27 06:55:41,108 [INFO] root: Registry: ['nvcr.io']
2023-09-27 06:55:41,184 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
2023-09-27 06:55:41,245 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/avresearch/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
python /opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/train.py --cfg_file /home/avresearch/getting_started_v4.0.1/notebooks/tao_launcher_starter_kit/pointpillars/specs/pointpillars.yaml --output_dir /home/avresearch/tao-experiments-2/pointpillars --key tlt_encode
INFO: Start logging
INFO: CUDA_VISIBLE_DEVICES=ALL
INFO: Database filter by min points Car: 553 => 25
INFO: Database filter by min points Pedestrian: 82 => 0
INFO: Database filter by min points Cyclist: 36 => 2
INFO: Loading point cloud dataset
INFO: Total samples for point cloud dataset: 159
/opt/conda/lib/python3.8/site-packages/torch/functional.py:484: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2984.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
INFO: Start training
epochs: 0%| | 0/80 [00:00<?, ?it/s]
epochs: 0%| | 0/80 [00:00<?, ?it/s]
Traceback (most recent call last):
File "</opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/train.py>", line 3, in <module>
File "", line 152, in <module>
File "", line 127, in main
File "", line 93, in train_model
File "", line 24, in train_one_epoch
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1374, in _next_data
return self._process_data(data)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1400, in _process_data
data.reraise()
File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "", line 317, in __getitem__
File "", line 134, in prepare_data
File "", line 104, in forward
File "", line 190, in __call__
File "<__array_function__ internals>", line 180, in stack
File "/opt/conda/lib/python3.8/site-packages/numpy/core/shape_base.py", line 422, in stack
raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
2023-09-27 06:55:47,606 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
Can you please let me know why this error happens? I'm not sure what the message is actually telling me. I trained on only a portion of the KITTI dataset, and I notice the "Database filter by min points" lines in the log above show Pedestrian dropping to 0 samples and Cyclist to 2, so I'm also wondering whether the small amount of data could be a factor. Thank you!
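In case it helps with diagnosis, here is a minimal sketch of how I can count per-class objects in my KITTI subset's label files (the label directory path is a placeholder, and the class list is taken from the filter messages in the log above):

import os
from collections import Counter

# Placeholder path to the label_2 folder of my KITTI subset
LABEL_DIR = "/path/to/kitti/training/label_2"
CLASS_NAMES = {"Car", "Pedestrian", "Cyclist"}

counts = Counter()
for fname in os.listdir(LABEL_DIR):
    if not fname.endswith(".txt"):
        continue
    with open(os.path.join(LABEL_DIR, fname)) as f:
        for line in f:
            cls = line.split()[0]
            if cls in CLASS_NAMES:
                counts[cls] += 1

for cls in sorted(CLASS_NAMES):
    # A class with very few (or zero) objects here would also end up
    # nearly empty after the min-points filtering shown in the log.
    print(f"{cls}: {counts[cls]} labeled objects")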