PointPillars - Issues running TAO training

Please provide the following information when requesting support.

• Network Type (PointPillars)
• TLT Version (4.0.1)
• Training spec file (pointpillars.yaml from getting_started_v4.0.1)
• How to reproduce the issue?
Input from the Jupyter notebook (getting_started_v4.0.1) to run TAO training:
!tao pointpillars train -e $SPECS_DIR/pointpillars.yaml -k $KEY

Error messages:
2023-09-27 06:55:41,108 [INFO] root: Registry: ['nvcr.io']
2023-09-27 06:55:41,184 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
2023-09-27 06:55:41,245 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/avresearch/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
python /opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/train.py --cfg_file /home/avresearch/getting_started_v4.0.1/notebooks/tao_launcher_starter_kit/pointpillars/specs/pointpillars.yaml --output_dir /home/avresearch/tao-experiments-2/pointpillars --key tlt_encode
INFO: Start logging
INFO: Database filter by min points Car: 553 => 25
INFO: Database filter by min points Pedestrian: 82 => 0
INFO: Database filter by min points Cyclist: 36 => 2
INFO: Loading point cloud dataset
INFO: Total samples for point cloud dataset: 159
/opt/conda/lib/python3.8/site-packages/torch/functional.py:484: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2984.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
INFO: Start training
epochs: 0%| | 0/80 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "</opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/train.py>", line 3, in <module>
  File "", line 152, in <module>
  File "", line 127, in main
  File "", line 93, in train_model
  File "", line 24, in train_one_epoch
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1374, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1400, in _process_data
  File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "", line 317, in __getitem__
  File "", line 134, in prepare_data
  File "", line 104, in forward
  File "", line 190, in __call__
  File "<__array_function__ internals>", line 180, in stack
  File "/opt/conda/lib/python3.8/site-packages/numpy/core/shape_base.py", line 422, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack

2023-09-27 06:55:47,606 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Can you please let me know why this error happened? I’m not sure what it’s trying to say exactly. I’ve only used a portion of the KITTI dataset, and I’m also wondering if the data amount could be a factor. Thank you!
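For context, the final numpy error is easy to reproduce in isolation: np.stack raises it whenever it receives an empty sequence, so something in the augmentation pipeline is apparently producing an empty list of arrays (possibly related to the Pedestrian count dropping to 0 in the "Database filter by min points" log above). A minimal sketch:

```python
import numpy as np

# numpy raises this exact ValueError when stack() receives an empty
# sequence, e.g. when no ground-truth samples survive filtering.
try:
    np.stack([])
except ValueError as e:
    print(e)  # -> "need at least one array to stack"
```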

I suggest you follow the official notebook and run it as-is.

I believe I followed all the instructions except that I used a portion of the KITTI data instead of the entire dataset. Do you mean that could cause this error?
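In case it helps with debugging, here is how I checked whether the ground-truth sampling database generated from my KITTI subset actually contains samples for each class. This is a sketch assuming the dataset-generation cells produce an OpenPCDet-style dbinfos pickle; the file path is a guess and needs adjusting:

```python
import pickle

# Path is an assumption -- point it at the dbinfos file produced by the
# notebook's dataset-generation cells.
DBINFOS = "data/kitti_dbinfos_train.pkl"

with open(DBINFOS, "rb") as f:
    dbinfos = pickle.load(f)

# Each key is a class name, each value a list of cropped GT objects used
# by gt-sampling augmentation; a class with zero usable entries can feed
# an empty list into np.stack, matching the crash above.
for cls_name, infos in dbinfos.items():
    print(f"{cls_name}: {len(infos)} samples")
```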

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Yes, I suggest you follow the notebook cells to generate the dataset and retry.
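Before regenerating, it may also be worth confirming that the KITTI subset actually contains labels for every class in the spec, since the filter log shows Pedestrian dropping to zero. A quick per-class count over KITTI-format label files (the label directory path is an assumption):

```python
import glob
from collections import Counter

# Label directory is an assumption -- adjust to your KITTI subset layout.
counts = Counter()
for path in glob.glob("data/kitti/training/label_2/*.txt"):
    with open(path) as f:
        for line in f:
            counts[line.split()[0]] += 1  # first token is the class name

print(counts)  # classes with zero or very few objects are likely culprits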
