Please provide the following information when requesting support.
• Network Type (PointPillars)
• TLT Version (4.0.1)
• Training spec file (pointpillars.yaml from getting_started_v4.0.1)
• How to reproduce the issue?
Command from the Jupyter notebook (getting_started_v4.0.1) used to run TAO training:
!tao pointpillars train -e $SPECS_DIR/pointpillars.yaml \
                        -r $USER_EXPERIMENT_DIR \
                        -k $KEY
Error messages:
2023-09-27 06:55:41,108 [INFO] root: Registry: ['nvcr.io']
2023-09-27 06:55:41,184 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
2023-09-27 06:55:41,245 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/avresearch/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
python /opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/train.py --cfg_file /home/avresearch/getting_started_v4.0.1/notebooks/tao_launcher_starter_kit/pointpillars/specs/pointpillars.yaml --output_dir /home/avresearch/tao-experiments-2/pointpillars --key tlt_encode
INFO: Start logging
INFO: CUDA_VISIBLE_DEVICES=ALL
INFO: Database filter by min points Car: 553 => 25
INFO: Database filter by min points Pedestrian: 82 => 0
INFO: Database filter by min points Cyclist: 36 => 2
INFO: Loading point cloud dataset
INFO: Total samples for point cloud dataset: 159
/opt/conda/lib/python3.8/site-packages/torch/functional.py:484: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2984.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
INFO: Start training
epochs: 0%| | 0/80 [00:00<?, ?it/s]
epochs: 0%| | 0/80 [00:00<?, ?it/s]
Traceback (most recent call last):
File "</opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/train.py>", line 3, in <module>
File "", line 152, in <module>
File "", line 127, in main
File "", line 93, in train_model
File "", line 24, in train_one_epoch
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1374, in _next_data
return self._process_data(data)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1400, in _process_data
data.reraise()
File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 543, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "", line 317, in __getitem__
File "", line 134, in prepare_data
File "", line 104, in forward
File "", line 190, in __call__
File "<__array_function__ internals>", line 180, in stack
File "/opt/conda/lib/python3.8/site-packages/numpy/core/shape_base.py", line 422, in stack
raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
2023-09-27 06:55:47,606 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
Can you please let me know why this error happens? I'm not sure what the message is actually telling me. I trained on only a portion of the KITTI dataset, and I notice the "Database filter by min points" lines in the log above show Pedestrian dropping to 0 samples and Cyclist to 2, so I'm also wondering whether the small amount of data could be a factor. Thank you!
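In case it helps with diagnosis, here is a minimal sketch of how I can count per-class objects in my KITTI subset's label files (the label directory path is a placeholder, and the class list is taken from the filter messages in the log above):

import os
from collections import Counter

# Placeholder path to the label_2 folder of my KITTI subset
LABEL_DIR = "/path/to/kitti/training/label_2"
CLASS_NAMES = {"Car", "Pedestrian", "Cyclist"}

counts = Counter()
for fname in os.listdir(LABEL_DIR):
    if not fname.endswith(".txt"):
        continue
    with open(os.path.join(LABEL_DIR, fname)) as f:
        for line in f:
            cls = line.split()[0]
            if cls in CLASS_NAMES:
                counts[cls] += 1

for cls in sorted(CLASS_NAMES):
    # A class with very few (or zero) objects here would also end up
    # nearly empty after the min-points filtering shown in the log.
    print(f"{cls}: {counts[cls]} labeled objects")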