Problems with PointPillar export

Required Information

  • Hardware: NVIDIA GeForce RTX 4080
  • Network Type: PointPillar
  • GitHub Repository: tao_pytorch_backend
  • TLT Version (tlt info --verbose doesn’t work): docker is nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt-base → command used to run: docker run -it --rm --gpus all -v /path/to/project/tao_pytorch_backend:/tao-pt -e PYTHONPATH=/tao-pt:$PYTHONPATH --shm-size 16G --net=host nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt-base
  • Training spec file: pointpillar_general.yaml
  • How to reproduce the issue: see later

Documentation I followed

Forum topics I already checked

My training

I managed to train from scratch on the KITTI dataset, on cars only.
log_train_20240507-082745.txt

2024-05-07 08:27:45,090   INFO  **********************Start logging**********************
2024-05-07 08:27:45,090   INFO  CUDA_VISIBLE_DEVICES=ALL
2024-05-07 08:27:45,436   INFO  Loading point cloud dataset
2024-05-07 08:27:45,489   INFO  Total samples for point cloud dataset: 5366
2024-05-07 08:27:45,658   INFO  **********************Start training**********************
2024-05-07 18:23:57,833   INFO  **********************End training**********************

status.json

{"date": "5/7/2024", "time": "8:27:45", "status": "STARTED", "verbosity": "INFO", "message": "Starting PointPillars training"}
{"epoch": 0, "time_per_epoch": "0:07:23.530897", "max_epoch": 80, "eta": "9:51:22.471797", "date": "5/7/2024", "time": "8:35:9", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0003064909424133253, "loss": 1.3810601234436035}}
{"epoch": 1, "time_per_epoch": "0:07:35.166232", "max_epoch": 80, "eta": "9:59:18.132325", "date": "5/7/2024", "time": "8:42:45", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0003259206078755463, "loss": 1.2703205347061157}}
{"epoch": 2, "time_per_epoch": "0:07:42.412781", "max_epoch": 80, "eta": "10:01:08.196897", "date": "5/7/2024", "time": "8:50:28", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0003581018816993485, "loss": 0.8312548995018005}}
{"epoch": 3, "time_per_epoch": "0:07:43.532270", "max_epoch": 80, "eta": "9:54:51.984766", "date": "5/7/2024", "time": "8:58:12", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0004027248406257354, "loss": 0.601682186126709}}
{"epoch": 4, "time_per_epoch": "0:07:35.183181", "max_epoch": 80, "eta": "9:36:33.921729", "date": "5/7/2024", "time": "9:5:48", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00045935974116685435, "loss": 0.7014744877815247}}
{"epoch": 5, "time_per_epoch": "0:07:36.934350", "max_epoch": 80, "eta": "9:31:10.076217", "date": "5/7/2024", "time": "9:13:25", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0005274611582707094, "loss": 0.9280206561088562}}
{"epoch": 6, "time_per_epoch": "0:07:35.909267", "max_epoch": 80, "eta": "9:22:17.285785", "date": "5/7/2024", "time": "9:21:2", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0006063732380625683, "loss": 0.8185919523239136}}
{"epoch": 7, "time_per_epoch": "0:07:34.102590", "max_epoch": 80, "eta": "9:12:29.489096", "date": "5/7/2024", "time": "9:28:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0006953360140763047, "loss": 0.5776488780975342}}
{"epoch": 8, "time_per_epoch": "0:07:31.029943", "max_epoch": 80, "eta": "9:01:14.155902", "date": "5/7/2024", "time": "9:36:8", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0007934927261469067, "loss": 0.7977230548858643}}
{"epoch": 9, "time_per_epoch": "0:07:35.350255", "max_epoch": 80, "eta": "8:58:49.868115", "date": "5/7/2024", "time": "9:43:44", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0008998980714792163, "loss": 0.759836733341217}}
{"epoch": 10, "time_per_epoch": "0:07:34.159572", "max_epoch": 80, "eta": "8:49:51.170060", "date": "5/7/2024", "time": "9:51:19", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0010135273084306063, "loss": 0.5445941686630249}}
{"epoch": 11, "time_per_epoch": "0:07:35.378166", "max_epoch": 80, "eta": "8:43:41.093459", "date": "5/7/2024", "time": "9:58:55", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0011332861253331745, "loss": 0.6107680201530457}}
{"epoch": 12, "time_per_epoch": "0:07:29.786740", "max_epoch": 80, "eta": "8:29:45.498304", "date": "5/7/2024", "time": "10:6:25", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0012580211793133203, "loss": 0.5008313059806824}}
{"epoch": 13, "time_per_epoch": "0:07:34.622371", "max_epoch": 80, "eta": "8:27:39.698833", "date": "5/7/2024", "time": "10:14:0", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.001386531203614099, "loss": 0.6656970977783203}}
{"epoch": 14, "time_per_epoch": "0:07:30.476321", "max_epoch": 80, "eta": "8:15:31.437162", "date": "5/7/2024", "time": "10:21:31", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0015175785764507683, "loss": 0.7663007974624634}}
{"epoch": 15, "time_per_epoch": "0:07:31.792888", "max_epoch": 80, "eta": "8:09:26.537725", "date": "5/7/2024", "time": "10:29:4", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0016499012399851304, "loss": 0.46346515417099}}
{"epoch": 16, "time_per_epoch": "0:07:33.283206", "max_epoch": 80, "eta": "8:03:30.125179", "date": "5/7/2024", "time": "10:36:38", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0017822248546324234, "loss": 0.3259502351284027}}
{"epoch": 17, "time_per_epoch": "0:07:24.865216", "max_epoch": 80, "eta": "7:47:06.508636", "date": "5/7/2024", "time": "10:44:3", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.001913275071648148, "loss": 0.6114525198936462}}
{"epoch": 18, "time_per_epoch": "0:07:22.195047", "max_epoch": 80, "eta": "7:36:56.092898", "date": "5/7/2024", "time": "10:51:26", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002041789805803107, "loss": 0.5467955470085144}}
{"epoch": 19, "time_per_epoch": "0:07:24.841465", "max_epoch": 80, "eta": "7:32:15.329358", "date": "5/7/2024", "time": "10:58:51", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0021665313899540883, "loss": 0.6311314702033997}}
{"epoch": 20, "time_per_epoch": "0:07:23.714800", "max_epoch": 80, "eta": "7:23:42.887972", "date": "5/7/2024", "time": "11:6:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0022862984944550316, "loss": 0.4225645959377289}}
{"epoch": 21, "time_per_epoch": "0:07:25.572585", "max_epoch": 80, "eta": "7:18:08.782537", "date": "5/7/2024", "time": "11:13:42", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002399937696618234, "loss": 0.4868691563606262}}
{"epoch": 22, "time_per_epoch": "0:07:24.633668", "max_epoch": 80, "eta": "7:09:48.752744", "date": "5/7/2024", "time": "11:21:7", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0025063545888053566, "loss": 0.6387386322021484}}
{"epoch": 23, "time_per_epoch": "0:07:24.312719", "max_epoch": 80, "eta": "7:02:05.824984", "date": "5/7/2024", "time": "11:28:32", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0026045243181712467, "loss": 0.5239708423614502}}
{"epoch": 24, "time_per_epoch": "0:07:25.113793", "max_epoch": 80, "eta": "6:55:26.372427", "date": "5/7/2024", "time": "11:35:58", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0026935014565570774, "loss": 0.407818466424942}}
{"epoch": 25, "time_per_epoch": "0:07:23.972846", "max_epoch": 80, "eta": "6:46:58.506531", "date": "5/7/2024", "time": "11:43:22", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002772429105480342, "loss": 0.4710182845592499}}
{"epoch": 26, "time_per_epoch": "0:07:22.670298", "max_epoch": 80, "eta": "6:38:24.196090", "date": "5/7/2024", "time": "11:50:45", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0028405471485356687, "loss": 0.6307605504989624}}
{"epoch": 27, "time_per_epoch": "0:07:24.125681", "max_epoch": 80, "eta": "6:32:18.661090", "date": "5/7/2024", "time": "11:58:10", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0028971995717313234, "loss": 0.8641102313995361}}
{"epoch": 28, "time_per_epoch": "0:07:20.112568", "max_epoch": 80, "eta": "6:21:25.853554", "date": "5/7/2024", "time": "12:5:31", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002941840781262567, "loss": 0.6356572508811951}}
{"epoch": 29, "time_per_epoch": "0:07:22.350795", "max_epoch": 80, "eta": "6:15:59.890523", "date": "5/7/2024", "time": "12:12:54", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002974040857878247, "loss": 0.7764367461204529}}
{"epoch": 30, "time_per_epoch": "0:07:21.374408", "max_epoch": 80, "eta": "6:07:48.720419", "date": "5/7/2024", "time": "12:20:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002993489697238202, "loss": 0.4349867105484009}}
{"epoch": 31, "time_per_epoch": "0:07:29.639962", "max_epoch": 80, "eta": "6:07:12.358129", "date": "5/7/2024", "time": "12:27:46", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029999999963875776, "loss": 0.441621333360672}}
{"epoch": 32, "time_per_epoch": "0:07:31.809968", "max_epoch": 80, "eta": "6:01:26.878446", "date": "5/7/2024", "time": "12:35:18", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029967931997491124, "loss": 0.5041215419769287}}
{"epoch": 33, "time_per_epoch": "0:07:36.250737", "max_epoch": 80, "eta": "5:57:23.784623", "date": "5/7/2024", "time": "12:42:55", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002987176967241224, "loss": 0.40223047137260437}}
{"epoch": 34, "time_per_epoch": "0:07:29.651408", "max_epoch": 80, "eta": "5:44:43.964753", "date": "5/7/2024", "time": "12:50:25", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029711924788763493, "loss": 0.5588961839675903}}
{"epoch": 35, "time_per_epoch": "0:07:27.479540", "max_epoch": 80, "eta": "5:35:36.579287", "date": "5/7/2024", "time": "12:57:53", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029489081826876507, "loss": 0.36693549156188965}}
{"epoch": 36, "time_per_epoch": "0:07:28.327920", "max_epoch": 80, "eta": "5:28:46.428481", "date": "5/7/2024", "time": "13:5:22", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029204195034525557, "loss": 0.9166396260261536}}
{"epoch": 37, "time_per_epoch": "0:07:29.150868", "max_epoch": 80, "eta": "5:21:53.487321", "date": "5/7/2024", "time": "13:12:52", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00288584843406921, "loss": 0.7163593173027039}}
{"epoch": 38, "time_per_epoch": "0:07:37.383134", "max_epoch": 80, "eta": "5:20:10.091611", "date": "5/7/2024", "time": "13:20:30", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002845343013164161, "loss": 0.5028563737869263}}
{"epoch": 39, "time_per_epoch": "0:07:35.479606", "max_epoch": 80, "eta": "5:11:14.663833", "date": "5/7/2024", "time": "13:28:6", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0027990766911682287, "loss": 0.5387132167816162}}
{"epoch": 40, "time_per_epoch": "0:07:31.724850", "max_epoch": 80, "eta": "5:01:08.994012", "date": "5/7/2024", "time": "13:35:38", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002747247587575135, "loss": 0.4849264919757843}}
{"epoch": 41, "time_per_epoch": "0:07:30.573824", "max_epoch": 80, "eta": "4:52:52.379118", "date": "5/7/2024", "time": "13:43:10", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002690077642563414, "loss": 0.5199539661407471}}
{"epoch": 42, "time_per_epoch": "0:07:36.183149", "max_epoch": 80, "eta": "4:48:54.959673", "date": "5/7/2024", "time": "13:50:46", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002627811666614496, "loss": 0.4742547273635864}}
{"epoch": 43, "time_per_epoch": "0:07:23.678617", "max_epoch": 80, "eta": "4:33:36.108835", "date": "5/7/2024", "time": "13:58:11", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002560716292196641, "loss": 0.3814888298511505}}
{"epoch": 44, "time_per_epoch": "0:07:21.946651", "max_epoch": 80, "eta": "4:25:10.079438", "date": "5/7/2024", "time": "14:5:33", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002489078832003774, "loss": 0.34325650334358215}}
{"epoch": 45, "time_per_epoch": "0:07:21.781962", "max_epoch": 80, "eta": "4:17:42.368685", "date": "5/7/2024", "time": "14:12:55", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0024132060486384255, "loss": 0.5689999461174011}}
{"epoch": 46, "time_per_epoch": "0:07:30.936750", "max_epoch": 80, "eta": "4:15:31.849487", "date": "5/7/2024", "time": "14:20:27", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0023334228410071675, "loss": 0.4784387946128845}}
{"epoch": 47, "time_per_epoch": "0:07:25.671313", "max_epoch": 80, "eta": "4:05:07.153333", "date": "5/7/2024", "time": "14:27:53", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0022500708530536163, "loss": 0.589332103729248}}
{"epoch": 48, "time_per_epoch": "0:07:22.110762", "max_epoch": 80, "eta": "3:55:47.544388", "date": "5/7/2024", "time": "14:35:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0021635070107866206, "loss": 0.27218320965766907}}
{"epoch": 49, "time_per_epoch": "0:07:19.994904", "max_epoch": 80, "eta": "3:47:19.842036", "date": "5/7/2024", "time": "14:42:37", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00207410199386829, "loss": 0.48348644375801086}}
{"epoch": 50, "time_per_epoch": "0:07:24.929756", "max_epoch": 80, "eta": "3:42:27.892692", "date": "5/7/2024", "time": "14:50:2", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0019822386483067766, "loss": 0.37582147121429443}}
{"epoch": 51, "time_per_epoch": "0:07:20.617776", "max_epoch": 80, "eta": "3:32:57.915507", "date": "5/7/2024", "time": "14:57:23", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0018883103470508924, "loss": 0.39562439918518066}}
{"epoch": 52, "time_per_epoch": "0:07:29.628477", "max_epoch": 80, "eta": "3:29:49.597356", "date": "5/7/2024", "time": "15:4:53", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00179271930550675, "loss": 0.4453161954879761}}
{"epoch": 53, "time_per_epoch": "0:07:24.416505", "max_epoch": 80, "eta": "3:19:59.245644", "date": "5/7/2024", "time": "15:12:18", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0016958748591896452, "loss": 0.6267533302307129}}
{"epoch": 54, "time_per_epoch": "0:07:34.661344", "max_epoch": 80, "eta": "3:17:01.194940", "date": "5/7/2024", "time": "15:19:54", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0015981917108865375, "loss": 0.23352468013763428}}
{"epoch": 55, "time_per_epoch": "0:07:25.929298", "max_epoch": 80, "eta": "3:05:48.232458", "date": "5/7/2024", "time": "15:27:20", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.001500088154835051, "loss": 0.5205101370811462}}
{"epoch": 56, "time_per_epoch": "0:07:24.857243", "max_epoch": 80, "eta": "2:57:56.573843", "date": "5/7/2024", "time": "15:34:46", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00140198428552333, "loss": 0.3533079922199249}}
{"epoch": 57, "time_per_epoch": "0:07:36.734620", "max_epoch": 80, "eta": "2:55:04.896266", "date": "5/7/2024", "time": "15:42:23", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0013043001987809468, "loss": 0.5288237929344177}}
{"epoch": 58, "time_per_epoch": "0:07:22.240888", "max_epoch": 80, "eta": "2:42:09.299538", "date": "5/7/2024", "time": "15:49:46", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0012074541928640665, "loss": 0.5628782510757446}}
{"epoch": 59, "time_per_epoch": "0:07:21.112908", "max_epoch": 80, "eta": "2:34:23.371059", "date": "5/7/2024", "time": "15:57:8", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.001111860977238095, "loss": 0.31243863701820374}}
{"epoch": 60, "time_per_epoch": "0:07:21.676104", "max_epoch": 80, "eta": "2:27:13.522089", "date": "5/7/2024", "time": "16:4:30", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0010179298967280793, "loss": 0.24063703417778015}}
{"epoch": 61, "time_per_epoch": "0:07:22.304313", "max_epoch": 80, "eta": "2:20:03.781941", "date": "5/7/2024", "time": "16:11:53", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0009260631786413252, "loss": 0.4758918583393097}}
{"epoch": 62, "time_per_epoch": "0:07:20.590280", "max_epoch": 80, "eta": "2:12:10.625033", "date": "5/7/2024", "time": "16:19:14", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0008366542103683161, "loss": 0.48947787284851074}}
{"epoch": 63, "time_per_epoch": "0:07:20.320319", "max_epoch": 80, "eta": "2:04:45.445420", "date": "5/7/2024", "time": "16:26:35", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0007500858548375109, "loss": 0.38016995787620544}}
{"epoch": 64, "time_per_epoch": "0:07:19.587276", "max_epoch": 80, "eta": "1:57:13.396410", "date": "5/7/2024", "time": "16:33:55", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0006667288110375084, "loss": 0.39876535534858704}}
{"epoch": 65, "time_per_epoch": "0:07:19.526198", "max_epoch": 80, "eta": "1:49:52.892974", "date": "5/7/2024", "time": "16:41:15", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0005869400266270666, "loss": 0.4667430520057678}}
{"epoch": 66, "time_per_epoch": "0:07:20.005884", "max_epoch": 80, "eta": "1:42:40.082382", "date": "5/7/2024", "time": "16:48:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0005110611694304271, "loss": 0.29991692304611206}}
{"epoch": 67, "time_per_epoch": "0:07:19.748235", "max_epoch": 80, "eta": "1:35:16.727050", "date": "5/7/2024", "time": "16:55:56", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0004394171643632412, "loss": 0.341851145029068}}
{"epoch": 68, "time_per_epoch": "0:07:19.018176", "max_epoch": 80, "eta": "1:27:48.218115", "date": "5/7/2024", "time": "17:3:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.000372314802054194, "loss": 0.3857216238975525}}
{"epoch": 69, "time_per_epoch": "0:07:19.552546", "max_epoch": 80, "eta": "1:20:35.078006", "date": "5/7/2024", "time": "17:10:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0003100414251204348, "loss": 0.31028807163238525}}
{"epoch": 70, "time_per_epoch": "0:07:20.258756", "max_epoch": 80, "eta": "1:13:22.587557", "date": "5/7/2024", "time": "17:17:57", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0002528636977223765, "loss": 0.4571707844734192}}
{"epoch": 71, "time_per_epoch": "0:07:18.766678", "max_epoch": 80, "eta": "1:05:48.900103", "date": "5/7/2024", "time": "17:25:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00020102646366682223, "loss": 0.4598176181316376}}
{"epoch": 72, "time_per_epoch": "0:07:19.548373", "max_epoch": 80, "eta": "0:58:36.386986", "date": "5/7/2024", "time": "17:32:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0001547516979481945, "loss": 0.2587343454360962}}
{"epoch": 73, "time_per_epoch": "0:07:19.405268", "max_epoch": 80, "eta": "0:51:15.836877", "date": "5/7/2024", "time": "17:39:56", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00011423755621753237, "loss": 0.8911759853363037}}
{"epoch": 74, "time_per_epoch": "0:07:19.128135", "max_epoch": 80, "eta": "0:43:54.768811", "date": "5/7/2024", "time": "17:47:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 7.965752624957046e-05, "loss": 0.45557257533073425}}
{"epoch": 75, "time_per_epoch": "0:07:19.406747", "max_epoch": 80, "eta": "0:36:37.033734", "date": "5/7/2024", "time": "17:54:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 5.1159685041454155e-05, "loss": 0.26564571261405945}}
{"epoch": 76, "time_per_epoch": "0:07:19.508782", "max_epoch": 80, "eta": "0:29:18.035129", "date": "5/7/2024", "time": "18:1:56", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 2.8866064724304957e-05, "loss": 0.48253774642944336}}
{"epoch": 77, "time_per_epoch": "0:07:20.132161", "max_epoch": 80, "eta": "0:22:00.396483", "date": "5/7/2024", "time": "18:9:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 1.2872130002899723e-05, "loss": 0.6122196912765503}}
{"epoch": 78, "time_per_epoch": "0:07:20.184826", "max_epoch": 80, "eta": "0:14:40.369652", "date": "5/7/2024", "time": "18:16:37", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 3.2463693611489186e-06, "loss": 0.1481819897890091}}
{"epoch": 79, "time_per_epoch": "0:07:18.947498", "max_epoch": 80, "eta": "0:07:18.947498", "date": "5/7/2024", "time": "18:23:57", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 3.0001783894512056e-08, "loss": 0.2755662798881531}}
{"date": "5/7/2024", "time": "18:23:57", "status": "SUCCESS", "verbosity": "INFO", "message": "Training finished successfully.", "kpi": {"learning_rate": 3.0001783894512056e-08, "loss": 0.2755662798881531}}
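As an aside, the per-epoch loss in status.json is easy to pull out for plotting, since the file is JSON-lines; a minimal sketch (field names match the log above):

```python
import json

def parse_status_log(lines):
    """Extract (epoch, loss) pairs from TAO's status.json JSON-lines output."""
    kpis = []
    for line in lines:
        rec = json.loads(line)
        # Per-epoch records carry both an "epoch" index and a "kpi" dict.
        if "epoch" in rec and "kpi" in rec:
            kpis.append((rec["epoch"], rec["kpi"]["loss"]))
    return kpis

# Two lines in the same shape as the log above (values abbreviated):
sample = [
    '{"epoch": 0, "status": "RUNNING", "kpi": {"learning_rate": 3e-4, "loss": 1.381}}',
    '{"date": "5/7/2024", "status": "SUCCESS", "kpi": {"loss": 0.276}}',
]
print(parse_status_log(sample))  # [(0, 1.381)]
```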

This is the output model: checkpoint_epoch_80.tlt

Performance of this model, measured with the evaluation.py script on my validation set:

Average predicted number of objects(1318 samples): 6.969

Car AP@0.50, 0.50:
bev  AP:83.1364
3d   AP:78.0459
bev mAP: 83.1364
3d mAP: 78.0459

Problem: conversion to TensorRT

Note: for conversion, max_points_num in the inference section of pointpillar_general.yaml was set to max_points_num: 204800.
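Because the exported graph takes fixed-shape inputs (points:1x204800x4 plus a num_points count), each point cloud has to be padded or truncated to that size before inference. A sketch of the padding I use, assuming the KITTI 4-feature layout (x, y, z, intensity):

```python
import numpy as np

MAX_POINTS = 204800  # matches max_points_num in pointpillar_general.yaml

def pad_points(points, max_points=MAX_POINTS):
    """Pad (or truncate) an (N, 4) point cloud to the fixed 1 x max_points x 4
    shape a static-shape engine expects; also return the true point count."""
    n = min(points.shape[0], max_points)
    padded = np.zeros((max_points, points.shape[1]), dtype=np.float32)
    padded[:n] = points[:n]
    return padded[np.newaxis], np.array([n], dtype=np.int32)

cloud = np.random.rand(1000, 4).astype(np.float32)
batch, num = pad_points(cloud)
print(batch.shape, num)  # (1, 204800, 4) [1000]
```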

I tried to convert the model into a TensorRT engine in two ways.

Method 1: export script

First I tried to use the provided export.py script.

Question: why do we set dummy_voxel_num_points and dummy_coords as torch.int32 instead of keeping them as float?
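My current understanding on that question (an assumption, not confirmed in the script's docs): coords and voxel_num_points are index/count tensors rather than features, so they have to be integers for the scatter/gather ops, and whatever dtype the dummy inputs carry is baked into the ONNX graph as the input dtype TensorRT will expect. A toy numpy illustration of the indexing constraint:

```python
import numpy as np

# Voxel coordinates are used as *indices* when pillar features are scattered
# back onto the BEV canvas, and fancy indexing requires an integer dtype;
# float coordinates cannot index an array.
canvas = np.zeros((4, 4), dtype=np.float32)
coords = np.array([[0, 1], [2, 3]], dtype=np.int32)   # like dummy_coords
features = np.array([1.0, 2.0], dtype=np.float32)
canvas[coords[:, 0], coords[:, 1]] = features          # integer fancy indexing
print(canvas[0, 1], canvas[2, 3])  # 1.0 2.0
```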

python nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py --cfg_file nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml --save_engine path/to/output/checkpoint_epoch_80.engine  --key tlt_encode

I obtain an engine: checkpoint_epoch_80.engine

However, all metrics drop to zero.

Evaluation with checkpoint_epoch_80.engine

Average predicted number of objects(1318 samples): 1.174

2024-05-16 12:47:50,937   INFO  Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

Method 2: trtexec

I took the ONNX generated by the export script (checkpoint_epoch_80.onnx, the one after simplification with graph surgeon) and tried to generate the engine with trtexec, producing checkpoint_epoch_80_trtexec.engine:

trtexec --onnx=/path/to/checkpoint_epoch_80.onnx \
        --maxShapes=points:1x204800x4,num_points:1 \
        --minShapes=points:1x204800x4,num_points:1 \
        --optShapes=points:1x204800x4,num_points:1 \
        --fp16 \
        --saveEngine=/path/to/checkpoint_epoch_80_trtexec.engine

However, the metrics are still zero:

Average predicted number of objects(1318 samples): 1.169

Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

Experiments

I tried to:

  • remove the --fp16 flag when using trtexec
  • use a different docker, as suggested in other topics
  • convert the non-simplified ONNX with trtexec → this partially worked, but the engine output NaNs. To work around that I used the --best flag during conversion; however, the predictions are still wrong.
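To separate an export problem from a conversion problem, I found it useful to run the same padded input through both ONNX Runtime and the engine and compare per-output statistics; a head that overflows under fp16 stands out immediately as inf/NaN. A small helper for those stats (the tensor name here is illustrative):

```python
import numpy as np

def summarize_output(name, arr):
    """Sanity stats for one network output: NaN/inf counts and value range.
    fp16 overflow after conversion typically shows up here as inf/NaN."""
    arr = np.asarray(arr, dtype=np.float64)
    return {
        "name": name,
        "nans": int(np.isnan(arr).sum()),
        "infs": int(np.isinf(arr).sum()),
        "min": float(np.nanmin(arr)),
        "max": float(np.nanmax(arr)),
    }

stats = summarize_output("cls_preds", np.array([0.1, float("nan"), 1e9]))
print(stats)  # {'name': 'cls_preds', 'nans': 1, 'infs': 0, 'min': 0.1, 'max': 1000000000.0}
```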

Please refer to The effect is very poor when converted to trt - #62 by Morganh to use TRT8.6.1 and retry.

I’m already using TRT8.6.1

pip list | grep tensorrt
tensorrt                    8.6.1
torch-tensorrt              2.2.0a0

Please try to follow the end of the following link.
From PointPillars - NVIDIA Docs,
A TensorRT sample is developed as a demo to show how to deploy PointPillars models trained in TAO Toolkit.

Is there a docker already containing TensorRT 8.2 (or above) and TensorRT OSS 22.02?

In addition, do you confirm the export script doesn’t work correctly for exporting pointpillar?

Please directly use the tao-deploy docker (nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy); it contains TensorRT 8.6.1.6. The Dockerfile is at tao_deploy/docker/Dockerfile at main · NVIDIA/tao_deploy · GitHub. Run inside the docker, use trtexec to generate the TensorRT engine, then do the last two steps (Clone the repo and Run the TensorRT Inference).

It doesn’t work with this docker either.

Steps I performed:

  1. Run the docker:
docker run -it --rm --gpus all -e PYTHONPATH=/tao-pt:$PYTHONPATH --shm-size 16G --net=host nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy
  2. Export the ONNX to a TensorRT engine:
trtexec --onnx=checkpoint_epoch_80.onnx \
        --maxShapes=points:1x204800x4,num_points:1 \
        --minShapes=points:1x204800x4,num_points:1 \
        --optShapes=points:1x204800x4,num_points:1 \
        --fp16 \
        --saveEngine=checkpoint_epoch_80.engine
  3. Clone the repo:
git clone https://github.com/NVIDIA-AI-IOT/tao_toolkit_recipes.git
  4. Follow the last two steps:
cd tao_toolkit_recipes/tao_pointpillars/tensorrt_sample/test
sudo apt-get install libboost-all-dev # needed to be able to compile
mkdir build
cd build
cmake .. -DCUDA_VERSION=11.8
make -j4
  5. Run inference:
./pointpillars -e path/to/checkpoint_epoch_80.engine -l /path/to/mybin.bin -t 0.01 -c Vehicle -n 4096 -p -d fp16

output

TIME: doinfer: 99.3167 ms.
Vehicle, 59.085323, 16.726288, -1.349629, 4.166372, 1.631799, 1.426950, 4.644820, 0.227298
TIME: pointpillar: 99.7745 ms.
Bndbox objs: 1
Saved prediction in: mybin.txt

mybin.txt

59.0853 16.7263 -1.34963 1.6318 4.16637 1.42695 4.64482 0 0.227298 

This doesn’t match the expected output at all, since the output with the original .tlt model is:

Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.5022 1.6059 3.5395 13.5955 -2.8459 -0.8949 6.3548 0.9109
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.6122 1.6827 4.0306 12.6629 3.2928 -0.8674 6.3130 0.9073
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.5411 1.6050 3.3492 8.9439 -3.2063 -0.8594 6.2721 0.8128
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4886 1.5999 3.7508 2.7487 -3.1261 -0.9196 6.2323 0.7272
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.5183 1.6585 3.6357 26.6466 -2.6679 -0.8169 6.2843 0.3774
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4886 1.6189 3.9054 69.1245 -0.7873 -0.5102 3.1647 0.2728
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4437 1.6175 3.8565 -21.7086 6.8582 -1.2460 4.7063 0.2456
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4267 1.6319 4.1663 -23.6664 -7.7901 -1.3491 4.6448 0.2259
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.7588 1.8220 4.5767 52.2319 -1.9124 -0.4946 6.2961 0.1866
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4859 1.6786 4.1968 -43.0405 -5.9368 -1.3399 6.7888 0.1848
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4939 1.6477 4.0340 70.1843 5.6645 -1.2091 3.1312 0.1006
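One thing worth noting when comparing the two outputs: the TensorRT sample prints boxes in the LiDAR frame, while the .tlt inference writes KITTI camera-frame labels, so the x/y/z columns are not directly comparable. In fact, the sample's single detection seems to correspond to the 8th .tlt line above: size, heading, and score agree to within rounding. This is my reading of the two output formats, not something I have confirmed in the sample's source:

```python
# Single detection printed by the TensorRT sample (assumed field order:
# x y z w l h yaw class score -- my reading, unverified):
sample_det = [59.0853, 16.7263, -1.34963, 1.6318, 4.16637, 1.42695, 4.64482, 0, 0.227298]

# 8th line of the .tlt output; KITTI label fields are:
# type trunc occ alpha bbox(4) h w l x y z ry score
kitti_hwl = [1.4267, 1.6319, 4.1663]
kitti_ry, kitti_score = 4.6448, 0.2259

assert abs(sample_det[5] - kitti_hwl[0]) < 0.01  # height
assert abs(sample_det[3] - kitti_hwl[1]) < 0.01  # width
assert abs(sample_det[4] - kitti_hwl[2]) < 0.01  # length
assert abs(sample_det[6] - kitti_ry) < 0.01      # heading
assert abs(sample_det[8] - kitti_score) < 0.01   # confidence
print("dims/heading/score agree; only the coordinate frame differs")
```

So the engine is not producing pure garbage, but it still finds only 1 box where the .tlt model finds 11.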

To narrow down, can you go back to the tao-pyt docker and run evaluation or inference against the TensorRT engine?
See

https://docs.nvidia.com/tao/tao-toolkit/text/point_cloud/pointpillars.html#evaluating-the-model and https://docs.nvidia.com/tao/tao-toolkit/text/point_cloud/pointpillars.html#running-inference-on-the-pointpillars-model.

If you mean the new engine generated in the new docker: the results are still the same (evaluation run in the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt-base docker):

2024-05-16 17:06:01,609   INFO  **********************Start logging**********************
2024-05-16 17:06:01,609   INFO  CUDA_VISIBLE_DEVICES=ALL
2024-05-16 17:06:01,765   INFO  Loading point cloud dataset
2024-05-16 17:06:01,778   INFO  Total samples for point cloud dataset: 1318
2024-05-16 17:06:02,234   INFO  *************** EVALUATION *****************
2024-05-16 17:06:34,291   INFO  *************** Performance *****************
2024-05-16 17:06:34,292   INFO  Generate label finished(sec_per_example: 0.0243 second).
2024-05-16 17:06:34,292   INFO  recall_roi_0.3: 0.000000
2024-05-16 17:06:34,292   INFO  recall_rcnn_0.3: 0.000000
2024-05-16 17:06:34,292   INFO  recall_roi_0.5: 0.000000
2024-05-16 17:06:34,292   INFO  recall_rcnn_0.5: 0.000000
2024-05-16 17:06:34,292   INFO  recall_roi_0.7: 0.000000
2024-05-16 17:06:34,292   INFO  recall_rcnn_0.7: 0.000000
2024-05-16 17:06:34,292   INFO  Average predicted number of objects(1318 samples): 1.168
2024-05-16 17:06:34,500   INFO  Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

If you mean my original engine you can see it in my original question.

Please use the official model to run.
You can run as below.
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.3.0-pyt /bin/bash
Then inside the docker,
Follow PointPillars - NVIDIA Docs, but without “tao model” at the beginning of each command line.
# pointpillars export xxx
# pointpillars evaluation xxx
# pointpillars inference xxx

I used the docker 5.3.0-pyt:

docker run -it --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --runtime=nvidia nvcr.io/nvidia/tao/tao-toolkit:5.3.0-pyt /bin/bash

I downloaded the official pointpillars_trainable.tlt model from here.

Since this model was trained with all 3 classes, I restored all the arguments related to the 3 classes in the configuration file.
Performance of pointpillars_trainable.tlt on my version of KITTI dataset (only cars are annotated):

Average predicted number of objects(1318 samples): 1.219
Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
Pedestrian AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
Cyclist AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

Performance of pointpillars_trainable.engine on my version of KITTI dataset (only cars are annotated):

Average predicted number of objects(1318 samples): 11.292
Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
Pedestrian AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
Cyclist AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

Since all metrics were zero even with the .tlt model, this couldn’t tell me much about whether the export itself succeeded. For this reason I also tried with my own model trained on my version of KITTI: checkpoint_kitti_80.tlt.

Performance of checkpoint_kitti_80.tlt on my version of KITTI dataset (only cars are annotated):

Average predicted number of objects(1318 samples): 17.327
Car AP@0.50, 0.50:
bev  AP:81.6425
3d   AP:80.0362
bev mAP: 81.6425
3d mAP: 80.0362

Performance of checkpoint_kitti_80.engine on my version of KITTI dataset (only cars are annotated):

Average predicted number of objects(1318 samples): 1.170
Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

May I know if you are running with the default notebook and the default dataset mentioned in it?

I managed to make the TensorRT engine work, but I still have some doubts about the final output metrics. By running the evaluation on the exact same val set multiple times, I obtain quite different metrics.

FP16 TRT engine
Car AP@0.50, 0.50:
bev AP:80.8023
3d AP:64.2504

bev AP:78.1450
3d AP:64.4269

bev AP:80.5636
3d AP:64.1166

bev AP:80.6485
3d AP:64.0952

bev AP:79.1487
3d AP:63.4226

FP32 TRT engine
Car AP@0.50, 0.50:
bev AP:80.4541
3d AP:63.9492

bev AP:80.8236
3d AP:66.5927

bev AP:80.8545
3d AP:64.2770

bev AP:78.5316
3d AP:63.9503

bev AP:80.2531
3d AP:64.0063
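Since the AP values are listed above, the run-to-run spread can be quantified with a quick standard-library sketch (values copied from the five FP32 runs above; the FP16 runs can be checked the same way):

```python
import statistics

# bev and 3d AP values from the five FP32 evaluation runs above
bev_ap = [80.4541, 80.8236, 80.8545, 78.5316, 80.2531]
ap_3d = [63.9492, 66.5927, 64.2770, 63.9503, 64.0063]

for name, vals in [("bev", bev_ap), ("3d", ap_3d)]:
    print(f"{name}: mean={statistics.mean(vals):.4f}  "
          f"stdev={statistics.stdev(vals):.4f}  "
          f"max-min={max(vals) - min(vals):.4f}")
```

This gives a max-min spread of about 2.32 AP (bev) and 2.64 AP (3d), so the worst-case difference between two runs is indeed above 1.5.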

Is this behavior normal, even though the difference between one evaluation and another is sometimes > 1.5?

May I know what changes you made to make the TensorRT engine work?

How about running inference to check the visualized images? Is it consistent?

For the training I made some changes to the configuration file, including the removal of the cyclist- and pedestrian-related arguments. However, I left them in the anchor generation config by mistake. I didn’t think this inconsistency could impact the TensorRT engine, since the .tlt model was working fine, but apparently this was the problem.

The inference is consistent, but a deviation of 1.5 is hard to spot visually, I guess. Still, I suppose I shouldn’t see these oscillations in the metrics, especially with FP32.

Could you run evaluation against a very small part of the dataset (even 1 sample) to check if it is consistent?

What do you mean? I was referring to the modification in the configuration file where I had left the cyclist and pedestrian arguments in the anchor generation config. The fact that the anchor generator still had those two entries while the class definition contained only ‘Car’ broke the TensorRT performance. I have now retrained the model with those entries removed as well, and thanks to that I now get good performance with TensorRT too.
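For clarity, a minimal sketch of the inconsistency I mean (the key names follow the TAO PointPillars spec-file format from memory, and the anchor sizes are the usual KITTI defaults, not copied from my actual file):

```yaml
CLASS_NAMES: ['Car']               # pedestrian/cyclist removed here...
MODEL:
  DENSE_HEAD:
    ANCHOR_GENERATOR_CONFIG:       # ...but accidentally left here
      - class_name: 'Car'
        anchor_sizes: [[3.9, 1.6, 1.56]]
      - class_name: 'Pedestrian'   # should have been removed too
        anchor_sizes: [[0.8, 0.6, 1.73]]
      - class_name: 'Cyclist'      # should have been removed too
        anchor_sizes: [[1.76, 0.6, 1.73]]
```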

However, my question is now different: why do I obtain oscillating metrics across multiple evaluations on the same val set?

I also found some (rare) inconsistencies visually when running inference multiple times on the same val set:

Please note that, unlike with the TRT engine, with the .tlt model I always obtain the exact same metrics and outputs.

Yes, what I mean is also about this question; I just want to use a smaller dataset to narrow down.
As for the visualization result, it is really odd. The same command and the same val set?

Yes, the exact same command and the same val set.

By narrowing down to a single val sample (the one I showed before, which differed across two evaluation runs) and running multiple inferences and multiple evaluations, the results always seem the same, but I’m not sure this isn’t just by chance…

This is my current config file pointpillar_general.yaml.

This is the output from when I generated the engine:

FP16

pointpillars export -r ****** -e nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml -k tlt_encode --save_engine ****** --data_type fp16
python /usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py --cfg_file nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml --key tlt_encode --save_engine ****** --data_type fp16


INFO: Exporting the model...
INFO: Starting PointPillars export
WARNING: 'decrypt_stream' is deprecated, to be removed in '0.7'. Please use 'eff.codec.decrypt_stream()' instead.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:64: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  inputs_shape = inputs.cpu().detach().numpy().shape
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:80: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  x_max_shape = x_max.cpu().detach().numpy().shape
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:132: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  batch_size = coords[..., 0].max().int().item() + 1
/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1209: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
WARNING: 'encrypt_stream' is deprecated, to be removed in '0.7'. Please use 'eff.codec.encrypt_stream()' instead.
INFO: Model exported to ******
INFO: Model exported to ******
[05/20/2024-15:40:03] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/20/2024-15:40:30] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[05/20/2024-15:40:30] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[05/20/2024-15:40:30] [TRT] [W] Check verbose logs for the list of affected weights.
[05/20/2024-15:40:30] [TRT] [W] - 21 weights are affected by this issue: Detected subnormal FP16 values.
[05/20/2024-15:40:30] [TRT] [W] - 10 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
INFO: TensorRT engine saved to ******
INFO: TensorRT engine saved to ******
INFO: Export finished successfully. 

FP32

pointpillars export -r ****** -e nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml -k tlt_encode --save_engine ******
python /usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py --cfg_file nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml --key tlt_encode --save_engine ****** --data_type fp32


INFO: Exporting the model...
INFO: Starting PointPillars export
WARNING: 'decrypt_stream' is deprecated, to be removed in '0.7'. Please use 'eff.codec.decrypt_stream()' instead.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:64: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  inputs_shape = inputs.cpu().detach().numpy().shape
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:80: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  x_max_shape = x_max.cpu().detach().numpy().shape
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:132: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  batch_size = coords[..., 0].max().int().item() + 1
/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1209: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
WARNING: 'encrypt_stream' is deprecated, to be removed in '0.7'. Please use 'eff.codec.encrypt_stream()' instead.
INFO: Model exported to ******
INFO: Model exported to ******
[05/20/2024-15:38:45] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
INFO: TensorRT engine saved to ******
INFO: TensorRT engine saved to ******
INFO: Export finished successfully.

The results I showed so far, unless specified otherwise, refer to the engine exported with FP32.