Problems with PointPillar export

Required Information

  • Hardware: NVIDIA GeForce RTX 4080
  • Network Type: PointPillar
  • GitHub Repository: tao_pytorch_backend
  • TLT Version (tlt info --verbose doesn’t work): docker is nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt-base → command used to run: docker run -it --rm --gpus all -v /path/to/project/tao_pytorch_backend:/tao-pt -e PYTHONPATH=/tao-pt:$PYTHONPATH --shm-size 16G --net=host nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt-base
  • Training spec file: pointpillar_general.yaml
  • How to reproduce the issue: see later

Documentation I followed

Forum topics I already checked

My training

I managed to train from scratch on the KITTI dataset, on cars only.
log_train_20240507-082745.txt

2024-05-07 08:27:45,090   INFO  **********************Start logging**********************
2024-05-07 08:27:45,090   INFO  CUDA_VISIBLE_DEVICES=ALL
2024-05-07 08:27:45,436   INFO  Loading point cloud dataset
2024-05-07 08:27:45,489   INFO  Total samples for point cloud dataset: 5366
2024-05-07 08:27:45,658   INFO  **********************Start training**********************
2024-05-07 18:23:57,833   INFO  **********************End training**********************

status.json

{"date": "5/7/2024", "time": "8:27:45", "status": "STARTED", "verbosity": "INFO", "message": "Starting PointPillars training"}
{"epoch": 0, "time_per_epoch": "0:07:23.530897", "max_epoch": 80, "eta": "9:51:22.471797", "date": "5/7/2024", "time": "8:35:9", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0003064909424133253, "loss": 1.3810601234436035}}
{"epoch": 1, "time_per_epoch": "0:07:35.166232", "max_epoch": 80, "eta": "9:59:18.132325", "date": "5/7/2024", "time": "8:42:45", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0003259206078755463, "loss": 1.2703205347061157}}
{"epoch": 2, "time_per_epoch": "0:07:42.412781", "max_epoch": 80, "eta": "10:01:08.196897", "date": "5/7/2024", "time": "8:50:28", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0003581018816993485, "loss": 0.8312548995018005}}
{"epoch": 3, "time_per_epoch": "0:07:43.532270", "max_epoch": 80, "eta": "9:54:51.984766", "date": "5/7/2024", "time": "8:58:12", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0004027248406257354, "loss": 0.601682186126709}}
{"epoch": 4, "time_per_epoch": "0:07:35.183181", "max_epoch": 80, "eta": "9:36:33.921729", "date": "5/7/2024", "time": "9:5:48", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00045935974116685435, "loss": 0.7014744877815247}}
{"epoch": 5, "time_per_epoch": "0:07:36.934350", "max_epoch": 80, "eta": "9:31:10.076217", "date": "5/7/2024", "time": "9:13:25", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0005274611582707094, "loss": 0.9280206561088562}}
{"epoch": 6, "time_per_epoch": "0:07:35.909267", "max_epoch": 80, "eta": "9:22:17.285785", "date": "5/7/2024", "time": "9:21:2", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0006063732380625683, "loss": 0.8185919523239136}}
{"epoch": 7, "time_per_epoch": "0:07:34.102590", "max_epoch": 80, "eta": "9:12:29.489096", "date": "5/7/2024", "time": "9:28:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0006953360140763047, "loss": 0.5776488780975342}}
{"epoch": 8, "time_per_epoch": "0:07:31.029943", "max_epoch": 80, "eta": "9:01:14.155902", "date": "5/7/2024", "time": "9:36:8", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0007934927261469067, "loss": 0.7977230548858643}}
{"epoch": 9, "time_per_epoch": "0:07:35.350255", "max_epoch": 80, "eta": "8:58:49.868115", "date": "5/7/2024", "time": "9:43:44", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0008998980714792163, "loss": 0.759836733341217}}
{"epoch": 10, "time_per_epoch": "0:07:34.159572", "max_epoch": 80, "eta": "8:49:51.170060", "date": "5/7/2024", "time": "9:51:19", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0010135273084306063, "loss": 0.5445941686630249}}
{"epoch": 11, "time_per_epoch": "0:07:35.378166", "max_epoch": 80, "eta": "8:43:41.093459", "date": "5/7/2024", "time": "9:58:55", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0011332861253331745, "loss": 0.6107680201530457}}
{"epoch": 12, "time_per_epoch": "0:07:29.786740", "max_epoch": 80, "eta": "8:29:45.498304", "date": "5/7/2024", "time": "10:6:25", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0012580211793133203, "loss": 0.5008313059806824}}
{"epoch": 13, "time_per_epoch": "0:07:34.622371", "max_epoch": 80, "eta": "8:27:39.698833", "date": "5/7/2024", "time": "10:14:0", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.001386531203614099, "loss": 0.6656970977783203}}
{"epoch": 14, "time_per_epoch": "0:07:30.476321", "max_epoch": 80, "eta": "8:15:31.437162", "date": "5/7/2024", "time": "10:21:31", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0015175785764507683, "loss": 0.7663007974624634}}
{"epoch": 15, "time_per_epoch": "0:07:31.792888", "max_epoch": 80, "eta": "8:09:26.537725", "date": "5/7/2024", "time": "10:29:4", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0016499012399851304, "loss": 0.46346515417099}}
{"epoch": 16, "time_per_epoch": "0:07:33.283206", "max_epoch": 80, "eta": "8:03:30.125179", "date": "5/7/2024", "time": "10:36:38", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0017822248546324234, "loss": 0.3259502351284027}}
{"epoch": 17, "time_per_epoch": "0:07:24.865216", "max_epoch": 80, "eta": "7:47:06.508636", "date": "5/7/2024", "time": "10:44:3", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.001913275071648148, "loss": 0.6114525198936462}}
{"epoch": 18, "time_per_epoch": "0:07:22.195047", "max_epoch": 80, "eta": "7:36:56.092898", "date": "5/7/2024", "time": "10:51:26", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002041789805803107, "loss": 0.5467955470085144}}
{"epoch": 19, "time_per_epoch": "0:07:24.841465", "max_epoch": 80, "eta": "7:32:15.329358", "date": "5/7/2024", "time": "10:58:51", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0021665313899540883, "loss": 0.6311314702033997}}
{"epoch": 20, "time_per_epoch": "0:07:23.714800", "max_epoch": 80, "eta": "7:23:42.887972", "date": "5/7/2024", "time": "11:6:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0022862984944550316, "loss": 0.4225645959377289}}
{"epoch": 21, "time_per_epoch": "0:07:25.572585", "max_epoch": 80, "eta": "7:18:08.782537", "date": "5/7/2024", "time": "11:13:42", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002399937696618234, "loss": 0.4868691563606262}}
{"epoch": 22, "time_per_epoch": "0:07:24.633668", "max_epoch": 80, "eta": "7:09:48.752744", "date": "5/7/2024", "time": "11:21:7", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0025063545888053566, "loss": 0.6387386322021484}}
{"epoch": 23, "time_per_epoch": "0:07:24.312719", "max_epoch": 80, "eta": "7:02:05.824984", "date": "5/7/2024", "time": "11:28:32", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0026045243181712467, "loss": 0.5239708423614502}}
{"epoch": 24, "time_per_epoch": "0:07:25.113793", "max_epoch": 80, "eta": "6:55:26.372427", "date": "5/7/2024", "time": "11:35:58", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0026935014565570774, "loss": 0.407818466424942}}
{"epoch": 25, "time_per_epoch": "0:07:23.972846", "max_epoch": 80, "eta": "6:46:58.506531", "date": "5/7/2024", "time": "11:43:22", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002772429105480342, "loss": 0.4710182845592499}}
{"epoch": 26, "time_per_epoch": "0:07:22.670298", "max_epoch": 80, "eta": "6:38:24.196090", "date": "5/7/2024", "time": "11:50:45", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0028405471485356687, "loss": 0.6307605504989624}}
{"epoch": 27, "time_per_epoch": "0:07:24.125681", "max_epoch": 80, "eta": "6:32:18.661090", "date": "5/7/2024", "time": "11:58:10", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0028971995717313234, "loss": 0.8641102313995361}}
{"epoch": 28, "time_per_epoch": "0:07:20.112568", "max_epoch": 80, "eta": "6:21:25.853554", "date": "5/7/2024", "time": "12:5:31", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002941840781262567, "loss": 0.6356572508811951}}
{"epoch": 29, "time_per_epoch": "0:07:22.350795", "max_epoch": 80, "eta": "6:15:59.890523", "date": "5/7/2024", "time": "12:12:54", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002974040857878247, "loss": 0.7764367461204529}}
{"epoch": 30, "time_per_epoch": "0:07:21.374408", "max_epoch": 80, "eta": "6:07:48.720419", "date": "5/7/2024", "time": "12:20:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002993489697238202, "loss": 0.4349867105484009}}
{"epoch": 31, "time_per_epoch": "0:07:29.639962", "max_epoch": 80, "eta": "6:07:12.358129", "date": "5/7/2024", "time": "12:27:46", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029999999963875776, "loss": 0.441621333360672}}
{"epoch": 32, "time_per_epoch": "0:07:31.809968", "max_epoch": 80, "eta": "6:01:26.878446", "date": "5/7/2024", "time": "12:35:18", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029967931997491124, "loss": 0.5041215419769287}}
{"epoch": 33, "time_per_epoch": "0:07:36.250737", "max_epoch": 80, "eta": "5:57:23.784623", "date": "5/7/2024", "time": "12:42:55", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002987176967241224, "loss": 0.40223047137260437}}
{"epoch": 34, "time_per_epoch": "0:07:29.651408", "max_epoch": 80, "eta": "5:44:43.964753", "date": "5/7/2024", "time": "12:50:25", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029711924788763493, "loss": 0.5588961839675903}}
{"epoch": 35, "time_per_epoch": "0:07:27.479540", "max_epoch": 80, "eta": "5:35:36.579287", "date": "5/7/2024", "time": "12:57:53", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029489081826876507, "loss": 0.36693549156188965}}
{"epoch": 36, "time_per_epoch": "0:07:28.327920", "max_epoch": 80, "eta": "5:28:46.428481", "date": "5/7/2024", "time": "13:5:22", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0029204195034525557, "loss": 0.9166396260261536}}
{"epoch": 37, "time_per_epoch": "0:07:29.150868", "max_epoch": 80, "eta": "5:21:53.487321", "date": "5/7/2024", "time": "13:12:52", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00288584843406921, "loss": 0.7163593173027039}}
{"epoch": 38, "time_per_epoch": "0:07:37.383134", "max_epoch": 80, "eta": "5:20:10.091611", "date": "5/7/2024", "time": "13:20:30", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002845343013164161, "loss": 0.5028563737869263}}
{"epoch": 39, "time_per_epoch": "0:07:35.479606", "max_epoch": 80, "eta": "5:11:14.663833", "date": "5/7/2024", "time": "13:28:6", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0027990766911682287, "loss": 0.5387132167816162}}
{"epoch": 40, "time_per_epoch": "0:07:31.724850", "max_epoch": 80, "eta": "5:01:08.994012", "date": "5/7/2024", "time": "13:35:38", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002747247587575135, "loss": 0.4849264919757843}}
{"epoch": 41, "time_per_epoch": "0:07:30.573824", "max_epoch": 80, "eta": "4:52:52.379118", "date": "5/7/2024", "time": "13:43:10", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002690077642563414, "loss": 0.5199539661407471}}
{"epoch": 42, "time_per_epoch": "0:07:36.183149", "max_epoch": 80, "eta": "4:48:54.959673", "date": "5/7/2024", "time": "13:50:46", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002627811666614496, "loss": 0.4742547273635864}}
{"epoch": 43, "time_per_epoch": "0:07:23.678617", "max_epoch": 80, "eta": "4:33:36.108835", "date": "5/7/2024", "time": "13:58:11", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002560716292196641, "loss": 0.3814888298511505}}
{"epoch": 44, "time_per_epoch": "0:07:21.946651", "max_epoch": 80, "eta": "4:25:10.079438", "date": "5/7/2024", "time": "14:5:33", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.002489078832003774, "loss": 0.34325650334358215}}
{"epoch": 45, "time_per_epoch": "0:07:21.781962", "max_epoch": 80, "eta": "4:17:42.368685", "date": "5/7/2024", "time": "14:12:55", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0024132060486384255, "loss": 0.5689999461174011}}
{"epoch": 46, "time_per_epoch": "0:07:30.936750", "max_epoch": 80, "eta": "4:15:31.849487", "date": "5/7/2024", "time": "14:20:27", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0023334228410071675, "loss": 0.4784387946128845}}
{"epoch": 47, "time_per_epoch": "0:07:25.671313", "max_epoch": 80, "eta": "4:05:07.153333", "date": "5/7/2024", "time": "14:27:53", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0022500708530536163, "loss": 0.589332103729248}}
{"epoch": 48, "time_per_epoch": "0:07:22.110762", "max_epoch": 80, "eta": "3:55:47.544388", "date": "5/7/2024", "time": "14:35:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0021635070107866206, "loss": 0.27218320965766907}}
{"epoch": 49, "time_per_epoch": "0:07:19.994904", "max_epoch": 80, "eta": "3:47:19.842036", "date": "5/7/2024", "time": "14:42:37", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00207410199386829, "loss": 0.48348644375801086}}
{"epoch": 50, "time_per_epoch": "0:07:24.929756", "max_epoch": 80, "eta": "3:42:27.892692", "date": "5/7/2024", "time": "14:50:2", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0019822386483067766, "loss": 0.37582147121429443}}
{"epoch": 51, "time_per_epoch": "0:07:20.617776", "max_epoch": 80, "eta": "3:32:57.915507", "date": "5/7/2024", "time": "14:57:23", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0018883103470508924, "loss": 0.39562439918518066}}
{"epoch": 52, "time_per_epoch": "0:07:29.628477", "max_epoch": 80, "eta": "3:29:49.597356", "date": "5/7/2024", "time": "15:4:53", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00179271930550675, "loss": 0.4453161954879761}}
{"epoch": 53, "time_per_epoch": "0:07:24.416505", "max_epoch": 80, "eta": "3:19:59.245644", "date": "5/7/2024", "time": "15:12:18", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0016958748591896452, "loss": 0.6267533302307129}}
{"epoch": 54, "time_per_epoch": "0:07:34.661344", "max_epoch": 80, "eta": "3:17:01.194940", "date": "5/7/2024", "time": "15:19:54", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0015981917108865375, "loss": 0.23352468013763428}}
{"epoch": 55, "time_per_epoch": "0:07:25.929298", "max_epoch": 80, "eta": "3:05:48.232458", "date": "5/7/2024", "time": "15:27:20", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.001500088154835051, "loss": 0.5205101370811462}}
{"epoch": 56, "time_per_epoch": "0:07:24.857243", "max_epoch": 80, "eta": "2:57:56.573843", "date": "5/7/2024", "time": "15:34:46", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00140198428552333, "loss": 0.3533079922199249}}
{"epoch": 57, "time_per_epoch": "0:07:36.734620", "max_epoch": 80, "eta": "2:55:04.896266", "date": "5/7/2024", "time": "15:42:23", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0013043001987809468, "loss": 0.5288237929344177}}
{"epoch": 58, "time_per_epoch": "0:07:22.240888", "max_epoch": 80, "eta": "2:42:09.299538", "date": "5/7/2024", "time": "15:49:46", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0012074541928640665, "loss": 0.5628782510757446}}
{"epoch": 59, "time_per_epoch": "0:07:21.112908", "max_epoch": 80, "eta": "2:34:23.371059", "date": "5/7/2024", "time": "15:57:8", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.001111860977238095, "loss": 0.31243863701820374}}
{"epoch": 60, "time_per_epoch": "0:07:21.676104", "max_epoch": 80, "eta": "2:27:13.522089", "date": "5/7/2024", "time": "16:4:30", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0010179298967280793, "loss": 0.24063703417778015}}
{"epoch": 61, "time_per_epoch": "0:07:22.304313", "max_epoch": 80, "eta": "2:20:03.781941", "date": "5/7/2024", "time": "16:11:53", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0009260631786413252, "loss": 0.4758918583393097}}
{"epoch": 62, "time_per_epoch": "0:07:20.590280", "max_epoch": 80, "eta": "2:12:10.625033", "date": "5/7/2024", "time": "16:19:14", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0008366542103683161, "loss": 0.48947787284851074}}
{"epoch": 63, "time_per_epoch": "0:07:20.320319", "max_epoch": 80, "eta": "2:04:45.445420", "date": "5/7/2024", "time": "16:26:35", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0007500858548375109, "loss": 0.38016995787620544}}
{"epoch": 64, "time_per_epoch": "0:07:19.587276", "max_epoch": 80, "eta": "1:57:13.396410", "date": "5/7/2024", "time": "16:33:55", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0006667288110375084, "loss": 0.39876535534858704}}
{"epoch": 65, "time_per_epoch": "0:07:19.526198", "max_epoch": 80, "eta": "1:49:52.892974", "date": "5/7/2024", "time": "16:41:15", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0005869400266270666, "loss": 0.4667430520057678}}
{"epoch": 66, "time_per_epoch": "0:07:20.005884", "max_epoch": 80, "eta": "1:42:40.082382", "date": "5/7/2024", "time": "16:48:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0005110611694304271, "loss": 0.29991692304611206}}
{"epoch": 67, "time_per_epoch": "0:07:19.748235", "max_epoch": 80, "eta": "1:35:16.727050", "date": "5/7/2024", "time": "16:55:56", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0004394171643632412, "loss": 0.341851145029068}}
{"epoch": 68, "time_per_epoch": "0:07:19.018176", "max_epoch": 80, "eta": "1:27:48.218115", "date": "5/7/2024", "time": "17:3:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.000372314802054194, "loss": 0.3857216238975525}}
{"epoch": 69, "time_per_epoch": "0:07:19.552546", "max_epoch": 80, "eta": "1:20:35.078006", "date": "5/7/2024", "time": "17:10:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0003100414251204348, "loss": 0.31028807163238525}}
{"epoch": 70, "time_per_epoch": "0:07:20.258756", "max_epoch": 80, "eta": "1:13:22.587557", "date": "5/7/2024", "time": "17:17:57", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0002528636977223765, "loss": 0.4571707844734192}}
{"epoch": 71, "time_per_epoch": "0:07:18.766678", "max_epoch": 80, "eta": "1:05:48.900103", "date": "5/7/2024", "time": "17:25:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00020102646366682223, "loss": 0.4598176181316376}}
{"epoch": 72, "time_per_epoch": "0:07:19.548373", "max_epoch": 80, "eta": "0:58:36.386986", "date": "5/7/2024", "time": "17:32:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.0001547516979481945, "loss": 0.2587343454360962}}
{"epoch": 73, "time_per_epoch": "0:07:19.405268", "max_epoch": 80, "eta": "0:51:15.836877", "date": "5/7/2024", "time": "17:39:56", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 0.00011423755621753237, "loss": 0.8911759853363037}}
{"epoch": 74, "time_per_epoch": "0:07:19.128135", "max_epoch": 80, "eta": "0:43:54.768811", "date": "5/7/2024", "time": "17:47:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 7.965752624957046e-05, "loss": 0.45557257533073425}}
{"epoch": 75, "time_per_epoch": "0:07:19.406747", "max_epoch": 80, "eta": "0:36:37.033734", "date": "5/7/2024", "time": "17:54:36", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 5.1159685041454155e-05, "loss": 0.26564571261405945}}
{"epoch": 76, "time_per_epoch": "0:07:19.508782", "max_epoch": 80, "eta": "0:29:18.035129", "date": "5/7/2024", "time": "18:1:56", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 2.8866064724304957e-05, "loss": 0.48253774642944336}}
{"epoch": 77, "time_per_epoch": "0:07:20.132161", "max_epoch": 80, "eta": "0:22:00.396483", "date": "5/7/2024", "time": "18:9:16", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 1.2872130002899723e-05, "loss": 0.6122196912765503}}
{"epoch": 78, "time_per_epoch": "0:07:20.184826", "max_epoch": 80, "eta": "0:14:40.369652", "date": "5/7/2024", "time": "18:16:37", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 3.2463693611489186e-06, "loss": 0.1481819897890091}}
{"epoch": 79, "time_per_epoch": "0:07:18.947498", "max_epoch": 80, "eta": "0:07:18.947498", "date": "5/7/2024", "time": "18:23:57", "status": "RUNNING", "verbosity": "INFO", "message": "Train metrics generated.", "kpi": {"learning_rate": 3.0001783894512056e-08, "loss": 0.2755662798881531}}
{"date": "5/7/2024", "time": "18:23:57", "status": "SUCCESS", "verbosity": "INFO", "message": "Training finished successfully.", "kpi": {"learning_rate": 3.0001783894512056e-08, "loss": 0.2755662798881531}}
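As an aside, the per-epoch loss in status.json is easy to pull out for plotting, since the file is JSON-lines; a minimal sketch (field names match the log above):

```python
import json

def parse_status_log(lines):
    """Extract (epoch, loss) pairs from TAO's status.json JSON-lines output."""
    kpis = []
    for line in lines:
        rec = json.loads(line)
        # Per-epoch records carry both an "epoch" index and a "kpi" dict.
        if "epoch" in rec and "kpi" in rec:
            kpis.append((rec["epoch"], rec["kpi"]["loss"]))
    return kpis

# Two lines in the same shape as the log above (values abbreviated):
sample = [
    '{"epoch": 0, "status": "RUNNING", "kpi": {"learning_rate": 3e-4, "loss": 1.381}}',
    '{"date": "5/7/2024", "status": "SUCCESS", "kpi": {"loss": 0.276}}',
]
print(parse_status_log(sample))  # [(0, 1.381)]
```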

This is the output model: checkpoint_epoch_80.tlt

Performance of this model, measured with the evaluation.py script on my validation set:

Average predicted number of objects(1318 samples): 6.969

Car AP@0.50, 0.50:
bev  AP:83.1364
3d   AP:78.0459
bev mAP: 83.1364
3d mAP: 78.0459

Problem: conversion to TensorRT

Note: for conversion, max_points_num in the inference section of pointpillar_general.yaml was set to max_points_num: 204800.
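Because the exported graph takes fixed-shape inputs (points:1x204800x4 plus a num_points count), each point cloud has to be padded or truncated to that size before inference. A sketch of the padding I use, assuming the KITTI 4-feature layout (x, y, z, intensity):

```python
import numpy as np

MAX_POINTS = 204800  # matches max_points_num in pointpillar_general.yaml

def pad_points(points, max_points=MAX_POINTS):
    """Pad (or truncate) an (N, 4) point cloud to the fixed 1 x max_points x 4
    shape a static-shape engine expects; also return the true point count."""
    n = min(points.shape[0], max_points)
    padded = np.zeros((max_points, points.shape[1]), dtype=np.float32)
    padded[:n] = points[:n]
    return padded[np.newaxis], np.array([n], dtype=np.int32)

cloud = np.random.rand(1000, 4).astype(np.float32)
batch, num = pad_points(cloud)
print(batch.shape, num)  # (1, 204800, 4) [1000]
```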

I tried to convert the model into a TensorRT engine in two ways.

Method 1: export script

First I tried to use the provided export.py script.

Question: why do we set dummy_voxel_num_points and dummy_coords as torch.int32 instead of keeping them as float?
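My current understanding on that question (an assumption, not confirmed in the script's docs): coords and voxel_num_points are index/count tensors rather than features, so they have to be integers for the scatter/gather ops, and whatever dtype the dummy inputs carry is baked into the ONNX graph as the input dtype TensorRT will expect. A toy numpy illustration of the indexing constraint:

```python
import numpy as np

# Voxel coordinates are used as *indices* when pillar features are scattered
# back onto the BEV canvas, and fancy indexing requires an integer dtype;
# float coordinates cannot index an array.
canvas = np.zeros((4, 4), dtype=np.float32)
coords = np.array([[0, 1], [2, 3]], dtype=np.int32)   # like dummy_coords
features = np.array([1.0, 2.0], dtype=np.float32)
canvas[coords[:, 0], coords[:, 1]] = features          # integer fancy indexing
print(canvas[0, 1], canvas[2, 3])  # 1.0 2.0
```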

python nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py --cfg_file nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml --save_engine path/to/output/checkpoint_epoch_80.engine  --key tlt_encode

I obtain an engine: checkpoint_epoch_80.engine

However, all metrics drop to zero.

Evaluation with checkpoint_epoch_80.engine

Average predicted number of objects(1318 samples): 1.174

2024-05-16 12:47:50,937   INFO  Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

Method 2: trtexec

I took the ONNX generated by the export script (checkpoint_epoch_80.onnx, the one after simplification with graph surgeon) and tried to generate the engine with trtexec, producing checkpoint_epoch_80_trtexec.engine:

trtexec --onnx=/path/to/checkpoint_epoch_80.onnx \
        --maxShapes=points:1x204800x4,num_points:1 \
        --minShapes=points:1x204800x4,num_points:1 \
        --optShapes=points:1x204800x4,num_points:1 \
        --fp16 \
        --saveEngine=/path/to/checkpoint_epoch_80_trtexec.engine

However, the metrics are still zero:

Average predicted number of objects(1318 samples): 1.169

Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

Experiments

I tried to:

  • remove the --fp16 flag when using trtexec
  • use a different docker, as suggested in other topics
  • convert the non-simplified ONNX with trtexec → this partially worked, but the engine output NaNs. To work around that I used the --best flag during conversion; however, the predictions are still wrong.
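To separate an export problem from a conversion problem, I found it useful to run the same padded input through both ONNX Runtime and the engine and compare per-output statistics; a head that overflows under fp16 stands out immediately as inf/NaN. A small helper for those stats (the tensor name here is illustrative):

```python
import numpy as np

def summarize_output(name, arr):
    """Sanity stats for one network output: NaN/inf counts and value range.
    fp16 overflow after conversion typically shows up here as inf/NaN."""
    arr = np.asarray(arr, dtype=np.float64)
    return {
        "name": name,
        "nans": int(np.isnan(arr).sum()),
        "infs": int(np.isinf(arr).sum()),
        "min": float(np.nanmin(arr)),
        "max": float(np.nanmax(arr)),
    }

stats = summarize_output("cls_preds", np.array([0.1, float("nan"), 1e9]))
print(stats)  # {'name': 'cls_preds', 'nans': 1, 'infs': 0, 'min': 0.1, 'max': 1000000000.0}
```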

Please refer to The effect is very poor when converted to trt - #62 by Morganh to use TRT8.6.1 and retry.

I’m already using TRT8.6.1

pip list | grep tensorrt
tensorrt                    8.6.1
torch-tensorrt              2.2.0a0

Please try to follow the end of the following link.
From PointPillars - NVIDIA Docs,
A TensorRT sample is developed as a demo to show how to deploy PointPillars models trained in TAO Toolkit.

Is there a docker already containing TensorRT 8.2 (or above) and TensorRT OSS 22.02?

In addition, do you confirm the export script doesn’t work correctly for exporting pointpillar?

Please directly use the tao-deploy docker (nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy); it contains TensorRT 8.6.1.6. The Dockerfile is at tao_deploy/docker/Dockerfile at main · NVIDIA/tao_deploy · GitHub. Run inside the docker, use trtexec to generate the TensorRT engine, then do the last two steps (Clone the repo and Run the TensorRT Inference).

It doesn’t work with this docker either.

Steps I performed:

  1. Run the docker:
docker run -it --rm --gpus all -e PYTHONPATH=/tao-pt:$PYTHONPATH --shm-size 16G --net=host nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy
  2. Export the ONNX to a TensorRT engine:
trtexec --onnx=checkpoint_epoch_80.onnx \
        --maxShapes=points:1x204800x4,num_points:1 \
        --minShapes=points:1x204800x4,num_points:1 \
        --optShapes=points:1x204800x4,num_points:1 \
        --fp16 \
        --saveEngine=checkpoint_epoch_80.engine
  3. Clone the repo:
git clone https://github.com/NVIDIA-AI-IOT/tao_toolkit_recipes.git
  4. Follow the last two steps:
cd tao_toolkit_recipes/tao_pointpillars/tensorrt_sample/test
sudo apt-get install libboost-all-dev # needed to be able to compile
mkdir build
cd build
cmake .. -DCUDA_VERSION=11.8
make -j4
  5. Run inference:
./pointpillars -e path/to/checkpoint_epoch_80.engine -l /path/to/mybin.bin -t 0.01 -c Vehicle -n 4096 -p -d fp16

output

TIME: doinfer: 99.3167 ms.
Vehicle, 59.085323, 16.726288, -1.349629, 4.166372, 1.631799, 1.426950, 4.644820, 0.227298
TIME: pointpillar: 99.7745 ms.
Bndbox objs: 1
Saved prediction in: mybin.txt

mybin.txt

59.0853 16.7263 -1.34963 1.6318 4.16637 1.42695 4.64482 0 0.227298 

This doesn’t match the expected output at all, since the output with the original .tlt model is:

Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.5022 1.6059 3.5395 13.5955 -2.8459 -0.8949 6.3548 0.9109
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.6122 1.6827 4.0306 12.6629 3.2928 -0.8674 6.3130 0.9073
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.5411 1.6050 3.3492 8.9439 -3.2063 -0.8594 6.2721 0.8128
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4886 1.5999 3.7508 2.7487 -3.1261 -0.9196 6.2323 0.7272
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.5183 1.6585 3.6357 26.6466 -2.6679 -0.8169 6.2843 0.3774
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4886 1.6189 3.9054 69.1245 -0.7873 -0.5102 3.1647 0.2728
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4437 1.6175 3.8565 -21.7086 6.8582 -1.2460 4.7063 0.2456
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4267 1.6319 4.1663 -23.6664 -7.7901 -1.3491 4.6448 0.2259
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.7588 1.8220 4.5767 52.2319 -1.9124 -0.4946 6.2961 0.1866
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4859 1.6786 4.1968 -43.0405 -5.9368 -1.3399 6.7888 0.1848
Car -1 -1 0.0000 0.0000 0.0000 0.0000 0.0000 1.4939 1.6477 4.0340 70.1843 5.6645 -1.2091 3.1312 0.1006
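One thing worth noting when comparing the two outputs: the TensorRT sample prints boxes in the LiDAR frame, while the .tlt inference writes KITTI camera-frame labels, so the x/y/z columns are not directly comparable. In fact, the sample's single detection seems to correspond to the 8th .tlt line above: size, heading, and score agree to within rounding. This is my reading of the two output formats, not something I have confirmed in the sample's source:

```python
# Single detection printed by the TensorRT sample (assumed field order:
# x y z w l h yaw class score -- my reading, unverified):
sample_det = [59.0853, 16.7263, -1.34963, 1.6318, 4.16637, 1.42695, 4.64482, 0, 0.227298]

# 8th line of the .tlt output; KITTI label fields are:
# type trunc occ alpha bbox(4) h w l x y z ry score
kitti_hwl = [1.4267, 1.6319, 4.1663]
kitti_ry, kitti_score = 4.6448, 0.2259

assert abs(sample_det[5] - kitti_hwl[0]) < 0.01  # height
assert abs(sample_det[3] - kitti_hwl[1]) < 0.01  # width
assert abs(sample_det[4] - kitti_hwl[2]) < 0.01  # length
assert abs(sample_det[6] - kitti_ry) < 0.01      # heading
assert abs(sample_det[8] - kitti_score) < 0.01   # confidence
print("dims/heading/score agree; only the coordinate frame differs")
```

So the engine is not producing pure garbage, but it still finds only 1 box where the .tlt model finds 11.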

To narrow down, can you go back to the tao-pyt docker and run evaluation or inference against the TensorRT engine?
See

https://docs.nvidia.com/tao/tao-toolkit/text/point_cloud/pointpillars.html#evaluating-the-model and https://docs.nvidia.com/tao/tao-toolkit/text/point_cloud/pointpillars.html#running-inference-on-the-pointpillars-model.

If you mean the new engine generated in the new docker: the results are still the same (evaluation run in the nvcr.io/nvidia/tao/tao-toolkit:5.0.0-pyt-base docker):

2024-05-16 17:06:01,609   INFO  **********************Start logging**********************
2024-05-16 17:06:01,609   INFO  CUDA_VISIBLE_DEVICES=ALL
2024-05-16 17:06:01,765   INFO  Loading point cloud dataset
2024-05-16 17:06:01,778   INFO  Total samples for point cloud dataset: 1318
2024-05-16 17:06:02,234   INFO  *************** EVALUATION *****************
2024-05-16 17:06:34,291   INFO  *************** Performance *****************
2024-05-16 17:06:34,292   INFO  Generate label finished(sec_per_example: 0.0243 second).
2024-05-16 17:06:34,292   INFO  recall_roi_0.3: 0.000000
2024-05-16 17:06:34,292   INFO  recall_rcnn_0.3: 0.000000
2024-05-16 17:06:34,292   INFO  recall_roi_0.5: 0.000000
2024-05-16 17:06:34,292   INFO  recall_rcnn_0.5: 0.000000
2024-05-16 17:06:34,292   INFO  recall_roi_0.7: 0.000000
2024-05-16 17:06:34,292   INFO  recall_rcnn_0.7: 0.000000
2024-05-16 17:06:34,292   INFO  Average predicted number of objects(1318 samples): 1.168
2024-05-16 17:06:34,500   INFO  Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

If you mean my original engine you can see it in my original question.

Please use the official model to run.
You can run as below.
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.3.0-pyt /bin/bash
Then inside the docker,
Follow PointPillars - NVIDIA Docs, but without “tao model” at the beginning of each command line.
# pointpillars export xxx
# pointpillars evaluation xxx
# pointpillars inference xxx

I used the docker 5.3.0-pyt:

docker run -it --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --runtime=nvidia nvcr.io/nvidia/tao/tao-toolkit:5.3.0-pyt /bin/bash

I downloaded the official pointpillars_trainable.tlt model from here.

Since this model was trained with all 3 classes, I restored all the arguments related to the 3 classes in the configuration file.
Performance of pointpillars_trainable.tlt on my version of KITTI dataset (only cars are annotated):

Average predicted number of objects(1318 samples): 1.219
Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
Pedestrian AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
Cyclist AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

Performance of pointpillars_trainable.engine on my version of KITTI dataset (only cars are annotated):

Average predicted number of objects(1318 samples): 11.292
Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
Pedestrian AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
Cyclist AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

Since all metrics were zero even with the .tlt model, this couldn’t tell me much about whether the export itself succeeded. For this reason I also tried with my own model trained on my version of KITTI: checkpoint_kitti_80.tlt.

Performance of checkpoint_kitti_80.tlt on my version of KITTI dataset (only cars are annotated):

Average predicted number of objects(1318 samples): 17.327
Car AP@0.50, 0.50:
bev  AP:81.6425
3d   AP:80.0362
bev mAP: 81.6425
3d mAP: 80.0362

Performance of checkpoint_kitti_80.engine on my version of KITTI dataset (only cars are annotated):

Average predicted number of objects(1318 samples): 1.170
Car AP@0.50, 0.50:
bev  AP:0.0000
3d   AP:0.0000
bev mAP: 0.0000
3d mAP: 0.0000

May I know if you are running with the default notebook and the default dataset mentioned in it?

I managed to make the TensorRT engine work, but I still have some doubts about the final output metrics. By running the evaluation on the exact same val set multiple times, I obtain quite different metrics.

FP16 TRT engine
Car AP@0.50, 0.50:
bev AP:80.8023
3d AP:64.2504

bev AP:78.1450
3d AP:64.4269

bev AP:80.5636
3d AP:64.1166

bev AP:80.6485
3d AP:64.0952

bev AP:79.1487
3d AP:63.4226

FP32 TRT engine
Car AP@0.50, 0.50:
bev AP:80.4541
3d AP:63.9492

bev AP:80.8236
3d AP:66.5927

bev AP:80.8545
3d AP:64.2770

bev AP:78.5316
3d AP:63.9503

bev AP:80.2531
3d AP:64.0063
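Since the AP values are listed above, the run-to-run spread can be quantified with a quick standard-library sketch (values copied from the five FP32 runs above; the FP16 runs can be checked the same way):

```python
import statistics

# bev and 3d AP values from the five FP32 evaluation runs above
bev_ap = [80.4541, 80.8236, 80.8545, 78.5316, 80.2531]
ap_3d = [63.9492, 66.5927, 64.2770, 63.9503, 64.0063]

for name, vals in [("bev", bev_ap), ("3d", ap_3d)]:
    print(f"{name}: mean={statistics.mean(vals):.4f}  "
          f"stdev={statistics.stdev(vals):.4f}  "
          f"max-min={max(vals) - min(vals):.4f}")
```

This gives a max-min spread of about 2.32 AP (bev) and 2.64 AP (3d), so the worst-case difference between two runs is indeed above 1.5.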

Is this behavior normal, even though the difference between one evaluation and another is sometimes > 1.5?

May I know what changes you made to make the TensorRT engine work?

How about running inference to check the visualized images? Is it consistent?

For the training I made some changes to the configuration file, including the removal of the cyclist- and pedestrian-related arguments. However, I left them in the anchor generation config by mistake. I didn’t think this inconsistency could impact the TensorRT engine, since the .tlt model was working fine, but apparently this was the problem.

The inference is consistent, but a deviation of 1.5 is hard to spot visually, I guess. Still, I suppose I shouldn’t see these oscillations in the metrics, especially with FP32.

Could you run evaluation against a very small part of the dataset (even 1 sample) to check if it is consistent?

What do you mean? I was referring to the modification in the configuration file where I had left the cyclist and pedestrian arguments in the anchor generation config. The fact that the anchor generator still had those two entries while the class definition contained only ‘Car’ broke the TensorRT performance. I have now retrained the model with those entries removed as well, and thanks to that I now get good performance with TensorRT too.
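For clarity, a minimal sketch of the inconsistency I mean (the key names follow the TAO PointPillars spec-file format from memory, and the anchor sizes are the usual KITTI defaults, not copied from my actual file):

```yaml
CLASS_NAMES: ['Car']               # pedestrian/cyclist removed here...
MODEL:
  DENSE_HEAD:
    ANCHOR_GENERATOR_CONFIG:       # ...but accidentally left here
      - class_name: 'Car'
        anchor_sizes: [[3.9, 1.6, 1.56]]
      - class_name: 'Pedestrian'   # should have been removed too
        anchor_sizes: [[0.8, 0.6, 1.73]]
      - class_name: 'Cyclist'      # should have been removed too
        anchor_sizes: [[1.76, 0.6, 1.73]]
```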

However, my question is now different: why do I obtain oscillating metrics across multiple evaluations on the same val set?

I also found some (rare) inconsistencies visually when running inference multiple times on the same val set:

Please note that, unlike with the TRT engine, with the .tlt model I always obtain the exact same metrics and outputs.

Yes, what I mean is also about this question; I just want to use a smaller dataset to narrow down.
As for the visualization result, it is really odd. The same command and the same val set?

Yes, the exact same command and the same val set.

By narrowing down to a single val sample (the one I showed before, which differed across two evaluation runs) and running multiple inferences and multiple evaluations, the results always seem the same, but I’m not sure this isn’t just by chance…

This is my current config file pointpillar_general.yaml.

This is the output from when I generated the engine:

FP16

pointpillars export -r ****** -e nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml -k tlt_encode --save_engine ****** --data_type fp16
python /usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py --cfg_file nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml --key tlt_encode --save_engine ****** --data_type fp16


INFO: Exporting the model...
INFO: Starting PointPillars export
WARNING: 'decrypt_stream' is deprecated, to be removed in '0.7'. Please use 'eff.codec.decrypt_stream()' instead.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:64: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  inputs_shape = inputs.cpu().detach().numpy().shape
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:80: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  x_max_shape = x_max.cpu().detach().numpy().shape
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:132: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  batch_size = coords[..., 0].max().int().item() + 1
/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1209: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
WARNING: 'encrypt_stream' is deprecated, to be removed in '0.7'. Please use 'eff.codec.encrypt_stream()' instead.
INFO: Model exported to ******
INFO: Model exported to ******
[05/20/2024-15:40:03] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/20/2024-15:40:30] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[05/20/2024-15:40:30] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[05/20/2024-15:40:30] [TRT] [W] Check verbose logs for the list of affected weights.
[05/20/2024-15:40:30] [TRT] [W] - 21 weights are affected by this issue: Detected subnormal FP16 values.
[05/20/2024-15:40:30] [TRT] [W] - 10 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
INFO: TensorRT engine saved to ******
INFO: TensorRT engine saved to ******
INFO: Export finished successfully. 

FP32

pointpillars export -r ****** -e nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml -k tlt_encode --save_engine ******
python /usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py --cfg_file nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml --key tlt_encode --save_engine ****** --data_type fp32


INFO: Exporting the model...
INFO: Starting PointPillars export
WARNING: 'decrypt_stream' is deprecated, to be removed in '0.7'. Please use 'eff.codec.decrypt_stream()' instead.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:64: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  inputs_shape = inputs.cpu().detach().numpy().shape
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:80: TracerWarning: Converting a tensor to a NumPy array might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  x_max_shape = x_max.cpu().detach().numpy().shape
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/pointcloud/pointpillars/scripts/export.py:132: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  batch_size = coords[..., 0].max().int().item() + 1
/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:1209: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:179.)
  _C._jit_pass_onnx_graph_shape_type_inference(
WARNING: 'encrypt_stream' is deprecated, to be removed in '0.7'. Please use 'eff.codec.encrypt_stream()' instead.
INFO: Model exported to ******
INFO: Model exported to ******
[05/20/2024-15:38:45] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
INFO: TensorRT engine saved to ******
INFO: TensorRT engine saved to ******
INFO: Export finished successfully.

The results I showed so far, unless specified otherwise, refer to the engine exported with FP32.