Creating tlt-int8-tensorfile in TLT

I have 9893 images in my training dataset and created TFRecords with 14% held out for validation, so I have more than 8000 training images.
When I create the tensorfile, I set the maximum number of batches to 1024, but I get the error ‘ValueError: The dataset contains 556 minibatches, while the requested amount is 1024.’
Why can’t I set 1024?

Could you please paste the command and full log when you run “tlt-int8-tensorfile”?

For example, if you set bs=4 and you have 8000 training images, then the dataset contains 2000 (8000/4) minibatches. You cannot set “-m” to a value larger than 2000.

So, please check the bs in your training spec.
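
The arithmetic in this reply can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the tool's actual code; the function name `max_minibatches` is made up for the example.

```python
def max_minibatches(num_training_images: int, batch_size: int) -> int:
    """Return the number of full minibatches in the dataset.

    Under the rule described above, this is the largest value
    that "-m" can take.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer (floor) division: a partial last batch is not counted.
    return num_training_images // batch_size


# The example from this reply: 8000 training images, bs=4.
print(max_minibatches(8000, 4))  # 2000
```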

My full command is as follows.

tlt-export detectnet_v2 -m /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/pruned_models/weights/resnet_18retrain.tlt -o /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/pruned_models/weights/resnet18_int8.etlt -k NHRvZzAwbHFncTk0MXJ0YmwwbXB1bGxhbnU6MjYzNzc2MDctYzQ5MC00NjkxLThkODAtODM0NDc3ZTRhNTNh --cal_data_file /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/pruned_models/weights/calibration.tensor --data_type int8 --batches 556 --max_batch_size 8 --max_workspace_size 1073741824 --cal_cache_file /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/pruned_models/weights/calibration_cache_int8.bin --engine_file /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/pruned_models/weights/resnet_18trt_int8.engine

--batches is 556 and the default calibration batch size is 8, so 556*8=4448 images are used in calibration.
Since I have 9893 training images, I set --batches to 1024, which in theory should be fine. But I get an error if I set --batches to 1024. The error is

‘ValueError: The dataset contains 556 minibatches, while the requested amount is 1024.’

So the maximum I can set for --batches is 556. What could be wrong?

Please check how many total, training, and validation images you have. Can you check your full training log? It will show the number of training and validation images.
Also, please paste the command and full log when you run “tlt-int8-tensorfile”.

When I run the tlt-int8-tensorfile command with -m 1024, I get the following error.

tlt-int8-tensorfile detectnet_v2 -e /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/detectnet_v2_prune_resnet18_kitti.txt -o /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/pruned_models/weights/calibration.tensor -m 1024 
Using TensorFlow backend.
2020-05-26 02:01:15,309 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/detectnet_v2_prune_resnet18_kitti.txt
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
Traceback (most recent call last):
  File "/usr/local/bin/tlt-int8-tensorfile", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_tensorfile.py", line 31, in main
  File "<decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/calibration_tensorfile.py", line 162, in main
  File "./detectnet_v2/scripts/calibration_tensorfile.py", line 59, in dump_dataset_images_to_tensorfile
ValueError: The dataset contains 556 minibatches, while the requested amount is 1024.

I can only set -m to 556. Then it works.
tlt-int8-tensorfile detectnet_v2 -e /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/detectnet_v2_prune_resnet18_kitti.txt -o /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/pruned_models/weights/calibration.tensor -m 556
Using TensorFlow backend.
2020-05-26 02:01:42,466 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from /workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/detectnet_v2_prune_resnet18_kitti.txt
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
Writing calibration tensorfile: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 556/556 [18:55<00:00, 2.04s/it]
Time taken to run iva.detectnet_v2.scripts.calibration_tensorfile:main: 0:19:01.263457.

Could you please paste your full log from retraining?
Please also paste the spec. Thanks.
/workspace/tlt-experiments/detectnet_v2/resnet18/prune_0.5/detectnet_v2_prune_resnet18_kitti.txt

I can’t find the training log. The spec file is attached.
detectnet_v2_prune_resnet18_kitti.log (4.1 KB)

What should I look for in the training log? I will take note for next time.

From the log, you can check the number of training images.
For example,

2020-05-25 12:48:51,891 [INFO] iva.detectnet_v2.scripts.train: Found 6434 samples in training set
2020-05-25 12:48:57,771 [INFO] iva.detectnet_v2.scripts.train: Found 1047 samples in validation set

Then,
If I set bs=4 in the spec, then minibatches = int(6434/4) = 1608.
If I set bs=40 in the spec, then minibatches = int(6434/40) = 160

I have already confirmed this with some experiments. In the tlt-int8-tensorfile command, “-m” cannot be set to a value larger than the number of minibatches.
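
As a rough sketch of that check, here is a small Python helper. It is hypothetical (`check_requested_batches` is not part of TLT) and only mimics the wording of the error in the log above, assuming the minibatch count is the integer division of training images by the spec batch size.

```python
def check_requested_batches(num_training_images: int,
                            batch_size: int,
                            requested: int) -> int:
    """Return the available minibatch count, or raise if too many are requested."""
    available = num_training_images // batch_size
    if requested > available:
        # Message text copied from the error seen in this thread.
        raise ValueError(
            f"The dataset contains {available} minibatches, "
            f"while the requested amount is {requested}."
        )
    return available


# 6434 training samples, as in the example log lines above.
print(check_requested_batches(6434, 4, 1024))   # 1608
print(check_requested_batches(6434, 40, 160))   # 160

try:
    check_requested_batches(6434, 40, 1024)
except ValueError as err:
    print(err)
```

With bs=40 the last call fails, because only 160 minibatches are available.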

Does minibatches mean --batches?
I can’t find minibatches in the argument list.
The following are all the arguments for tlt-export.

-m <path to the .tlt model file generated by tlt train>
-k <key>
[-o <path to output file>]
[--cal_data_file <path to tensor file>]
[--cal_image_dir <path to the directory images to calibrate the model>]
[--cal_cache_file <path to output calibration file>]
[--data_type <Data type for the TensorRT backend during export>]
[--batches <Number of batches to calibrate over>]
[--max_batch_size <maximum trt batch size>]
[--max_workspace_size <maximum workspace size>]
[--batch_size <batch size to TensorRT engine>]
[--experiment_spec <path to experiment spec file>]
[--engine_file <path to the TensorRT engine file>]
[--verbose Verbosity of the logger]

Please focus on the “tlt-int8-tensorfile” command first. From your previous log,

ValueError: The dataset contains 556 **minibatches**, while the requested amount is 1024.

Please fix this error first. minibatches = training_images / bs_in_the_spec

So, please check why you get 556. I suggest you check the number of training images.

Yeah, good point. Now I understand. I have more than 8000 training images, and in the training spec the batch size for training is 16, so the number of minibatches is 556. That is why I can’t set more than the number of minibatches, which is currently 556. Then I should be able to calibrate with a calibration batch size of 16. Thanks.
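
As a sanity check on these numbers, the relation can be inverted. The sketch below assumes the minibatch count is the floor of training images divided by the spec batch size, as described earlier in the thread; `training_image_range` is a hypothetical helper, not a TLT function.

```python
def training_image_range(minibatches: int, batch_size: int) -> tuple:
    """Return the (min, max) image counts that yield this many full minibatches."""
    low = minibatches * batch_size
    # Up to batch_size - 1 leftover images do not form another full batch.
    high = low + batch_size - 1
    return low, high


# 556 minibatches at a training batch size of 16.
print(training_image_range(556, 16))  # (8896, 8911)
```

Under that assumption, 556 minibatches at batch size 16 correspond to between 8896 and 8911 training images. The training log reports the exact count.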