It may be that all that is needed is to delete the calibration cache, so I am currently looking into doing that.
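A minimal sketch of that idea, assuming the export/ directory maps to $LOCAL_EXPERIMENT_DIR/export on the host as elsewhere in the notebook (file names are taken from the export command further down):

```
# Delete the existing calibration artifacts so the next export regenerates them
!rm -f $LOCAL_EXPERIMENT_DIR/export/cal.bin
!rm -f $LOCAL_EXPERIMENT_DIR/export/cal.tensorfile
# Then re-run `tao retinanet export` and `tao converter` as shown below
```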
Please provide the following information when requesting support.
• Hardware: Quad A30
• Network Type: RetinaNet with ResNet18 backbone (also RetinaNet with EfficientNet-B0)
• Configuration of the TAO Toolkit Instance:
dockers:
  nvidia/tao/tao-toolkit-tf:
    v3.22.05-tf1.15.5-py3:
      docker_registry: nvcr.io
      tasks:
        1. augment
        2. bpnet
        3. classification
        4. dssd
        5. faster_rcnn
        6. emotionnet
        7. efficientdet
        8. fpenet
        9. gazenet
        10. gesturenet
        11. heartratenet
        12. lprnet
        13. mask_rcnn
        14. multitask_classification
        15. retinanet
        16. ssd
        17. unet
        18. yolo_v3
        19. yolo_v4
        20. yolo_v4_tiny
        21. converter
    v3.22.05-tf1.15.4-py3:
      docker_registry: nvcr.io
      tasks:
        1. detectnet_v2
  nvidia/tao/tao-toolkit-pyt:
    v3.22.05-py3:
      docker_registry: nvcr.io
      tasks:
        1. speech_to_text
        2. speech_to_text_citrinet
        3. speech_to_text_conformer
        4. action_recognition
        5. pointpillars
        6. pose_classification
        7. spectro_gen
        8. vocoder
    v3.21.11-py3:
      docker_registry: nvcr.io
      tasks:
        1. text_classification
        2. question_answering
        3. token_classification
        4. intent_slot_classification
        5. punctuation_and_capitalization
  nvidia/tao/tao-toolkit-lm:
    v3.22.05-py3:
      docker_registry: nvcr.io
      tasks:
        1. n_gram
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022
• Training spec file (if you have one, please share it here)
random_seed: 42
retinanet_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5]"
  scales: "[0.045, 0.09, 0.2, 0.4, 0.55, 0.7]"
  two_boxes_for_ar1: false
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet"
  nlayers: 18
  n_kernels: 1
  n_anchor_levels: 1
  feature_size: 256
  freeze_bn: False
  freeze_blocks: 0
}
training_config {
  enable_qat: False
  pretrain_model_path: "YOUR_PRETRAINED_MODEL"
  batch_size_per_gpu: 8
  num_epochs: 100
  n_workers: 2
  checkpoint_interval: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 4e-5
      max_learning_rate: 1.5e-2
      soft_start: 0.1
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 2e-5
  }
  optimizer {
    sgd {
      momentum: 0.9
      nesterov: True
    }
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 8
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  output_width: 1248
  output_height: 384
  output_channel: 3
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_train*"
  }
  target_class_mapping {
    key: "car"
    value: "car"
  }
  target_class_mapping {
    key: "pedestrian"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "cyclist"
    value: "cyclist"
  }
  target_class_mapping {
    key: "van"
    value: "car"
  }
  target_class_mapping {
    key: "person_sitting"
    value: "pedestrian"
  }
  target_class_mapping {
    key: "truck"
    value: "car"
  }
  validation_data_sources: {
    image_directory_path: "/workspace/tao-experiments/data/val/image"
    label_directory_path: "/workspace/tao-experiments/data/val/label"
  }
}
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
- Train RetinaNet
- Export INT8 .etlt
- Convert INT8 .etlt to INT8 engine (falls back to non-INT8)
Export in INT8 mode (generate calibration cache file). This will generate an .etlt file and an INT8 calibration file.
# Replaced batch size 1 with 8
!rm -f $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt
!tao retinanet export -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_$EPOCH.tlt \
-o $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt \
-e $SPECS_DIR/retinanet_train_resnet18_kitti.txt \
-k $KEY \
--cal_image_dir $DATA_DOWNLOAD_DIR/training/image_2 \
--data_type int8 \
--batch_size 8 \
--batches 1 \
--cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
--cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
--gen_ds_config
2022-06-09 23:03:46,244 [INFO] root: Registry: ['nvcr.io']
2022-06-09 23:03:46,305 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
2022-06-10 04:04:27,559 [INFO] iva.retinanet.utils.spec_loader: Merging specification from /workspace/tao-experiments/retinanet/specs/retinanet_train_resnet18_kitti.txt
2022-06-10 04:04:31,230 [INFO] iva.common.export.keras_exporter: Using input nodes: ['Input']
2022-06-10 04:04:31,230 [INFO] iva.common.export.keras_exporter: Using output nodes: ['NMS']
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting P5_upsampled as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting P4_upsampled as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_4 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_3 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_2 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_1 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_0 as custom op: BatchTilePlugin_TRT
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
2022-06-10 04:05:48,415 [INFO] iva.retinanet.export.exporter: Converted model was saved into /workspace/tao-experiments/retinanet/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_002_int8.etlt
1it [00:02, 2.13s/it]
2022-06-10 04:05:50,598 [INFO] iva.common.export.keras_exporter: Calibration takes time especially if number of batches is large.
2022-06-10 04:06:26,889 [INFO] iva.common.export.base_calibrator: Saving calibration cache (size 8263) to /workspace/tao-experiments/retinanet/export/cal.bin
2022-06-09 23:07:35,031 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
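Side note: with --batch_size 8 and --batches 1, calibration only sees 8 images, which matches the single `1it` iteration in the log above. If deleting the cache alone doesn't help, this is a sketch of re-exporting with more calibration data; the --batches value of 10 is just an illustration, all other flags are unchanged from the command above:

```
# Hypothetical: calibrate on batch_size x batches = 80 images instead of 8
!tao retinanet export -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_$EPOCH.tlt \
                      -o $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt \
                      -e $SPECS_DIR/retinanet_train_resnet18_kitti.txt \
                      -k $KEY \
                      --cal_image_dir $DATA_DOWNLOAD_DIR/training/image_2 \
                      --data_type int8 \
                      --batch_size 8 \
                      --batches 10 \
                      --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin \
                      --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
                      --gen_ds_config
```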
Now the conversion runs into the scale and zero-point problem:
# Convert to TensorRT engine (INT8).
# Changed retrained to unpruned
# -m is the maximum batch size
!tao converter -k $KEY \
-d 3,384,1248 \
-o NMS \
-c $USER_EXPERIMENT_DIR/export/cal.bin \
-e $USER_EXPERIMENT_DIR/export/trt.int8.engine \
-b 8 \
-m 8 \
-t int8 \
-i nchw \
$USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt
2022-06-09 23:08:28,039 [INFO] root: Registry: ['nvcr.io']
2022-06-09 23:08:28,104 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
[INFO] [MemUsageChange] Init CUDA: CPU +441, GPU +0, now: CPU 452, GPU 449 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 452 MiB, GPU 449 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 669 MiB, GPU 521 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[WARNING] Missing scale and zero-point for tensor conv1/kernel, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/moving_variance, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/Reshape_1/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/batchnorm/add/y, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/gamma, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/Reshape_3/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/beta, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
...
[WARNING] Missing scale and zero-point for tensor retinanet_loc_regressor/bias_3, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor retinanet_loc_subn_0/bias_3, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +809, GPU +350, now: CPU 1697, GPU 871 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1823, GPU 929 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 95216
[INFO] Total Device Persistent Memory: 23142400
[INFO] Total Scratch Memory: 26923776
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 89 MiB, GPU 864 MiB
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 14.4801ms to assign 14 blocks to 84 nodes requiring 110344193 bytes.
[INFO] Total Activation Memory: 110344193
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2720, GPU 1365 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2720, GPU 1375 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +21, GPU +26, now: CPU 21, GPU 26 (MiB)
2022-06-09 23:10:52,295 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.