TAO Toolkit fails to convert RetinaNet INT8 .etlt model to INT8 CUDA engine (does the calibration cache need to be deleted?)

It may be that all that is needed is to delete the calibration cache, so I am currently looking into doing that.

Please provide the following information when requesting support.

• Quad A30
• RetinaNet with ResNet18 (also RetinaNet with EfficientNet B0)
• Configuration of the TAO Toolkit Instance:

dockers:
    nvidia/tao/tao-toolkit-tf:
        v3.22.05-tf1.15.5-py3:
            docker_registry: nvcr.io
            tasks:
                1. augment
                2. bpnet
                3. classification
                4. dssd
                5. faster_rcnn
                6. emotionnet
                7. efficientdet
                8. fpenet
                9. gazenet
                10. gesturenet
                11. heartratenet
                12. lprnet
                13. mask_rcnn
                14. multitask_classification
                15. retinanet
                16. ssd
                17. unet
                18. yolo_v3
                19. yolo_v4
                20. yolo_v4_tiny
                21. converter
        v3.22.05-tf1.15.4-py3:
            docker_registry: nvcr.io
            tasks:
                1. detectnet_v2
    nvidia/tao/tao-toolkit-pyt:
        v3.22.05-py3:
            docker_registry: nvcr.io
            tasks:
                1. speech_to_text
                2. speech_to_text_citrinet
                3. speech_to_text_conformer
                4. action_recognition
                5. pointpillars
                6. pose_classification
                7. spectro_gen
                8. vocoder
        v3.21.11-py3:
            docker_registry: nvcr.io
            tasks:
                1. text_classification
                2. question_answering
                3. token_classification
                4. intent_slot_classification
                5. punctuation_and_capitalization
    nvidia/tao/tao-toolkit-lm:
        v3.22.05-py3:
            docker_registry: nvcr.io
            tasks:
                1. n_gram
format_version: 2.0
toolkit_version: 3.22.05
published_date: 05/25/2022

• Training spec file (if you have one, please share it here)

random_seed: 42
retinanet_config {
  aspect_ratios_global: "[1.0, 2.0, 0.5]"
  scales: "[0.045, 0.09, 0.2, 0.4, 0.55, 0.7]"
  two_boxes_for_ar1: false
  clip_boxes: false
  loss_loc_weight: 0.8
  focal_loss_alpha: 0.25
  focal_loss_gamma: 2.0
  variances: "[0.1, 0.1, 0.2, 0.2]"
  arch: "resnet"
  nlayers: 18
  n_kernels: 1
  n_anchor_levels: 1
  feature_size: 256
  freeze_bn: False
  freeze_blocks: 0
}
training_config {
  enable_qat: False
  pretrain_model_path: "YOUR_PRETRAINED_MODEL"
  batch_size_per_gpu: 8
  num_epochs: 100
  n_workers: 2
  checkpoint_interval: 10
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 4e-5
      max_learning_rate: 1.5e-2
      soft_start: 0.1
      annealing: 0.3
    }
  }
  regularizer {
    type: L1
    weight: 2e-5
  }
  optimizer {
    sgd {
      momentum: 0.9
      nesterov: True
    }
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 8
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
    output_width: 1248
    output_height: 384
    output_channel: 3
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/data/tfrecords/kitti_train*"
  }
  target_class_mapping {
      key: "car"
      value: "car"
  }
  target_class_mapping {
      key: "pedestrian"
      value: "pedestrian"
  }
  target_class_mapping {
      key: "cyclist"
      value: "cyclist"
  }
  target_class_mapping {
      key: "van"
      value: "car"
  }
  target_class_mapping {
      key: "person_sitting"
      value: "pedestrian"
  }
  target_class_mapping {
      key: "truck"
      value: "car"
  }
  validation_data_sources: {
    image_directory_path: "/workspace/tao-experiments/data/val/image"
    label_directory_path: "/workspace/tao-experiments/data/val/label"
  }
}


• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)

  1. Train RetinaNet.
  2. Export the INT8 .etlt model.
  3. Convert the INT8 .etlt model to an INT8 engine (falls back to non-INT8).

Export in INT8 mode (generates a calibration cache file).

This will generate an .etlt file and an INT8 calibration file.

# replaced batch size 1 with 8
!rm -f $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt
!tao retinanet export -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_$EPOCH.tlt  \
                      -o $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt \
                      -e $SPECS_DIR/retinanet_train_resnet18_kitti.txt \
                      -k $KEY \
                      --cal_image_dir $DATA_DOWNLOAD_DIR/training/image_2 \
                      --data_type int8 \
                      --batch_size 8 \
                      --batches 1 \
                      --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                      --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile \
                      --gen_ds_config
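
As a side note before the export log: with --batch_size 8 and --batches 1, calibration only sees a single batch of 8 images. This is unrelated to the warnings later in this post, but if INT8 accuracy ends up low, re-exporting with more calibration batches along these lines may help (the value of 10 batches is an assumption, not a documented recommendation; delete the old cal.bin/cal.tensorfile first):

# Sketch: re-export with a larger calibration set (10 batches x 8 images = 80 images).
# The --batches value is an assumption; all paths are the same as in the cell above.
!tao retinanet export -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_$EPOCH.tlt  \
                      -o $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt \
                      -e $SPECS_DIR/retinanet_train_resnet18_kitti.txt \
                      -k $KEY \
                      --cal_image_dir $DATA_DOWNLOAD_DIR/training/image_2 \
                      --data_type int8 \
                      --batch_size 8 \
                      --batches 10 \
                      --cal_cache_file $USER_EXPERIMENT_DIR/export/cal.bin  \
                      --cal_data_file $USER_EXPERIMENT_DIR/export/cal.tensorfile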

2022-06-09 23:03:46,244 [INFO] root: Registry: ['nvcr.io']
2022-06-09 23:03:46,305 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
2022-06-10 04:04:27,559 [INFO] iva.retinanet.utils.spec_loader: Merging specification from /workspace/tao-experiments/retinanet/specs/retinanet_train_resnet18_kitti.txt
2022-06-10 04:04:31,230 [INFO] iva.common.export.keras_exporter: Using input nodes: ['Input']
2022-06-10 04:04:31,230 [INFO] iva.common.export.keras_exporter: Using output nodes: ['NMS']
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting P5_upsampled as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting P4_upsampled as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_4 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_3 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_2 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_1 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_0 as custom op: BatchTilePlugin_TRT
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['NMS'] as outputs
2022-06-10 04:05:48,415 [INFO] iva.retinanet.export.exporter: Converted model was saved into /workspace/tao-experiments/retinanet/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_002_int8.etlt
1it [00:02,  2.13s/it]
2022-06-10 04:05:50,598 [INFO] iva.common.export.keras_exporter: Calibration takes time especially if number of batches is large.
2022-06-10 04:06:26,889 [INFO] iva.common.export.base_calibrator: Saving calibration cache (size 8263) to /workspace/tao-experiments/retinanet/export/cal.bin
2022-06-09 23:07:35,031 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Now the conversion runs into the scale and zero-point problem:

# Convert to TensorRT engine (INT8).
# changed retrained to unpruned

# -m is the maximum batch size

!tao converter -k $KEY  \
                   -d 3,384,1248 \
                   -o NMS \
                   -c $USER_EXPERIMENT_DIR/export/cal.bin \
                   -e $USER_EXPERIMENT_DIR/export/trt.int8.engine \
                   -b 8 \
                   -m 8 \
                   -t int8 \
                   -i nchw \
                   $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt


2022-06-09 23:08:28,039 [INFO] root: Registry: ['nvcr.io']
2022-06-09 23:08:28,104 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
[INFO] [MemUsageChange] Init CUDA: CPU +441, GPU +0, now: CPU 452, GPU 449 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 452 MiB, GPU 449 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 669 MiB, GPU 521 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[WARNING] Missing scale and zero-point for tensor conv1/kernel, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/moving_variance, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/Reshape_1/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/batchnorm/add/y, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/gamma, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/Reshape_3/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/beta, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
.
.
.
.
[WARNING] Missing scale and zero-point for tensor retinanet_loc_regressor/bias_3, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor retinanet_loc_subn_0/bias_3, expect fall back to non-int8 implementation for any layer consuming or producing given tensor

[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +809, GPU +350, now: CPU 1697, GPU 871 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1823, GPU 929 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 95216
[INFO] Total Device Persistent Memory: 23142400
[INFO] Total Scratch Memory: 26923776
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 89 MiB, GPU 864 MiB
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 14.4801ms to assign 14 blocks to 84 nodes requiring 110344193 bytes.
[INFO] Total Activation Memory: 110344193
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2720, GPU 1365 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2720, GPU 1375 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +21, GPU +26, now: CPU 21, GPU 26 (MiB)
2022-06-09 23:10:52,295 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
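
As an aside, the builder log above notes that some tactics did not have sufficient workspace memory. If that becomes a performance concern, the tao-converter accepts a maximum workspace size via -w (in bytes); a sketch, where the 2 GB value is an arbitrary assumption (check tao converter -h for the exact flag set of your version):

# Sketch: same INT8 conversion with a larger builder workspace.
# The -w value (in bytes) is an assumption, not taken from this thread.
!tao converter -k $KEY  \
                   -d 3,384,1248 \
                   -o NMS \
                   -c $USER_EXPERIMENT_DIR/export/cal.bin \
                   -e $USER_EXPERIMENT_DIR/export/trt.int8.engine \
                   -b 8 -m 8 -t int8 -i nchw \
                   -w 2000000000 \
                   $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt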

Deleted the calibration cache /export/cal.bin (and also /export/cal.tensorfile) and ran again.
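
For reference, a minimal sketch of that clean-up, assuming the export directory maps to $LOCAL_EXPERIMENT_DIR/export on the host as in the export cell above:

# Remove the stale calibration artifacts before re-exporting (paths are assumptions
# based on the cells above; adjust to your own layout).
!rm -f $LOCAL_EXPERIMENT_DIR/export/cal.bin
!rm -f $LOCAL_EXPERIMENT_DIR/export/cal.tensorfile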

The same "expect fall back to non-int8" warnings occur again:

2022-06-10 11:27:45,129 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
[INFO] [MemUsageChange] Init CUDA: CPU +441, GPU +0, now: CPU 452, GPU 449 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 452 MiB, GPU 449 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 669 MiB, GPU 521 MiB
[INFO] Reading Calibration Cache for calibrator: EntropyCalibration2
[INFO] Generated calibration scales using calibration cache. Make sure that calibration cache has latest scales.
[INFO] To regenerate calibration cache, please delete the existing one. TensorRT will generate a new calibration cache.
[WARNING] Missing scale and zero-point for tensor conv1/kernel, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/moving_variance, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for tensor bn_conv1/Reshape_1/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
[WARNING] Missing scale and zero-point for ten

The INT8 TensorRT engine should already be generated. That's expected; the warnings refer to weight/constant tensors, which do not need calibration scales, so you can ignore them.

OK great, thank you for the quick clarification! Glad the engine should be valid. I will compare its throughput to the FP16 and FP32 engines now.
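
In case it helps anyone else doing the same comparison, a rough sketch (the FP16 engine name, the trtexec path, and the benchmark flags are assumptions and vary slightly between TensorRT versions; the engine must be benchmarked with the same TensorRT version it was built with):

# Sketch: build an FP16 engine from the same .etlt for comparison (FP32 is analogous with -t fp32).
!tao converter -k $KEY -d 3,384,1248 -o NMS \
               -e $USER_EXPERIMENT_DIR/export/trt.fp16.engine \
               -b 8 -m 8 -t fp16 -i nchw \
               $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/retinanet_resnet18_epoch_${EPOCH}_int8.etlt

# Sketch: benchmark each engine with trtexec (path assumed; these are implicit-batch engines, hence --batch).
!/usr/src/tensorrt/bin/trtexec --loadEngine=$LOCAL_EXPERIMENT_DIR/export/trt.int8.engine --batch=8 --avgRuns=100
!/usr/src/tensorrt/bin/trtexec --loadEngine=$LOCAL_EXPERIMENT_DIR/export/trt.fp16.engine --batch=8 --avgRuns=100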
