TLT export of faster-rcnn error

Hello,

I am using the Transfer Learning Toolkit container from NGC. I’ve trained a faster-rcnn model with a resnet10 backbone. I am now trying to export it in FP32 and FP16 mode and both times I get this error.

This is the command I'm using to export:
!tlt-export faster_rcnn -m $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18.epoch12.tlt \
                        -o $USER_EXPERIMENT_DIR/data/faster_rcnn/frcnn_kitti_resnet18_retrain_fp16.etlt \
                        -e $SPECS_DIR/fastrcnn_retrain.txt \
                        -k $KEY \
                        --data_type fp16

And I get the following error:

2020-11-02 15:33:26.795166: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4472 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5)
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
DEBUG: convert reshape to flatten node
Warning: No conversion function registered for layer: CropAndResize yet.
Converting roi_pooling_conv_1/CropAndResize_new as custom op: CropAndResize
Warning: No conversion function registered for layer: Proposal yet.
Converting proposal as custom op: Proposal

DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['proposal', 'dense_class_td/Softmax', 'dense_regress_td/BiasAdd'] as outputs
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (360) - Cuda Error in findFastestTactic: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (360) - Cuda Error in findFastestTactic: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (360) - Cuda Error in findFastestTactic: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (360) - Cuda Error in findFastestTactic: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (360) - Cuda Error in findFastestTactic: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (360) - Cuda Error in findFastestTactic: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (360) - Cuda Error in findFastestTactic: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (360) - Cuda Error in findFastestTactic: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../builder/cudnnBuilderUtils.cpp (360) - Cuda Error in findFastestTactic: 2 (out of memory)

My spec file is as follows:

# Copyright © 2017-2020, NVIDIA CORPORATION. All rights reserved.

random_seed: 42
enc_key: 'Z2o0aGRiaHNvcXFzNzViYWM0a3FuYW9vZzk6YWIxMDFhYmQtMDNhOS00OTYxLTg5YzMtODM4NzRmNmFlZTI0'
verbose: True
network_config {
input_image_config {
image_type: RGB
image_channel_order: 'bgr'
size_height_width {
height: 1080
width: 1920
}
    image_channel_mean {
        key: 'b'
        value: 103.939
}
    image_channel_mean {
        key: 'g'
        value: 116.779
}
    image_channel_mean {
        key: 'r'
        value: 123.68
}
image_scaling_factor: 1.0
max_objects_num_per_image: 100
}
feature_extractor: "resnet:10"
anchor_box_config {
scale: 64.0
scale: 128.0
scale: 256.0
ratio: 1.0
ratio: 0.5
ratio: 2.0
}
freeze_bn: False
roi_mini_batch: 256
rpn_stride: 16
conv_bn_share_bias: True
roi_pooling_config {
pool_size: 7
pool_size_2x: False
}
all_projections: True
use_pooling:False
enable_qat: False
}
training_config {
kitti_data_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt-experiments/tfrecords/kitti_trainval/kitti_trainval*"
    image_directory_path: "/workspace/tlt-experiments/data"
  }
image_extension: 'png'
target_class_mapping {
key: 'stone'
value: 'stone'
}
target_class_mapping {
key: 'grass'
value: 'grass'
}
target_class_mapping {
key: 'humus'
value: 'humus'
}
target_class_mapping {
key: 'mineral'
value: 'mineral'
}
target_class_mapping {
key: 'stub'
value: 'stub'
}
target_class_mapping {
key: 'good_area'
value: 'good_area'
}
validation_fold: 0
}
data_augmentation {
preprocessing {
output_image_width: 1920
output_image_height: 1080
output_image_channel: 3
min_bbox_width: 1.0
min_bbox_height: 1.0
}
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 1.0
zoom_max: 1.0
translate_max_x: 0
translate_max_y: 0
}
color_augmentation {
hue_rotation_max: 0.0
saturation_shift_max: 0.0
contrast_scale_max: 0.0
contrast_center: 0.5
}
}
enable_augmentation: True
batch_size_per_gpu: 1
num_epochs: 12
pretrained_weights: "/workspace/tlt-experiments/data/faster_rcnn/resnet_10.hdf5"
#resume_from_model: "/workspace/tlt-experiments/data/faster_rcnn/resnet10.epoch2.tlt"
output_model: "/workspace/tlt-experiments/data/faster_rcnn/frcnn_kitti_resnet10.tlt"
rpn_min_overlap: 0.3
rpn_max_overlap: 0.7
classifier_min_overlap: 0.0
classifier_max_overlap: 0.5
gt_as_roi: False
std_scaling: 1.0
classifier_regr_std {
key: 'x'
value: 10.0
}
classifier_regr_std {
key: 'y'
value: 10.0
}
classifier_regr_std {
key: 'w'
value: 5.0
}
classifier_regr_std {
key: 'h'
value: 5.0
}

rpn_mini_batch: 256
rpn_pre_nms_top_N: 12000
rpn_nms_max_boxes: 2000
rpn_nms_overlap_threshold: 0.7

reg_config {
type: L1
weight: 3e-5
}

optimizer {
sgd {
lr: 0.02
momentum: 0.9
decay: 0.0
nesterov: False
}
}

lr_scheduler {
soft_start {
base_lr: 0.02
start_lr: 0.002
soft_start: 0.1
annealing_points: 0.8
annealing_points: 0.9
annealing_divider: 10.0
}
}

lambda_rpn_regr: 1.0
lambda_rpn_class: 1.0
lambda_cls_regr: 1.0
lambda_cls_class: 1.0

inference_config {
images_dir: '/workspace/tlt-experiments/data/Test/images'
model: '/workspace/tlt-experiments/data/faster_rcnn/frcnn_kitti_resnet10.epoch12.tlt'
batch_size: 1
detection_image_output_dir: '/workspace/tlt-experiments/data/faster_rcnn/inference_results_imgs'
labels_dump_dir: '/workspace/tlt-experiments/data/faster_rcnn/inference_dump_labels'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
bbox_visualize_threshold: 0.6
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
}

evaluation_config {
model: '/workspace/tlt-experiments/data/faster_rcnn/frcnn_kitti_resnet10.epoch12.tlt'
batch_size: 1
validation_period_during_training: 1
labels_dump_dir: '/workspace/tlt-experiments/data/faster_rcnn/test_dump_labels'
rpn_pre_nms_top_N: 6000
rpn_nms_max_boxes: 300
rpn_nms_overlap_threshold: 0.7
classifier_nms_max_boxes: 300
classifier_nms_overlap_threshold: 0.3
object_confidence_thres: 0.0001
use_voc07_11point_metric:False
}

}

I need help to export to .etlt.
Thanks

An OOM (out of memory) error occurs when you run tlt-export. Which GPU did you use?

I am using a GeForce GTX 1660 major
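
For anyone following along: the log above shows TensorFlow creating its device with 4472 MB on this card, which may leave little room for the TensorRT step. Below is a minimal sketch of how one might check GPU memory from the notebook before exporting, assuming `nvidia-smi` is available inside the container. The helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and the sample line is illustrative rather than output from this thread.

```python
import subprocess

# Query name, total and used memory (MiB) as plain CSV without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines of the form 'name, total, used' into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma in the GPU name would not break parsing.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({
            "name": name,
            "total_mib": int(total),
            "used_mib": int(used),
            "free_mib": int(total) - int(used),
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return the parsed result (needs an NVIDIA driver)."""
    result = subprocess.run(
        QUERY, stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return parse_gpu_memory(result.stdout)


# Illustrative input only; on a real machine call query_gpu_memory() instead.
sample = "GeForce GTX 1660, 6000, 1500"
print(parse_gpu_memory(sample))
```

If other processes (for example a still-running training kernel in the notebook) hold most of the memory, freeing them before running tlt-export is one thing to try.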

To add to my original post:
I am training on my own dataset in KITTI format. The images in the dataset are 1920x1080.

To narrow this down, could you please train on your 1920x1080 dataset with the yolo_v3 network and run tlt-export again?

Hi,

I have tried the yolo_v3 network, but when I run tlt-export:

# tlt-export will fail if .etlt already exists. So we clear the export folder before tlt-export
!rm -rf $USER_EXPERIMENT_DIR/export
!mkdir -p $USER_EXPERIMENT_DIR/export
# Export in FP32 mode. Change --data_type to fp16 for FP16 mode
!tlt-export yolo -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolo_resnet18_epoch_$EPOCH.tlt \
                 -k $KEY \
                 -o $USER_EXPERIMENT_DIR/export/yolo_resnet18_epoch_$EPOCH.etlt \
                 -e $SPECS_DIR/yolo_retrain_resnet18_kitti.txt \
                 --batch_size 16 \
                 --data_type fp32

I get the following warnings:

2020-11-06 14:34:48.572132: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-06 14:34:48.572428: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-06 14:34:48.572638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4455 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:01:00.0, compute capability: 7.5)
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
Warning: No conversion function registered for layer: BatchedNMS_TRT yet.
Converting BatchedNMS as custom op: BatchedNMS_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting upsample1/ResizeNearestNeighbor as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: ResizeNearest_TRT yet.
Converting upsample0/ResizeNearestNeighbor as custom op: ResizeNearest_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_2 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_1 as custom op: BatchTilePlugin_TRT
Warning: No conversion function registered for layer: BatchTilePlugin_TRT yet.
Converting FirstDimTile_0 as custom op: BatchTilePlugin_TRT
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['BatchedNMS'] as outputs

The spec file for yolo_train is as follows:

random_seed: 42
yolo_config {
big_anchor_shape: "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]"
mid_anchor_shape: "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]"
small_anchor_shape: "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]"
matching_neutral_box_iou: 0.5

arch: "resnet"
nlayers: 18
arch_conv_blocks: 2

loss_loc_weight: 0.75
loss_neg_obj_weights: 200.0
loss_class_weights: 1.0

freeze_blocks: 0
freeze_bn: false
}
training_config {
batch_size_per_gpu: 4
num_epochs: 80
enable_qat: false
learning_rate {
soft_start_annealing_schedule {
min_learning_rate: 1e-6
max_learning_rate: 1e-4
soft_start: 0.1
annealing: 0.8
}
}
regularizer {
type: L1
weight: 5e-5
}
}
eval_config {
validation_period_during_training: 10
average_precision_mode: SAMPLE
batch_size: 16
matching_iou_threshold: 0.5
}
nms_config {
confidence_threshold: 0.01
clustering_iou_threshold: 0.6
top_k: 200
}
augmentation_config {
preprocessing {
output_image_width: 1248
output_image_height: 384
output_image_channel: 3
crop_right: 1248
crop_bottom: 384
min_bbox_width: 1.0
min_bbox_height: 1.0
}
spatial_augmentation {
hflip_probability: 0.5
vflip_probability: 0.0
zoom_min: 0.7
zoom_max: 1.8
translate_max_x: 8.0
translate_max_y: 8.0
}
color_augmentation {
hue_rotation_max: 25.0
saturation_shift_max: 0.20000000298
contrast_scale_max: 0.10000000149
contrast_center: 0.5
}
}
dataset_config {
data_sources: {
tfrecords_path: "/workspace/tlt-experiments/Dataset/tfrecords/kitti_trainval/kitti_trainval*"
image_directory_path: "/workspace/tlt-experiments/Dataset"
}
image_extension: "png"
target_class_mapping {
key: "stub"
value: "stub"
}
target_class_mapping {
key: "stone"
value: "stone"
}
}

target_class_mapping {
  key: "grass"
  value: "grass"

}
target_class_mapping {
key: "humus"
value: "humus"
}
target_class_mapping {
key: "mineral"
value: "mineral"
}
target_class_mapping {
key: "good_area"
value: "good_area"
}
validation_fold: 0
}

And the retrain spec is:

random_seed: 42
yolo_config {
  big_anchor_shape: "[(114.94, 60.67), (159.06, 114.59), (297.59, 176.38)]"
  mid_anchor_shape: "[(42.99, 31.91), (79.57, 31.75), (56.80, 56.93)]"
  small_anchor_shape: "[(15.60, 13.88), (30.25, 20.25), (20.67, 49.63)]"
  matching_neutral_box_iou: 0.5

  arch: "resnet"
  nlayers: 18
  arch_conv_blocks: 2

  loss_loc_weight: 0.75
  loss_neg_obj_weights: 200.0
  loss_class_weights: 1.0

  freeze_bn: false
}
training_config {
  batch_size_per_gpu: 4
  num_epochs: 80
  enable_qat: false
  learning_rate {
  soft_start_annealing_schedule {
    min_learning_rate: 1e-6
    max_learning_rate: 1e-4
    soft_start: 0.1
    annealing: 0.5
    }
  }
  regularizer {
    type: NO_REG
    weight: 3e-9
  }
}
eval_config {
  validation_period_during_training: 10
  average_precision_mode: SAMPLE
  batch_size: 16
  matching_iou_threshold: 0.5
}
nms_config {
  confidence_threshold: 0.01
  clustering_iou_threshold: 0.6
  top_k: 200
}
augmentation_config {
  preprocessing {
    output_image_width: 1248
    output_image_height: 384
    output_image_channel: 3
    crop_right: 1248
    crop_bottom: 384
    min_bbox_width: 1.0
    min_bbox_height: 1.0
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 0.7
    zoom_max: 1.8
    translate_max_x: 8.0
    translate_max_y: 8.0
  }
  color_augmentation {
    hue_rotation_max: 25.0
    saturation_shift_max: 0.20000000298
    contrast_scale_max: 0.10000000149
    contrast_center: 0.5
  }
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tlt-experiments/Dataset/tfrecords/kitti_trainval/kitti_trainval*"
    image_directory_path: "/workspace/tlt-experiments/Dataset"
  }
  image_extension: "png"
  target_class_mapping {
      key: "stub"
      value: "stub"
  }
  target_class_mapping {
      key: "stone"
      value: "stone"
  }
  
    target_class_mapping {
      key: "grass"
      value: "grass"
  }
  target_class_mapping {
      key: "humus"
      value: "humus"
  }
  target_class_mapping {
      key: "mineral"
      value: "mineral"
  }
  target_class_mapping {
      key: "good_area"
      value: "good_area"
  }
validation_fold: 0
}
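
One detail that may be worth keeping in mind when comparing the two experiments: the faster-rcnn spec sets the input to 1920x1080, while both YOLO specs above set output_image_width and output_image_height to 1248x384. The sketch below just compares the per-image pixel counts. It assumes, as a simplification, that convolutional activation memory grows roughly in proportion to the number of input pixels; it is not a measurement of what tlt-export actually allocates.

```python
def pixel_count(width, height):
    """Number of pixels in one input image."""
    return width * height


# Sizes taken from the two specs quoted in this thread.
frcnn_pixels = pixel_count(1920, 1080)  # faster-rcnn spec
yolo_pixels = pixel_count(1248, 384)    # YOLO train/retrain specs

ratio = frcnn_pixels / yolo_pixels
print("{} vs {} pixels ({:.2f}x)".format(frcnn_pixels, yolo_pixels, ratio))
```

Under that assumption, the faster-rcnn export works on images a bit more than four times larger than the YOLO one, so a successful YOLO export at 1248x384 would not by itself rule out memory pressure at 1920x1080.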

Please check if
$USER_EXPERIMENT_DIR/export/yolo_resnet18_epoch_$EPOCH.etlt

is already generated.
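
A minimal sketch of one way to do this check from a notebook cell, assuming USER_EXPERIMENT_DIR and EPOCH are set as environment variables (for example via %env). The helper name `exported_model_status` is hypothetical.

```python
import os


def exported_model_status(path):
    """Report whether an exported file exists and, if so, its size in bytes."""
    # Expand $VARS from the environment; unknown variables are left unchanged.
    path = os.path.expandvars(path)
    if not os.path.isfile(path):
        return "missing"
    return "present ({} bytes)".format(os.path.getsize(path))


print(exported_model_status(
    "$USER_EXPERIMENT_DIR/export/yolo_resnet18_epoch_$EPOCH.etlt"
))
```

A file that is present but has a size of 0 bytes would suggest the export started writing and then failed.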

Hi!

No, it was not generated previously.

Please double-check.
Alternatively, you can run the default YOLO Jupyter notebook to see if there is any problem.