I am running inference at varying batch sizes, and any batch size above 32 fails with internal errors. Below is the error for a batch size of 33:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 587520 values, but the requested shape requires a multiple of 33
	 [[{{node proposal_1/Reshape_6}}]]
	 [[proposal_1/cond_411/switch_t/_2978]]
  (1) Invalid argument: Input to reshape is a tensor with 587520 values, but the requested shape requires a multiple of 33
	 [[{{node proposal_1/Reshape_6}}]]
Note that this size mismatch does not occur with a batch size of 31, which runs normally. Here is the error with a larger batch size of 64:
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/inference.py", line 301, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/inference.py", line 289, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/faster_rcnn/scripts/inference.py", line 218, in main
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1169, in predict
    steps=steps)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 294, in predict_loop
    batch_outs = f(ins_batch)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Incompatible shapes: [1175040] vs. [587520]
	 [[{{node proposal_1/mul_4}}]]
	 [[proposal_1/cond_122/Min/Switch/_3675]]
  (1) Invalid argument: Incompatible shapes: [1175040] vs. [587520]
	 [[{{node proposal_1/mul_4}}]]
0 successful operations.
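If I decompose the failing sizes (assuming the RPN feature map is ceil(H/stride) x ceil(W/stride) with 9 anchors per cell, which matches the spec below), both numbers come out to a whole number of images, with one side apparently pinned at 32:

```python
import math

# Values from the spec below: 540x960 input, rpn_stride 16,
# 3 anchor scales x 3 ratios = 9 anchors per feature-map cell.
height, width, stride, num_anchors = 540, 960, 16, 3 * 3

cells = math.ceil(height / stride) * math.ceil(width / stride)  # 34 * 60 = 2040
per_image = cells * num_anchors                                 # 2040 * 9 = 18360

print(per_image * 32)  # 587520  -> the tensor size in both error messages
print(per_image * 64)  # 1175040 -> the other side of the shape mismatch at batch 64
print(587520 % 33)     # 21, so the reshape to a multiple of 33 cannot succeed
```

If that reading is correct, some tensor inside proposal_1 appears to be sized for at most 32 images regardless of the requested batch_size, which would be consistent with batches up to 32 working and anything larger failing.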
Inference is running on an RTX 3090 that is only 60% utilized at a batch size of 32, so I don't believe memory is the issue. The only thing changed between runs is batch_size in the inference_config of the experiment spec.
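For now I can work around it by capping the effective batch at 32 and looping, along these lines (a minimal sketch; run_inference and image_paths are hypothetical stand-ins for the actual TLT inference entry point and file list):

```python
# Minimal workaround sketch: never hand the model more than 32 images at once.
# `run_inference` and `image_paths` are placeholders, not the real TLT API.
def chunked(items, chunk_size=32):
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]

def run_inference(batch):  # dummy stand-in for the real per-batch inference call
    return [f"detections for {path}" for path in batch]

image_paths = [f"img_{i:03d}.jpg" for i in range(100)]  # dummy file list

results = []
for batch in chunked(image_paths):
    results.extend(run_inference(batch))
print(len(results))  # 100 results, produced in chunks of <= 32
```

That sidesteps rather than explains the problem, though; I'd still like to run a true batch of 64 given the GPU headroom mentioned above.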
Full spec below:
random_seed: 42
enc_key: 'zeroeyes'
verbose: True
model_config {
  input_image_config {
    image_type: RGB
    image_channel_order: 'bgr'
    size_height_width {
      height: 540
      width: 960
    }
    image_channel_mean {
      key: 'b'
      value: 114.244538027499
    }
    image_channel_mean {
      key: 'g'
      value: 117.13666031604566
    }
    image_channel_mean {
      key: 'r'
      value: 116.52070103424707
    }
    image_scaling_factor: 1
    max_objects_num_per_image: 10
  }
  arch: "resnet:34"
  anchor_box_config {
    scale: 20
    scale: 40
    scale: 90
    ratio: 1.0
    ratio: 0.5
    ratio: 2.0
  }
  freeze_bn: True
  roi_mini_batch: 256
  rpn_stride: 16
  use_bias: False
  roi_pooling_config {
    pool_size: 7
    pool_size_2x: False
  }
  all_projections: True
  use_pooling: False
}
dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/ZNT/Z_35/tfrecords/train/tfrecord*"
    image_directory_path: "/workspace/DAB/D_39"
  }
  image_extension: 'jpg'
  target_class_mapping {
    key: 'p_1'
    value: 'P'
  }
  target_class_mapping {
    key: 'r_1'
    value: 'R'
  }
  target_class_mapping {
    key: 'ca_1'
    value: 'DontCare'
  }
  target_class_mapping {
    key: 'dc_0'
    value: 'DontCare'
  }
  validation_data_source: {
    tfrecords_path: "/workspace/ZNT/Z_35/tfrecords/val/tfrecord*"
    image_directory_path: "/workspace/DAB/D_39"
  }
}
augmentation_config {
  preprocessing {
    output_image_width: 960
    output_image_height: 540
    output_image_channel: 3
    min_bbox_width: 0.0
    min_bbox_height: 0.0
    enable_auto_resize: False
  }
  spatial_augmentation {
    hflip_probability: 0.5
    vflip_probability: 0.0
    zoom_min: 0.75
    zoom_max: 1.25
    translate_max_x: 192
    translate_max_y: 104
    rotate_rad_max: 0.7
  }
  color_augmentation {
    hue_rotation_max: 50
    saturation_shift_max: 0.3
    contrast_scale_max: 0.25
    contrast_center: 0.5
  }
}
training_config {
  checkpoint_interval: 1
  pretrained_weights: "/workspace/pretrained_models/resnet_34.hdf5"
  output_model: "/workspace/ZNT/Z_35/weights/model.tlt"
  enable_augmentation: True
  enable_qat: True
  batch_size_per_gpu: 12
  num_epochs: 100
  rpn_min_overlap: 0.3
  rpn_max_overlap: 0.7
  classifier_min_overlap: 0.0
  classifier_max_overlap: 0.5
  gt_as_roi: False
  std_scaling: 1.0
  classifier_regr_std {
    key: 'x'
    value: 10
  }
  classifier_regr_std {
    key: 'y'
    value: 10
  }
  classifier_regr_std {
    key: 'w'
    value: 5
  }
  classifier_regr_std {
    key: 'h'
    value: 5
  }
  rpn_mini_batch: 256
  rpn_pre_nms_top_N: 1000
  rpn_nms_max_boxes: 200
  rpn_nms_overlap_threshold: 0.6
  regularizer {
    type: L2
    weight: 1e-05
  }
  optimizer {
    adam {
      lr: 0.00001
      beta_1: 0.9
      beta_2: 0.999
      decay: 0.0
    }
  }
  learning_rate {
    step {
      base_lr: 2e-05
      gamma: 0.75
      step_size: 10
    }
  }
  lambda_rpn_regr: 1.0
  lambda_rpn_class: 1.0
  lambda_cls_regr: 1.0
  lambda_cls_class: 1.0
}
inference_config {
  images_dir: '/workspace/DAB/D_infer_test/128'
  model: '/workspace/ZNT/Z_35/models/model.epoch31.tlt'
  batch_size: 64
  detection_image_output_dir: '/workspace/DAB/D_infer_test/128_results/images'
  labels_dump_dir: '/workspace/DAB/D_infer_test/128_results/labels'
  rpn_pre_nms_top_N: 3000
  rpn_nms_max_boxes: 500
  rpn_nms_overlap_threshold: 0.7
  object_confidence_thres: 0.0001
  bbox_visualize_threshold: 0.9
  classifier_nms_max_boxes: 100
  classifier_nms_overlap_threshold: 0.3
}
evaluation_config {
  model: '/workspace/ZNT/Z_35/weights/model.epoch20.tlt'
  batch_size: 12
  validation_period_during_training: 1
  rpn_pre_nms_top_N: 3000
  rpn_nms_max_boxes: 500
  rpn_nms_overlap_threshold: 0.7
  classifier_nms_max_boxes: 100
  classifier_nms_overlap_threshold: 0.3
  object_confidence_thres: 0.0001
  use_voc07_11point_metric: False
  gt_matching_iou_threshold: 0.5
}