Error: Transfer Learning Toolkit classification fails to set image size

I modified classification_spec.cfg to change the image size.

model_config {
  # Model architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet', 'mobilenet_v1', 'mobilenet_v2', 'squeezenet', 'darknet']
  arch: "resnet"
  # for resnet --> n_layers can be [10, 18, 34, 50, 101]
  # for vgg --> n_layers can be [16, 19]
  # for darknet --> n_layers can be [19, 53]

  n_layers: 10
  use_bias: True
  use_batch_norm: True
  all_projections: True
  use_pooling: False
  freeze_bn: False
  freeze_blocks: 0
  freeze_blocks: 1

  # image size should be "3, X, Y", where X, Y >= 16
  input_image_size: "3, 512, 512"
}

eval_config {
  eval_dataset_path: "/TransforLearningToolkit/mashuai/lungtumors/split/test"
  model_path: "/TransforLearningToolkit/mashuai/classification_experiment/output50/weights/resnet_080.tlt"
  top_k: 3
  # conf_threshold: 0.5
  batch_size: 64
  n_workers: 8
}

train_config {
  train_dataset_path: "/TransforLearningToolkit/mashuai/lungtumors/split/train"
  val_dataset_path: "/TransforLearningToolkit/mashuai/lungtumors/split/val"
  pretrained_model_path: "/TransforLearningToolkit/mashuai/classification_experiment/pretrained_resnet10/tlt_pretrained_classification_vresnet10/resnet_10.hdf5"
  # optimizer can be chosen from ['adam', 'sgd']
  optimizer: "sgd"
  batch_size_per_gpu: 32
  n_epochs: 80
  n_workers: 8

  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning rate
  lr_config {
    # "step" and "soft_anneal" are supported.
    scheduler: "soft_anneal"

    # "soft_anneal" stands for the soft annealing learning rate scheduler.
    # The following 4 parameters should be specified if "soft_anneal" is used.
    learning_rate: 0.005
    soft_start: 0.056
    annealing_points: "0.3, 0.6, 0.8"
    annealing_divider: 10

    # "step" stands for the step learning rate scheduler.
    # The following 3 parameters should be specified if "step" is used.
    # learning_rate: 0.006
    # step_size: 10
    # gamma: 0.1

    # "cosine" stands for the soft start cosine learning rate scheduler.
    # The following 2 parameters should be specified if "cosine" is used.
    # learning_rate: 0.05
    # soft_start: 0.01
  }
}

When I run

!tlt-train classification -e $SPECS_DIR/classification_spec.cfg -r $USER_EXPERIMENT_DIR/output -k $KEY

I get the following error:

Using TensorFlow backend.
2020-06-10 13:47:12,971 [INFO] iva.makenet.scripts.train: Loading experiment spec at /TransforLearningToolkit/mashuai/specs/classification_spec.cfg.
Found 4211 images belonging to 7 classes.
2020-06-10 13:47:13,341 [INFO] iva.makenet.scripts.train: Processing dataset (train): /TransforLearningToolkit/mashuai/lungtumors/split/train
Found 603 images belonging to 7 classes.
2020-06-10 13:47:13,462 [INFO] iva.makenet.scripts.train: Processing dataset (validation): /TransforLearningToolkit/mashuai/lungtumors/split/val


Layer (type)                     Output Shape          Param #    Connected to
================================================================================
input_1 (InputLayer)             (None, 3, 224, 224)   0
conv1 (Conv2D)                   (None, 64, 112, 112)  9408       input_1[0][0]
bn_conv1 (BatchNormalization)    (None, 64, 112, 112)  256        conv1[0][0]
activation_1 (Activation)        (None, 64, 112, 112)  0          bn_conv1[0][0]
block_1a_conv_1 (Conv2D)         (None, 64, 56, 56)    36864      activation_1[0][0]
block_1a_bn_1 (BatchNormalizati  (None, 64, 56, 56)    256        block_1a_conv_1[0][0]
block_1a_relu_1 (Activation)     (None, 64, 56, 56)    0          block_1a_bn_1[0][0]
block_1a_conv_2 (Conv2D)         (None, 64, 56, 56)    36864      block_1a_relu_1[0][0]
block_1a_conv_shortcut (Conv2D)  (None, 64, 56, 56)    4096       activation_1[0][0]
block_1a_bn_2 (BatchNormalizati  (None, 64, 56, 56)    256        block_1a_conv_2[0][0]
block_1a_bn_shortcut (BatchNorm  (None, 64, 56, 56)    256        block_1a_conv_shortcut[0][0]
add_1 (Add)                      (None, 64, 56, 56)    0          block_1a_bn_2[0][0]
                                                                  block_1a_bn_shortcut[0][0]
block_1a_relu (Activation)       (None, 64, 56, 56)    0          add_1[0][0]
block_2a_conv_1 (Conv2D)         (None, 128, 28, 28)   73728      block_1a_relu[0][0]
block_2a_bn_1 (BatchNormalizati  (None, 128, 28, 28)   512        block_2a_conv_1[0][0]
block_2a_relu_1 (Activation)     (None, 128, 28, 28)   0          block_2a_bn_1[0][0]
block_2a_conv_2 (Conv2D)         (None, 128, 28, 28)   147456     block_2a_relu_1[0][0]
block_2a_conv_shortcut (Conv2D)  (None, 128, 28, 28)   8192       block_1a_relu[0][0]
block_2a_bn_2 (BatchNormalizati  (None, 128, 28, 28)   512        block_2a_conv_2[0][0]
block_2a_bn_shortcut (BatchNorm  (None, 128, 28, 28)   512        block_2a_conv_shortcut[0][0]
add_2 (Add)                      (None, 128, 28, 28)   0          block_2a_bn_2[0][0]
                                                                  block_2a_bn_shortcut[0][0]
block_2a_relu (Activation)       (None, 128, 28, 28)   0          add_2[0][0]
block_3a_conv_1 (Conv2D)         (None, 256, 14, 14)   294912     block_2a_relu[0][0]
block_3a_bn_1 (BatchNormalizati  (None, 256, 14, 14)   1024       block_3a_conv_1[0][0]
block_3a_relu_1 (Activation)     (None, 256, 14, 14)   0          block_3a_bn_1[0][0]
block_3a_conv_2 (Conv2D)         (None, 256, 14, 14)   589824     block_3a_relu_1[0][0]
block_3a_conv_shortcut (Conv2D)  (None, 256, 14, 14)   32768      block_2a_relu[0][0]
block_3a_bn_2 (BatchNormalizati  (None, 256, 14, 14)   1024       block_3a_conv_2[0][0]
block_3a_bn_shortcut (BatchNorm  (None, 256, 14, 14)   1024       block_3a_conv_shortcut[0][0]
add_3 (Add)                      (None, 256, 14, 14)   0          block_3a_bn_2[0][0]
                                                                  block_3a_bn_shortcut[0][0]
block_3a_relu (Activation)       (None, 256, 14, 14)   0          add_3[0][0]
block_4a_conv_1 (Conv2D)         (None, 512, 14, 14)   1179648    block_3a_relu[0][0]
block_4a_bn_1 (BatchNormalizati  (None, 512, 14, 14)   2048       block_4a_conv_1[0][0]
block_4a_relu_1 (Activation)     (None, 512, 14, 14)   0          block_4a_bn_1[0][0]
block_4a_conv_2 (Conv2D)         (None, 512, 14, 14)   2359296    block_4a_relu_1[0][0]
block_4a_conv_shortcut (Conv2D)  (None, 512, 14, 14)   131072     block_3a_relu[0][0]
block_4a_bn_2 (BatchNormalizati  (None, 512, 14, 14)   2048       block_4a_conv_2[0][0]
block_4a_bn_shortcut (BatchNorm  (None, 512, 14, 14)   2048       block_4a_conv_shortcut[0][0]
add_4 (Add)                      (None, 512, 14, 14)   0          block_4a_bn_2[0][0]
                                                                  block_4a_bn_shortcut[0][0]
block_4a_relu (Activation)       (None, 512, 14, 14)   0          add_4[0][0]
avg_pool (AveragePooling2D)      (None, 512, 1, 1)     0          block_4a_relu[0][0]
flatten (Flatten)                (None, 512)           0          avg_pool[0][0]
predictions (Dense)              (None, 176)           90288      flatten[0][0]
================================================================================
Total params: 5,006,192
Trainable params: 5,000,304
Non-trainable params: 5,888

Epoch 1/80
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 8, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 30, in main
  File "./makenet/scripts/train.py", line 437, in main
  File "./makenet/scripts/train.py", line 411, in run_experiment
  File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1211, in train_on_batch
    class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training_utils.py", line 138, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected input_1 to have shape (3, 224, 224) but got array with shape (3, 512, 512)

So the image size was not changed: the spec says 512x512, but the model was still built with a (3, 224, 224) input.

Which TLT docker did you use?

Repository: nvcr.io/nvidia/tlt-streamanalytics
Tag: v2.0_dp_py2

docker pull nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2

Hi,
I could not reproduce this with the default classification Jupyter notebook inside the 2.0_dp docker.
After I changed

input_image_size: "3,224,224"

to

input_image_size: "3,512,512"

training ran fine. Could you double-check, or try the classification notebook too?
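One quick way to rule out editing the wrong file is to confirm what the container actually sees in the spec (a sanity check, assuming the same $SPECS_DIR variable the notebook uses):

!grep input_image_size $SPECS_DIR/classification_spec.cfg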

When I comment out "use_bias: True" in "model_config", it works well.
Thank you.

Hi,

I got the same error when I set the line in the config file to:

input_image_size: "3,91,256"

I already have 'use_bias: True'.

It only seems to work when I use:

input_image_size: "3,224,224"

The docker image is:

docker pull nvcr.io/nvidia/tlt-streamanalytics:v2.0_dp_py2

Please remove "use_bias: True" and retry.
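For reference, the model_config from the spec above with that line removed would look roughly like this (a sketch; every other value left unchanged):

model_config {
  arch: "resnet"
  n_layers: 10
  use_batch_norm: True
  all_projections: True
  use_pooling: False
  freeze_bn: False
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3, 512, 512"
}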

Thanks, that worked. But I am not sure why. A bug or a feature? :)

I will check it.
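In the meantime, note that the ValueError itself is just Keras validating each incoming batch against the model's static input shape. A minimal standalone sketch that reproduces the same class of error (hypothetical shapes, plain Keras 2.x with channels-first ordering, not TLT code):

import numpy as np
from keras import backend as K
from keras.layers import Dense, Flatten, Input
from keras.models import Model

K.set_image_data_format("channels_first")

# Build a model whose input is fixed at 3x224x224, like the summary above.
inp = Input(shape=(3, 224, 224), name="input_1")
out = Dense(7, activation="softmax")(Flatten()(inp))
model = Model(inp, out)
model.compile(optimizer="sgd", loss="categorical_crossentropy")

# Feed it 3x512x512 batches instead; Keras raises:
# ValueError: Error when checking input: expected input_1 to have shape
# (3, 224, 224) but got array with shape (3, 512, 512)
x = np.zeros((32, 3, 512, 512), dtype="float32")
y = np.zeros((32, 7), dtype="float32")
model.train_on_batch(x, y)

So the data pipeline was picking up the 512x512 setting, but the model graph was still being built at 224x224, which is why the mismatch only surfaces on the first training batch.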