I am training a classifier using ResNet18 and get the following error at the first epoch:
Epoch 1/80
Traceback (most recent call last):
File "/usr/local/bin/tlt-train-g1", line 8, in <module>
sys.exit(main())
File "./common/magnet_train.py", line 27, in main
File "./makenet/scripts/train.py", line 410, in main
File "./makenet/scripts/train.py", line 385, in run_experiment
File "/usr/local/lib/python2.7/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training_generator.py", line 217, in fit_generator
class_weight=class_weight)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1211, in train_on_batch
class_weight=class_weight)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 809, in _standardize_user_data
y, self._feed_loss_fns, feed_output_shapes)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/training_utils.py", line 273, in check_loss_and_target_compatibility
' while using as loss `categorical_crossentropy`. '
ValueError: You are passing a target array of shape (205, 1) while using as loss `categorical_crossentropy`. `categorical_crossentropy` expects targets to be binary matrices (1s and 0s) of shape (samples, classes). If your targets are integer classes, you can convert them to the expected format via:
from keras.utils import to_categorical
y_binary = to_categorical(y_int)
Alternatively, you can use the loss function `sparse_categorical_crossentropy` instead, which does expect integer targets.
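For reference, the conversion the error message suggests is equivalent to this NumPy sketch (the labels here are hypothetical, and I can't actually inject this into the closed TLT pipeline; it just shows what shape `categorical_crossentropy` expects):

```python
import numpy as np

def to_one_hot(y_int, num_classes):
    """One-hot encode integer class labels, like keras.utils.to_categorical."""
    y_int = np.asarray(y_int, dtype=int).ravel()
    one_hot = np.zeros((y_int.size, num_classes), dtype=np.float32)
    one_hot[np.arange(y_int.size), y_int] = 1.0
    return one_hot

# Integer targets of shape (samples,) become binary matrices of shape (samples, classes)
y_int = [0, 2, 1, 2]
y_binary = to_one_hot(y_int, num_classes=3)
print(y_binary.shape)  # (4, 3)
```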
My images belong to a single class and are resized to 1024x768. Below is my specification file:
model_config {
# Model architecture can be chosen from:
# ['resnet', 'vgg', 'googlenet', 'alexnet', 'mobilenet_v1', 'mobilenet_v2', 'squeezenet']
arch: "resnet"
# for resnet --> n_layers can be [10, 18, 50]
# for vgg --> n_layers can be [16, 19]
n_layers: 18
use_bias: True
use_batch_norm: True
all_projections: True
use_pooling: False
freeze_bn: False
freeze_blocks: 0
freeze_blocks: 1
# image size should be "3, X, Y", where X,Y >= 16
input_image_size: "3,1024,768"
}
eval_config {
eval_dataset_path: "/workspace/experiments/dataset/test"
model_path: "/workspace/experiments/output/weights/classifier_delivery_epoch_200.tlt"
top_k: 3
#conf_threshold: 0.5
batch_size: 256
n_workers: 8
}
train_config {
train_dataset_path: "/workspace/experiments/dataset/train"
val_dataset_path: "/workspace/experiments/dataset/valid"
# optimizer can be chosen from ['adam', 'sgd']
optimizer: "sgd"
batch_size_per_gpu: 256
n_epochs: 80
n_workers: 16
# regularizer
reg_config {
type: "L2"
scope: "Conv2D,Dense"
weight_decay: 0.00005
}
# learning_rate
lr_config {
# "step" and "soft_anneal" are supported.
scheduler: "soft_anneal"
# "soft_anneal" stands for soft annealing learning rate scheduler.
# the following 4 parameters should be specified if "soft_anneal" is used.
learning_rate: 0.005
soft_start: 0.056
annealing_points: "0.3, 0.6, 0.8"
annealing_divider: 10
# "step" stands for step learning rate scheduler.
# the following 3 parameters should be specified if "step" is used.
# learning_rate: 0.006
# step_size: 10
# gamma: 0.1
}
}
There doesn’t appear to be any way to influence this from the specification file, and since I don’t have access to the code I’m stuck without NVIDIA’s help. Please advise; thanks in advance.
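As far as I can tell, the check that fires is roughly the one below (my paraphrase of Keras's `check_loss_and_target_compatibility`, not the actual TLT code): with only one class, the generator's targets come out with shape (samples, 1), which is exactly the integer-label shape that `categorical_crossentropy` rejects.

```python
import numpy as np

def check_categorical_targets(y):
    """Paraphrase of Keras's target check for categorical_crossentropy:
    a trailing dimension of 1 looks like integer labels, not one-hot rows."""
    if y.shape[-1] == 1:
        raise ValueError(
            "You are passing a target array of shape %s while using as loss "
            "`categorical_crossentropy`." % (y.shape,))

# A single-class dataset yields targets of shape (samples, 1), as in my run:
y_single_class = np.zeros((205, 1))
try:
    check_categorical_targets(y_single_class)
except ValueError as e:
    print("reproduced:", e)
```

A two-or-more-class target, e.g. shape (205, 2), would pass this check, which is why I suspect the single-class setup is what trips it.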