Hello,
1- I'm training a classification model with TAO. I get good validation accuracy during training, but when I run inference the results are poor.
2- I trained a ResNet18 in PyTorch on the same dataset and got good results (exported as an ONNX model). When I run that same ONNX model in DeepStream, the results are different.
Please provide the following information when requesting support.
• Hardware : DGX
• Network Type : Resnet18/Classification
• Training spec file :
model_config {
# Model Architecture can be chosen from:
# ['resnet', 'vgg', 'googlenet', 'alexnet']
arch: "resnet"
# for resnet --> n_layers can be [10, 18, 50]
# for vgg --> n_layers can be [16, 19]
n_layers: 18
freeze_blocks: 0
freeze_blocks: 1
use_batch_norm: True
use_bias: False
all_projections: False
use_pooling: True
use_imagenet_head: False
resize_interpolation_method: BILINEAR
# if you want to use the pretrained model,
# image size should be "3,224,224"
# otherwise, it can be "3, X, Y", where X,Y >= 16
input_image_size: "3,224,224"
}
train_config {
train_dataset_path: "/raid/dataset/camera_state_classification/05-01-2023"
val_dataset_path: "/raid/dataset/camera_state_classification/test05-01"
#pretrained_model_path: "/workspace/tao_experiments/train/classification/camera_state/224-224/v0-pruned/resnet_015.tlt"
pretrained_model_path: "/workspace/tao_experiments/nvidia_pretrained/imagenet/resnet_18.hdf5"
# Only ['sgd', 'adam'] are supported for optimizer
optimizer {
sgd {
lr: 0.0001
decay: 0.00001
momentum: 0.9
nesterov: False
}
}
batch_size_per_gpu: 32
n_epochs: 30
# Number of CPU cores for loading data
n_workers: 16
# regularizer
reg_config {
# regularizer type can be "L1", "L2" or "None".
type: "L2"
# if the type is not "None",
# scope can be either "Conv2D" or "Dense" or both.
scope: "Conv2D,Dense"
# 0 < weight decay < 1
weight_decay: 0.000015
}
# learning_rate
lr_config {
cosine {
learning_rate: 0.001
min_lr_ratio: 0.1
soft_start: 0.0
}
}
enable_random_crop: False
enable_center_crop: False
enable_color_augmentation: False
mixup_alpha: 0.0
label_smoothing: 0.1
preprocess_mode: "caffe"
image_mean {
key: "b"
value: 103.939
}
image_mean {
key: "g"
value: 116.779
}
image_mean {
key: "r"
value: 123.68
}
}
}
eval_config {
eval_dataset_path: "/raid/dataset/camera_state_classification/test05-01"
model_path: "/workspace/tao_experiments/train/classification/camera_state/05-01-2023/weights/resnet_025.tlt"
top_k: 1
batch_size: 256
n_workers: 8
enable_center_crop: True
}
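
Because the spec trains with `preprocess_mode: "caffe"` and the B/G/R means listed above, one thing worth ruling out is a preprocessing mismatch at inference time. The following is a minimal NumPy sketch, assuming caffe mode means "convert RGB to BGR, subtract the per-channel means, do not scale to [0, 1]" (the convention used by Keras's caffe mode). `scale_offset_preprocess` is a hypothetical helper, not a TAO or DeepStream API; it only mimics a generic `y = scale * (x - offsets)` inference-side transform so the two pipelines can be compared on the same image.

```python
import numpy as np

# Per-channel means copied from the image_mean entries in train_config, in B, G, R order.
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_preprocess(rgb_hwc):
    """Assumed training-side preprocessing: RGB -> BGR, subtract means, no [0, 1] scaling."""
    bgr = rgb_hwc[..., ::-1].astype(np.float32)  # reverse the channel axis (copy as float32)
    bgr -= BGR_MEAN
    return np.transpose(bgr, (2, 0, 1))  # HWC -> CHW, matching input_image_size "3,224,224"


def scale_offset_preprocess(rgb_hwc, scale, offsets, bgr=True):
    """Hypothetical inference-side transform: y = scale * (x - offsets)."""
    img = rgb_hwc[..., ::-1] if bgr else rgb_hwc
    out = scale * (img.astype(np.float32) - np.asarray(offsets, dtype=np.float32))
    return np.transpose(out, (2, 0, 1))


# A random stand-in image; replace with a real decoded RGB frame when testing.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

reference = caffe_preprocess(image)
# Same color order, same offsets, scale of 1.0: should reproduce the training input.
matching = scale_offset_preprocess(image, 1.0, BGR_MEAN, bgr=True)
# A common mismatch: RGB order, no mean subtraction, pixels scaled to [0, 1].
mismatched = scale_offset_preprocess(image, 1.0 / 255.0, [0.0, 0.0, 0.0], bgr=False)

print("matching settings agree:", np.allclose(reference, matching))
print("mismatched settings agree:", np.allclose(reference, mismatched))
```

If the documented Gst-nvinfer formula applies here, the corresponding DeepStream settings would presumably be `net-scale-factor=1.0`, `offsets=103.939;116.779;123.68` and `model-color-format=1` (BGR), but that mapping is an assumption to verify against the actual config. The PyTorch-trained ONNX model would need whatever normalization was used in that training script instead, which is not shown in this post.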