Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc): Ubuntu 20 x64, RTX 3060 12 GB
• Network Type: Classification
• TLT Version (please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file (if you have one, please share it here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
I’ve retrained a classification model on custom data with images of various resolutions (I did not resize them before training; is that necessary?). The evaluation and visualization results all look good, and most of the test data is classified correctly:
```
Confusion Matrix
[[408   9]
 [  1 408]]

Classification Report
                  precision    recall  f1-score   support

         bicycle       1.00      0.98      0.99       417
electric_bicycle       0.98      1.00      0.99       409

        accuracy                           0.99       826
       macro avg       0.99      0.99      0.99       826
    weighted avg       0.99      0.99      0.99       826
```
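For reference, these numbers follow directly from the confusion matrix (rows are true labels, columns are predictions); a quick sketch of that arithmetic in Python:

```python
import numpy as np

# Confusion matrix from the evaluation above:
# rows = true class, columns = predicted class.
cm = np.array([[408,   9],    # bicycle
               [  1, 408]])   # electric_bicycle

for i, name in enumerate(["bicycle", "electric_bicycle"]):
    precision = cm[i, i] / cm[:, i].sum()  # correct / everything predicted as class i
    recall = cm[i, i] / cm[i, :].sum()     # correct / everything truly class i
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")

print(f"accuracy: {np.trace(cm) / cm.sum():.2f}")  # (408 + 408) / 826 ≈ 0.99
```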
classification_spec.cfg:
```
model_config {
  arch: "resnet",
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/workspace/tao-experiments/data/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/split/val"
  pretrained_model_path: "/workspace/tao-experiments/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 64
  n_epochs: 80
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tao-experiments/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}
```
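As an aside, my understanding of `preprocess_mode: "caffe"` (an assumption based on the Keras "caffe" convention, not something I verified in the TAO source) is BGR channel order plus per-channel ImageNet mean subtraction with no scaling, roughly:

```python
import numpy as np

def caffe_preprocess(rgb_hwc):
    """Assumed behaviour of preprocess_mode 'caffe' (Keras convention):
    RGB -> BGR, subtract per-channel ImageNet means, no /255 scaling."""
    bgr = rgb_hwc[..., ::-1].astype(np.float32)                    # RGB -> BGR
    bgr -= np.array([103.939, 116.779, 123.68], dtype=np.float32)  # B, G, R means
    return bgr.transpose(2, 0, 1)                                  # HWC -> CHW, matching "3,224,224"
```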
I then exported it to an .etlt model file and volume-mapped it, along with tao-converter, into a tritonserver:22.02-py3 container. Inside the container, I ran the following command to convert the .etlt to a .plan engine:
```
root@7c298352fe92:/models/tao-converter-x86-tensorrt8.0# ./tao-converter /models/tao-converter-x86-tensorrt8.0/models/classification/export/final_model.etlt -k tlt_encode -d 3,224,224 -o predictions/Softmax -t fp16 -e /models/ele_two_vehicle_net_tao/1/model.plan
```
The conversion output is:
```
[INFO] [MemUsageChange] Init CUDA: CPU +458, GPU +0, now: CPU 469, GPU 3692 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 469 MiB, GPU 3692 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 623 MiB, GPU 3736 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +809, GPU +350, now: CPU 1565, GPU 4086 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +126, GPU +58, now: CPU 1691, GPU 4144 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 1 output network tensors.
[INFO] Total Host Persistent Memory: 58448
[INFO] Total Device Persistent Memory: 23211520
[INFO] Total Scratch Memory: 1024
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 70 MiB, GPU 640 MiB
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 0.408682ms to assign 4 blocks to 33 nodes requiring 44957697 bytes.
[INFO] Total Activation Memory: 44957697
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2870, GPU 4710 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2870, GPU 4718 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +22, GPU +23, now: CPU 22, GPU 23 (MiB)
```
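Before wiring this into Triton, I believe the generated engine can be sanity-checked for its binding names and shapes (a sketch, assuming the TensorRT Python bindings are available in the container; the path matches the -e argument above):

```python
import tensorrt as trt

# Sanity check: deserialize model.plan and list its bindings, to confirm the
# input/output names and shapes that config.pbtxt has to match.
logger = trt.Logger(trt.Logger.INFO)
with open("/models/ele_two_vehicle_net_tao/1/model.plan", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

for i in range(engine.num_bindings):
    kind = "input " if engine.binding_is_input(i) else "output"
    print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))
```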
Referring to tao-toolkit-triton-apps, I generated config.pbtxt:
name: "ele_two_vehicle_net_tao"
platform: "tensorrt_plan"
max_batch_size : 1
input [
{
name: "input_1"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 224, 224 ]
}
]
output [
{
name: "predictions/Softmax"
data_type: TYPE_FP32
dims: [2, 1, 1]
label_filename: "labels.txt"
}
]
dynamic_batching { }
and labels.txt:
```
bicycle
electric-bicycle
```
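To confirm the server parses this configuration the way I expect, I assume it can be queried back over the HTTP API (a sketch, assuming the default port 8000):

```python
import tritonclient.http as httpclient

# Ask the running Triton server how it parsed the model configuration,
# to confirm input/output names, dims, and datatypes match the engine.
client = httpclient.InferenceServerClient(url="localhost:8000")
print(client.get_model_metadata("ele_two_vehicle_net_tao"))
print(client.get_model_config("ele_two_vehicle_net_tao"))
```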
Then, using the Triton client Python sample with test images copied directly from the training dataset:
```
python3 image_client.py -m ele_two_vehicle_net_tao ~/Pictures/data/train/electric_bicycle/
```
I can see the inference results are quite bad: over half of the 200 images were wrongly recognized as bicycle.
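To rule out client-side preprocessing differences, something like the following should reproduce one inference by hand (a sketch: the filename is a placeholder, the server is assumed on localhost:8000, and caffe_preprocess is the helper sketched above):

```python
import numpy as np
from PIL import Image
import tritonclient.http as httpclient

# Hand-rolled single-image inference to cross-check image_client.py.
# "sample_electric_bicycle.jpg" is a placeholder; caffe_preprocess is the
# assumed-preprocessing helper sketched earlier in this post.
img = Image.open("sample_electric_bicycle.jpg").convert("RGB").resize((224, 224))
chw = caffe_preprocess(np.asarray(img))            # (3, 224, 224) float32

client = httpclient.InferenceServerClient(url="localhost:8000")
inp = httpclient.InferInput("input_1", [1, 3, 224, 224], "FP32")
inp.set_data_from_numpy(chw[np.newaxis])           # add the batch dimension
out = httpclient.InferRequestedOutput("predictions/Softmax")

result = client.infer("ele_two_vehicle_net_tao", inputs=[inp], outputs=[out])
print(result.as_numpy("predictions/Softmax"))      # per-class probabilities
```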
Could you help me check:

- Why does running tao-converter report “Some tactics do not have sufficient workspace memory to run…” when, as far as I can see, plenty of GPU memory is still free?
- Why is the classification accuracy so low?

Thanks.