TAO export

tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
/usr/local/lib/python3.6/dist-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
/usr/local/lib/python3.6/dist-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
warnings.warn(errors.NumbaWarning(msg))
2022-06-23 09:23:16,497 [INFO] iva.common.export.keras_exporter: Using input nodes: ['input_1']
2022-06-23 09:23:16,498 [INFO] iva.common.export.keras_exporter: Using output nodes: ['predictions/Softmax']
NOTE: UFF has been tested with TensorFlow 1.14.0.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
DEBUG: convert reshape to flatten node
DEBUG [/usr/local/lib/python3.6/dist-packages/uff/converters/tensorflow/converter.py:96] Marking ['predictions/Softmax'] as outputs
2022-06-23 09:23:20,405 [INFO] iva.common.export.keras_exporter: Calibration takes time especially if number of batches is large.
terminate called after throwing an instance of 'pybind11::error_already_set'
what(): ValueError: Batch size yielded from data source 8 < requested batch size from calibrator 16

At:
/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/tensorfile_calibrator.py(79): get_data_from_source
/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/tensorfile_calibrator.py(95): get_batch
/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py(537): __init__
/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/_tensorrt.py(696): __init__
/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/keras_exporter.py(445): export
/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py(250): run_export
/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/makenet/scripts/export.py(42): main
/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/makenet/scripts/export.py(46): <module>

2022-06-23 14:53:24,294 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Please share the full command. Thanks.
Please also share the spec file.

!tao classification export \
    -m $USER_EXPERIMENT_DIR/output/weights/resnet_450.tlt \
    -o $USER_EXPERIMENT_DIR/output/weights/final_model.etlt \
    -k $KEY \
    --cal_data_file $USER_EXPERIMENT_DIR/output/weights/calibration.tensor \
    --data_type int8 \
    --batches 16 \
    --cal_cache_file $USER_EXPERIMENT_DIR/output/weights/final_model_int8_cache.bin \
    -v

model_config {
  arch: "resnet",
  n_layers: 10
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "/workspace/tao-experiments/data/split/train"
  val_dataset_path: "/workspace/tao-experiments/data/split/val"
  pretrained_model_path: "/workspace/tao-experiments/classification/pretrained_resnet10/pretrained_classification_vresnet10/resnet_10.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 8
  n_epochs: 450
  n_workers: 4
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }
  # learning_rate
  lr_config {
    step {
      learning_rate: 0.001
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tao-experiments/data/split/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_450.tlt"
  top_k: 3
  batch_size: 1
  n_workers: 4
  enable_center_crop: True
}

Any update on this yet?

How many images are in your training dataset and val dataset?

TRAIN IMAGE COUNT

DATA1 11297
DATA2 3009
DATA3 4656
DATA4 615
DATA5 98
… 839
… 856
… 7332
… 210
… 634
… 271
… 656
… 672
… 541
… 101
… 1456
… 3185
… 145
… 1088
… 1546
… 87

VAL IMAGE COUNT

DATA1 1614
DATA2 430
DATA3 665
DATA4 88
DATA5 14
… 120
… 123
… 1048
… 30
… 90
… 39
… 94
… 96
… 78
… 15
… 208
… 455
… 21
… 156
… 221
… 13

These are the per-class counts of my training and validation datasets.

Please run the commands below and share the results.
! tao classification run ls /workspace/tao-experiments/data/split/train
! tao classification run ls /workspace/tao-experiments/data/split/val
! tao classification run ls /workspace/tao-experiments/data/split/test

DATA1 DATA2 DATA3 DATA4 DATA5 DATA6 DATA7
DATA8 DATA9 DATA10 DATA11 DATA12 DATA13 DATA14
DATA15 DATA16 DATA17 DATA18 DATA19 DATA20 DATA21

The results are the same for train / val / test.

Make sure the tensor file is available.
! tao classification run ls $USER_EXPERIMENT_DIR/output/weights/calibration.tensor

I have generated the file

Can you set "-m 8" when you run
! tao classification calibration_tensorfile
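For reference, the full tensorfile generation command would look roughly like this (a sketch; the -e spec path and -o output path below are assumptions based on the paths used elsewhere in this thread, and as far as I understand -m sets the number of calibration batches written):

! tao classification calibration_tensorfile \
    -e $SPECS_DIR/classification_spec.cfg \
    -m 8 \
    -o $USER_EXPERIMENT_DIR/output/weights/calibration.tensor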

I did that before posting the error.
I tried with values of 8, 16, and 4.

2022-06-23 16:19:58,454 [INFO] root: Registry: [‘nvcr.io’]
2022-06-23 16:19:58,533 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
Using TensorFlow backend.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
2022-06-23 10:50:04,522 [INFO] __main__: Loading experiment spec at /workspace/tao-experiments/classification/specs/classification_spec.cfg
2022-06-23 10:50:04,523 [INFO] __main__: Setting up input generator.
Found 39294 images belonging to 21 classes.
Writing calibration tensorfile: 100%|█████████████| 8/8 [00:00<00:00, 9.46it/s]
2022-06-23 10:50:06,137 [INFO] __main__: Calibration tensorfile written.
2022-06-23 16:20:07,129 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Please set batch_size_per_gpu: 16

Are you telling me to re-run the training?

No, just modify it in the spec file and export again.
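To spell out the mismatch: the calibration tensorfile was written with batches of 8 (coming from batch_size_per_gpu: 8 in the spec), while the INT8 calibrator during export requested batches of 16, which is what the "8 < 16" error above is complaining about. With the spec posted earlier, the only line that needs to change in train_config before exporting again is (a sketch; everything else stays as posted):

batch_size_per_gpu: 16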

Oh! That fixed the issue.
Thank you!

Another solution:
When you run "classification export", add the "--batch_size" parameter so that it matches the spec:
--batch_size your_batch_size_setting_in_spec
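Applied to the export command shared earlier in this thread (where the spec has batch_size_per_gpu: 8), that would look roughly like this; only the added --batch_size flag is new:

!tao classification export \
    -m $USER_EXPERIMENT_DIR/output/weights/resnet_450.tlt \
    -o $USER_EXPERIMENT_DIR/output/weights/final_model.etlt \
    -k $KEY \
    --cal_data_file $USER_EXPERIMENT_DIR/output/weights/calibration.tensor \
    --data_type int8 \
    --batches 16 \
    --batch_size 8 \
    --cal_cache_file $USER_EXPERIMENT_DIR/output/weights/final_model_int8_cache.bin \
    -v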
