Change the prediction layer of a pretrained model in TAO

How can I change my prediction layer to 42 classes instead of 20 in VehicleMakeNet with TAO?

• Hardware: Ubuntu 20, x86, RTX 3060
• Network Type: VehicleMakeNet (ResNet18 backbone)
• TLT Version: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
• Training spec file:
train_car_make.txt (1.5 KB)

• The issue:

!tao model classification_tf1 train \
                  -e $SPECS_DIR/train_car_make.txt \
                  -r $RESULTS_DIR/output \
                  --key $KEY \
                  --verbose \
                  --gpus $NUM_GPUS

I have training data with 42 classes, while my ResNet18 pretrained model has 20 outputs. This is my output log:

2024-07-05 17:14:39,752 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 361: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-07-05 17:14:39,772 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 293: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/josemi/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2024-07-05 17:14:39,772 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
Using TensorFlow backend.
2024-07-05 15:14:40.335109: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-07-05 15:14:40,367 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2024-07-05 15:14:41,063 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-07-05 15:14:41,089 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-07-05 15:14:41,093 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:150: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def random_hue(img, max_delta=10.0):
/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:173: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def random_saturation(img, max_shift):
/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:183: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def random_contrast(img, center, max_contrast_scale):
/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:192: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def random_shift(x_img, shift_stddev):
2024-07-05 15:14:42.160049: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2592000000 Hz
2024-07-05 15:14:42.160467: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x83a88e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2024-07-05 15:14:42.160488: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2024-07-05 15:14:42.161283: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcuda.so.1
2024-07-05 15:14:42.186770: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-05 15:14:42.186990: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x81c3c90 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-07-05 15:14:42.187009: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3060, Compute Capability 8.6
2024-07-05 15:14:42.187150: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-05 15:14:42.187222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Found device 0 with properties: 
name: NVIDIA GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.807
pciBusID: 0000:01:00.0
2024-07-05 15:14:42.187244: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-07-05 15:14:42.187289: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcublas.so.12
2024-07-05 15:14:42.188281: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcufft.so.11
2024-07-05 15:14:42.188333: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcurand.so.10
2024-07-05 15:14:42.190099: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusolver.so.11
2024-07-05 15:14:42.190644: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusparse.so.12
2024-07-05 15:14:42.190678: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudnn.so.8
2024-07-05 15:14:42.190736: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-05 15:14:42.190843: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-05 15:14:42.190901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1802] Adding visible gpu devices: 0
2024-07-05 15:14:42.190919: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-07-05 15:14:42.195246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1214] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-07-05 15:14:42.195276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1220]      0 
2024-07-05 15:14:42.195285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1233] 0:   N 
2024-07-05 15:14:42.195465: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-05 15:14:42.195609: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2024-07-05 15:14:42.195693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1359] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8965 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6)
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.

WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-07-05 15:14:43,154 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-07-05 15:14:43,179 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-07-05 15:14:43,181 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:150: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def random_hue(img, max_delta=10.0):
/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:173: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def random_saturation(img, max_shift):
/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:183: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def random_contrast(img, center, max_contrast_scale):
/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:192: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def random_shift(x_img, shift_stddev):
2024-07-05 15:14:43,952 [TAO Toolkit] [INFO] __main__ 388: Loading experiment spec at /specs/train_car_make.cfg.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py:398: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2024-07-05 15:14:43,953 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py:398: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py:407: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2024-07-05 15:14:43,953 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py:407: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2024-07-05 15:14:44,021 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.logging.logging 197: Log file already exists at /results/output/status.json
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py:431: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

2024-07-05 15:14:44,021 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py:431: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py:431: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

2024-07-05 15:14:44,021 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py:431: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

Found 44959 images belonging to 42 classes.
2024-07-05 15:14:44,588 [TAO Toolkit] [INFO] __main__ 294: Processing dataset (train): /data/car_make_dataset/training_set
Found 9604 images belonging to 42 classes.
2024-07-05 15:14:44,715 [TAO Toolkit] [INFO] __main__ 311: Processing dataset (validation): /data/car_make_dataset/val_set
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2024-07-05 15:14:44,715 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2024-07-05 15:14:44,715 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2024-07-05 15:14:44,717 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2024-07-05 15:14:44,732 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

2024-07-05 15:14:44,736 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:199: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

2024-07-05 15:14:45,215 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/third_party/keras/tensorflow_backend.py:199: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

2024-07-05 15:14:45,818 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2024-07-05 15:14:45,818 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

2024-07-05 15:14:45,818 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

2024-07-05 15:14:46,140 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

2024-07-05 15:14:48,529 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

2024-07-05 15:14:48,531 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

2024-07-05 15:14:49,376 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2024-07-05 15:14:49,478 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:973: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 3, 224, 224)  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 112, 112) 9408        input_1[0][0]                    
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 112, 112) 256         conv1[0][0]                      
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 112, 112) 0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 56, 56)   36864       activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 56, 56)   256         block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (None, 64, 56, 56)   0           block_1a_bn_1[0][0]              
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 56, 56)   36864       block_1a_relu_1[0][0]            
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 56, 56)   4096        activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 56, 56)   256         block_1a_conv_2[0][0]            
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 56, 56)   256         block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 56, 56)   0           block_1a_bn_2[0][0]              
                                                                 block_1a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1a_relu (Activation)      (None, 64, 56, 56)   0           add_1[0][0]                      
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 56, 56)   36864       block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 56, 56)   256         block_1b_conv_1[0][0]            
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (None, 64, 56, 56)   0           block_1b_bn_1[0][0]              
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 56, 56)   36864       block_1b_relu_1[0][0]            
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (None, 64, 56, 56)   4096        block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 56, 56)   256         block_1b_conv_2[0][0]            
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (None, 64, 56, 56)   256         block_1b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 56, 56)   0           block_1b_bn_2[0][0]              
                                                                 block_1b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1b_relu (Activation)      (None, 64, 56, 56)   0           add_2[0][0]                      
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 28, 28)  73728       block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 28, 28)  512         block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (None, 128, 28, 28)  0           block_2a_bn_1[0][0]              
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 28, 28)  147456      block_2a_relu_1[0][0]            
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 28, 28)  8192        block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 28, 28)  512         block_2a_conv_2[0][0]            
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 28, 28)  512         block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128, 28, 28)  0           block_2a_bn_2[0][0]              
                                                                 block_2a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2a_relu (Activation)      (None, 128, 28, 28)  0           add_3[0][0]                      
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 28, 28)  147456      block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 28, 28)  512         block_2b_conv_1[0][0]            
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (None, 128, 28, 28)  0           block_2b_bn_1[0][0]              
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 28, 28)  147456      block_2b_relu_1[0][0]            
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (None, 128, 28, 28)  16384       block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 28, 28)  512         block_2b_conv_2[0][0]            
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (None, 128, 28, 28)  512         block_2b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 28, 28)  0           block_2b_bn_2[0][0]              
                                                                 block_2b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2b_relu (Activation)      (None, 128, 28, 28)  0           add_4[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 14, 14)  294912      block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 14, 14)  1024        block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (None, 256, 14, 14)  0           block_3a_bn_1[0][0]              
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 14, 14)  589824      block_3a_relu_1[0][0]            
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 14, 14)  32768       block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 14, 14)  1024        block_3a_conv_2[0][0]            
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 14, 14)  1024        block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_5 (Add)                     (None, 256, 14, 14)  0           block_3a_bn_2[0][0]              
                                                                 block_3a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3a_relu (Activation)      (None, 256, 14, 14)  0           add_5[0][0]                      
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 14, 14)  589824      block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 14, 14)  1024        block_3b_conv_1[0][0]            
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (None, 256, 14, 14)  0           block_3b_bn_1[0][0]              
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 14, 14)  589824      block_3b_relu_1[0][0]            
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (None, 256, 14, 14)  65536       block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 14, 14)  1024        block_3b_conv_2[0][0]            
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (None, 256, 14, 14)  1024        block_3b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_6 (Add)                     (None, 256, 14, 14)  0           block_3b_bn_2[0][0]              
                                                                 block_3b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3b_relu (Activation)      (None, 256, 14, 14)  0           add_6[0][0]                      
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 14, 14)  1179648     block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 14, 14)  2048        block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (None, 512, 14, 14)  0           block_4a_bn_1[0][0]              
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 14, 14)  2359296     block_4a_relu_1[0][0]            
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 14, 14)  131072      block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 14, 14)  2048        block_4a_conv_2[0][0]            
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 14, 14)  2048        block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_7 (Add)                     (None, 512, 14, 14)  0           block_4a_bn_2[0][0]              
                                                                 block_4a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4a_relu (Activation)      (None, 512, 14, 14)  0           add_7[0][0]                      
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 14, 14)  2359296     block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 14, 14)  2048        block_4b_conv_1[0][0]            
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (None, 512, 14, 14)  0           block_4b_bn_1[0][0]              
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 14, 14)  2359296     block_4b_relu_1[0][0]            
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (None, 512, 14, 14)  262144      block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 14, 14)  2048        block_4b_conv_2[0][0]            
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (None, 512, 14, 14)  2048        block_4b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_8 (Add)                     (None, 512, 14, 14)  0           block_4b_bn_2[0][0]              
                                                                 block_4b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4b_relu (Activation)      (None, 512, 14, 14)  0           add_8[0][0]                      
__________________________________________________________________________________________________
avg_pool (AveragePooling2D)     (None, 512, 1, 1)    0           block_4b_relu[0][0]              
__________________________________________________________________________________________________
flatten (Flatten)               (None, 512)          0           avg_pool[0][0]                   
__________________________________________________________________________________________________
predictions (Dense)             (None, 20)           10260       flatten[0][0]                    
==================================================================================================
Total params: 11,552,724
Trainable params: 11,531,668
Non-trainable params: 21,056
__________________________________________________________________________________________________

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py:1133: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

2024-07-05 15:15:09,531 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py:1133: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

2024-07-05 15:15:09,532 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.logging.logging 197: Log file already exists at /results/output/status.json
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py:1181: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

2024-07-05 15:15:11,577 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py:1181: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

2024-07-05 15:15:12,396 [TAO Toolkit] [INFO] root 2102: Starting Training Loop.
Epoch 1/3
2024-07-05 15:15:12,821 [TAO Toolkit] [INFO] root 2102: Error when checking target: expected predictions to have shape (20,) but got array with shape (42,)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py", line 668, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 717, in return_func
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/utils.py", line 705, in return_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py", line 664, in main
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py", line 645, in main
    run_experiment(config_path=args.experiment_spec_file,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py", line 599, in run_experiment
    final_model.fit_generator(
  File "/usr/local/lib/python3.8/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1405, in fit_generator
    return training_generator.fit_generator(
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/training_generator.py", line 215, in fit_generator
    outs = model.train_on_batch(x, y,
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 1208, in train_on_batch
    x, y, sample_weights = self._standardize_user_data(
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/training.py", line 784, in _standardize_user_data
    y = standardize_input_data(
  File "/usr/local/lib/python3.8/dist-packages/keras/engine/training_utils.py", line 134, in standardize_input_data
    raise ValueError(
ValueError: Error when checking target: expected predictions to have shape (20,) but got array with shape (42,)
Execution status: FAIL
2024-07-05 17:15:16,631 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

To unblock your case, the quickest workaround is to comment out pretrained_model_path in the training spec, so the model is built from scratch with the number of classes found in your dataset.
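For reference, a minimal sketch of the relevant part of the training spec with the pretrained weights disabled (the field placement is assumed from the classification_tf1 spec format, and the commented path is only an example):

train_config {
  # Commented out so training starts from random weights and the classifier
  # head is sized to the 42 classes found in the dataset folders.
  # pretrained_model_path: "/workspace/pretrained_vehiclemakenet/resnet18.hdf5"

  # ... keep the rest of your train_config unchanged ...
}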

If you want to keep the pretrained weights instead, please use the workaround below.
Use docker run to log into the docker container.
Then, $ vim /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py

Modify it as below (the lines between the "added" markers are new; the rest is existing train.py code shown for context).

from keras.layers import Dense    # added
from keras.models import Model    # added

    if train_config.pretrained_model_path:
        # Decrypt and load pretrained model
        pretrained_model = model_io(train_config.pretrained_model_path, enc_key=key)

        # added: drop the original 20-way head and attach a fresh 42-way softmax,
        # keeping the pretrained ResNet18 backbone weights
        x = pretrained_model.layers[-2].output
        outputs = Dense(42, activation="softmax", name="predictions")(x)
        pretrained_model = Model(inputs=pretrained_model.input, outputs=outputs)
        pretrained_model.summary()
        # end added

        strict_mode = True
        for layer in pretrained_model.layers[1:]:


Cool. How can I keep these changes so that I can run it afterwards with the TAO command, or do you recommend using the repo GitHub - NVIDIA/tao_tensorflow1_backend: TAO Toolkit deep learning networks with TensorFlow 1.x backend?

You can use the following steps to generate a new docker image.
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 /bin/bash

Then modify the code inside the docker container.

Open a new terminal,
$ docker ps (to check the container id)
then run docker commit to generate a new "modified" version of the image, i.e.,
docker commit <container_id> <your_new_docker_name>
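
Putting those steps together, a minimal sketch (the committed image name tao-toolkit:5.0.0-tf1.15.5-modified is just an example; it matches the name used in the next reply):

Terminal 1 (host): start a container from the stock image and edit the script inside it
$ docker run --runtime=nvidia -it --rm nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 /bin/bash
# vim /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/scripts/train.py

Terminal 2 (host): while that container is still running, commit it as a new image
$ docker ps
$ docker commit <container_id> tao-toolkit:5.0.0-tf1.15.5-modified

Because the container was started with --rm, it is removed as soon as you exit it, so run docker commit before exiting.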

But I mean, how does TAO know that it should use the committed docker image? Thank you!

With this approach, TAO's launcher is not involved: please use docker run directly to run training and evaluation/inference/other tasks inside the committed image.
$ docker run --runtime=nvidia -it --rm tao-toolkit:5.0.0-tf1.15.5-modified /bin/bash
Then, inside the docker container,
# classification_tf1 train xxx
# classification_tf1 inference xxx
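
Note that when you run the container directly like this, the path mappings from /home/josemi/.tao_mounts.json are not applied, so mount your data, spec, and results directories yourself with -v. A minimal sketch, assuming example host paths (the container-side paths match the ones already used by your spec and results):

$ docker run --runtime=nvidia -it --rm \
    -v /home/josemi/tao/data:/data \
    -v /home/josemi/tao/specs:/specs \
    -v /home/josemi/tao/results:/results \
    tao-toolkit:5.0.0-tf1.15.5-modified /bin/bash
# classification_tf1 train -e /specs/train_car_make.txt -r /results/output --key <your_key>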

Perfect, thank you so much!
