The meaning of KPI metrics of multi-class classification models

In the Tao-Toolkit-API AutoML multi-class classification notebook, there is a model metric named KPI.

What does it mean? Does it mean classification accuracy?

Which cell are you referring to?

The cell that is used to train the AutoML model.

Besides, I found that I could not evaluate the best model trained by the AutoML training job.

As shown in the second picture below, metrics such as kpi and cur_epoch are empty lists or None.

What should I do to evaluate the best model?

It is just “started”.

Please check again after it is completed.
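
A minimal sketch of waiting for a job to finish before reading its metrics. The `fetch_status` callable is hypothetical: it stands in for whatever the notebook cell does to retrieve the job's status string. The terminal status names ("Done", "Error") are assumptions, not something this thread confirms.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    fetch_status: Callable[[], str],
    terminal: Iterable[str] = ("Done", "Error"),  # assumed terminal status names
    interval_s: float = 15.0,
    max_polls: int = 240,
) -> str:
    """Poll fetch_status() until it returns a terminal status or polls run out."""
    terminal_lower = {s.lower() for s in terminal}
    status = ""
    for _ in range(max_polls):
        status = fetch_status()
        if status.lower() in terminal_lower:
            return status
        # Not finished yet (for example "Started"): wait and ask again.
        time.sleep(interval_s)
    raise TimeoutError(f"job still in status {status!r} after {max_polls} polls")
```

Metrics such as kpi are only worth reading once this returns; if it times out while the job is still reported as started, the job logs are the next place to look.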

Excuse me, @Morganh. Does the metric mean validation classification accuracy? The notebook I ran is the Tao-Toolkit-API AutoML multi-class classification notebook.

Excuse me, @Morganh. After I ran the training phase of AutoML, I ran the following four steps to evaluate the best model.

However, when the last cell completed, the status was still started instead of done.

What should I do to evaluate the best model generated by AutoML?

Here is the training phase

Here is evaluation step 1

Here is evaluation step 2

Here is evaluation step 3

Here is evaluation step 4

hi @swka1043338

If you run the AutoML notebook with automl_algorithm set to Bayesian, this KPI is the accuracy on your eval dataset; if you set it to HyperBand, this KPI is the loss.
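
As a rough illustration of the rule above, here is a small lookup that says how to read the KPI for each algorithm. The function name and the returned dictionary are hypothetical and not part of the TAO API; the `higher_is_better` flag only reflects the usual reading of accuracy versus loss.

```python
def describe_kpi(automl_algorithm: str) -> dict:
    """Return what the KPI represents and whether larger values are better."""
    key = automl_algorithm.strip().lower()
    if key == "bayesian":
        # Per the reply above: KPI is accuracy on the eval dataset.
        return {"meaning": "accuracy on the eval dataset", "higher_is_better": True}
    if key == "hyperband":
        # Per the reply above: KPI is the loss.
        return {"meaning": "loss", "higher_is_better": False}
    raise ValueError(f"unknown automl_algorithm: {automl_algorithm!r}")
```

So a KPI of 0.9 would be a good value under Bayesian but should be read as a loss value under HyperBand.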

hi @swka1043338

To evaluate the AutoML best model, follow these steps:
First, cd to your train job folder, which is usually /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/<job id>/.
Second, create a symbolic link, for example ln -s best_model/weights/ ./
Third, run the 4 evaluation steps you ran before.
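
The symbolic-link step can also be done from Python, for example in a notebook cell. This is a sketch under the assumption that the train job folder contains best_model/weights/ as described above. The function name `link_best_weights` is made up for illustration, and the path you pass in must be your own <job id> folder.

```python
import os
from pathlib import Path


def link_best_weights(job_dir: str) -> Path:
    """Create <job_dir>/weights as a symlink to <job_dir>/best_model/weights."""
    job = Path(job_dir)
    target = job / "best_model" / "weights"
    if not target.is_dir():
        raise FileNotFoundError(f"no best-model weights at {target}")
    link = job / "weights"
    if link.is_symlink() or link.exists():
        # Leave an existing entry alone rather than overwrite it.
        return link
    # Relative target, like running `ln -s best_model/weights/ ./` inside job_dir.
    os.symlink(os.path.join("best_model", "weights"), link, target_is_directory=True)
    return link
```

Because the link target is relative, the link stays valid whether the folder is seen under /mnt/nfs_share/... on the host or under a different mount point inside a pod.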

Excuse me, @Bin_Zhao_NV. Does ./ mean the directory where the notebook runs?

Should I change ./ if I ran the notebook in the directory ~/tao-getting-started_v4.0.0/notebooks/tao_api_starter_kit/api/automl?

No, it means your <job id> folder. You can use this command: ln -s /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/<job id>/best_model/weights/ /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/<job id>/

After I ran the command ln -s /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-60cbfd8d-bd6c-487d-b156-8345d46ac2c7/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/best_model/weights/ /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-60cbfd8d-bd6c-487d-b156-8345d46ac2c7/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/ in a terminal, I ran the 4 evaluation steps again.

However, the evaluation result still showed the evaluation job_id in “STARTED” status.

This is the description of the evaluation job's pod:

Name:         c51a92ea-6f38-41e4-8b1b-a9e889446c6a-jk87m
Namespace:    default
Priority:     0
Node:         admin-ops01/192.168.101.8
Start Time:   Mon, 10 Apr 2023 05:39:21 +0000
Labels:       controller-uid=daf787ab-b955-4870-83d1-16b0fea99beb
              job-name=c51a92ea-6f38-41e4-8b1b-a9e889446c6a
              purpose=tao-toolkit-job
Annotations:  cni.projectcalico.org/containerID: b3ce723fca00d03d30d06b77d944e0f7fb955a0a6f0ec61655794ffad0c07d8c
              cni.projectcalico.org/podIP: 
              cni.projectcalico.org/podIPs: 
Status:       Succeeded
IP:           192.168.33.74
IPs:
  IP:           192.168.33.74
Controlled By:  Job/c51a92ea-6f38-41e4-8b1b-a9e889446c6a
Containers:
  container:
    Container ID:  containerd://52016c9269d1734b900c116ddaa0dcf73d6b0d5af5c6cae50227db30799ef7ed
    Image:         nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
    Image ID:      nvcr.io/nvidia/tao/tao-toolkit@sha256:6282b5b09220942e321a452109ad40cde47e5e490480c405c92b930fff2b0574
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
    Args:
      umask 0 && classification_tf1 evaluate --experiment_spec_file /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/specs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.yaml --key nvidia_tlt --results_dir /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a  > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/logs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/logs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt; find /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a -type d | xargs chmod 777; find /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a -type f | xargs chmod 666
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 10 Apr 2023 05:39:21 +0000
      Finished:     Mon, 10 Apr 2023 05:39:34 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      nvidia.com/gpu:  1
    Requests:
      nvidia.com/gpu:  1
    Environment:
      NUM_GPUS:                1
      TELEMETRY_OPT_OUT:       no
      WANDB_API_KEY:           
      CLEARML_WEB_HOST:        https://app.clear.ml
      CLEARML_API_HOST:        https://api.clear.ml
      CLEARML_FILES_HOST:      https://files.clear.ml
      CLEARML_API_ACCESS_KEY:  
      CLEARML_API_SECRET_KEY:  
    Mounts:
      /dev/shm from dshm (rw)
      /shared from shared-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qc54b (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  shared-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  tao-toolkit-api-pvc
    ReadOnly:   false
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  kube-api-access-qc54b:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  73s   default-scheduler  Successfully assigned default/c51a92ea-6f38-41e4-8b1b-a9e889446c6a-jk87m to admin-ops01
  Normal  Pulled     73s   kubelet            Container image "nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5" already present on machine
  Normal  Created    73s   kubelet            Created container container
  Normal  Started    73s   kubelet            Started container container
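
The pod description above reports Status: Succeeded and Exit Code: 0, even though the API still shows the job as started. Below is a minimal sketch for pulling those two fields out of saved `kubectl describe pod` text, so the pod's view can be compared with the status the API returns. The function name and the regex-based parsing are illustrative only and assume the plain-text layout shown above.

```python
import re
from typing import Optional, Tuple


def pod_outcome(describe_text: str) -> Tuple[Optional[str], Optional[int]]:
    """Extract the pod Status and the first container Exit Code, if present."""
    # "Status:" at the start of a line is the pod phase (e.g. Succeeded).
    status = re.search(r"^Status:\s*(\S+)", describe_text, re.MULTILINE)
    # "Exit Code:" is indented under the container's State section.
    exit_code = re.search(r"^\s*Exit Code:\s*(-?\d+)", describe_text, re.MULTILINE)
    return (
        status.group(1) if status else None,
        int(exit_code.group(1)) if exit_code else None,
    )
```

If this returns a succeeded pod with exit code 0 while the job is still listed as started, the container itself finished cleanly, so the evaluation log and the workflow pod logs are the places to check next.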

Could you please upload the <model id> folder so that I can find more information?

The folder is too big to upload to the forum.

Could I show screenshots of the terminal?

Yes, please share the evaluation logs, which should be in /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/logs/<eval job id>.txt, and the eval spec file, which should be in /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/specs/<eval job id>.yaml.
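
To avoid typos in these long paths, they can be assembled from their parts. This is only a sketch of the directory layout described above; the function and parameter names are hypothetical, and the PVC root differs per installation.

```python
from pathlib import PurePosixPath


def eval_artifact_paths(pvc_root: str, user_id: str, model_id: str, eval_job_id: str) -> dict:
    """Build the expected log and spec paths for one evaluation job."""
    model_dir = PurePosixPath(pvc_root) / "users" / user_id / "models" / model_id
    return {
        "log": model_dir / "logs" / f"{eval_job_id}.txt",
        "spec": model_dir / "specs" / f"{eval_job_id}.yaml",
    }
```

For example, calling it with your own PVC root under /mnt/nfs_share/ and the three ids gives the two files requested above.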

Besides, I’d like to see the logs of the workflow pod. You can dump the pod logs with this command: kubectl logs tao-toolkit-api-workflow-pod-6d76b5dcf8-4vc4q

The model I trained with AutoML is a binary classification model, and I followed the documented dataset structure to build my custom dataset. The dataset format I set was default.

These are the evaluation logs in c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt (eval_job_id.txt):

2023-04-10 05:39:22.271320: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-04-10 05:39:25.146383: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2294745000 Hz
2023-04-10 05:39:25.147052: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2a8db10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-04-10 05:39:25.147081: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2023-04-10 05:39:25.148957: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-04-10 05:39:25.254396: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.254822: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x552e810 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-10 05:39:25.254861: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla P100-SXM2-16GB, Compute Capability 6.0
2023-04-10 05:39:25.255137: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.255383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: 
name: Tesla P100-SXM2-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.4805
pciBusID: 0000:00:08.0
2023-04-10 05:39:25.255428: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-04-10 05:39:25.281075: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-04-10 05:39:25.284427: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-04-10 05:39:25.284821: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-04-10 05:39:25.285555: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2023-04-10 05:39:25.286573: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2023-04-10 05:39:25.286769: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2023-04-10 05:39:25.286917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.287219: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.287418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0
2023-04-10 05:39:25.287447: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-04-10 05:39:25.663574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-04-10 05:39:25.663622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215]      0 
2023-04-10 05:39:25.663630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0:   N 
2023-04-10 05:39:25.663904: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.664222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.664462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15236 MB memory) -> physical GPU (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:00:08.0, compute capability: 6.0)
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
_init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
INFO: Starting evaluation.
INFO: Loading experiment spec at /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/specs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.yaml.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:245: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:245: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.

INFO: Processing dataset (evaluation): /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/8f3316a0-7362-47a1-ab5d-66229ac09470/images_test
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 3, 224, 224)  0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 112, 112) 9408        input_1[0][0]                    
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 64, 112, 112) 256         conv1[0][0]                      
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 64, 112, 112) 0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D)        (None, 64, 56, 56)   36864       activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 56, 56)   256         block_1a_conv_1[0][0]            
__________________________________________________________________________________________________
block_1a_relu_1 (Activation)    (None, 64, 56, 56)   0           block_1a_bn_1[0][0]              
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D)        (None, 64, 56, 56)   36864       block_1a_relu_1[0][0]            
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 56, 56)   4096        activation_1[0][0]               
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 56, 56)   256         block_1a_conv_2[0][0]            
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 56, 56)   256         block_1a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_1 (Add)                     (None, 64, 56, 56)   0           block_1a_bn_2[0][0]              
                                                                 block_1a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1a_relu (Activation)      (None, 64, 56, 56)   0           add_1[0][0]                      
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D)        (None, 64, 56, 56)   36864       block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 56, 56)   256         block_1b_conv_1[0][0]            
__________________________________________________________________________________________________
block_1b_relu_1 (Activation)    (None, 64, 56, 56)   0           block_1b_bn_1[0][0]              
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D)        (None, 64, 56, 56)   36864       block_1b_relu_1[0][0]            
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (None, 64, 56, 56)   4096        block_1a_relu[0][0]              
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 56, 56)   256         block_1b_conv_2[0][0]            
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (None, 64, 56, 56)   256         block_1b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_2 (Add)                     (None, 64, 56, 56)   0           block_1b_bn_2[0][0]              
                                                                 block_1b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_1b_relu (Activation)      (None, 64, 56, 56)   0           add_2[0][0]                      
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D)        (None, 128, 28, 28)  73728       block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 28, 28)  512         block_2a_conv_1[0][0]            
__________________________________________________________________________________________________
block_2a_relu_1 (Activation)    (None, 128, 28, 28)  0           block_2a_bn_1[0][0]              
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D)        (None, 128, 28, 28)  147456      block_2a_relu_1[0][0]            
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 28, 28)  8192        block_1b_relu[0][0]              
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 28, 28)  512         block_2a_conv_2[0][0]            
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 28, 28)  512         block_2a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_3 (Add)                     (None, 128, 28, 28)  0           block_2a_bn_2[0][0]              
                                                                 block_2a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2a_relu (Activation)      (None, 128, 28, 28)  0           add_3[0][0]                      
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D)        (None, 128, 28, 28)  147456      block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 28, 28)  512         block_2b_conv_1[0][0]            
__________________________________________________________________________________________________
block_2b_relu_1 (Activation)    (None, 128, 28, 28)  0           block_2b_bn_1[0][0]              
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D)        (None, 128, 28, 28)  147456      block_2b_relu_1[0][0]            
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (None, 128, 28, 28)  16384       block_2a_relu[0][0]              
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 28, 28)  512         block_2b_conv_2[0][0]            
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (None, 128, 28, 28)  512         block_2b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_4 (Add)                     (None, 128, 28, 28)  0           block_2b_bn_2[0][0]              
                                                                 block_2b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_2b_relu (Activation)      (None, 128, 28, 28)  0           add_4[0][0]                      
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D)        (None, 256, 14, 14)  294912      block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 14, 14)  1024        block_3a_conv_1[0][0]            
__________________________________________________________________________________________________
block_3a_relu_1 (Activation)    (None, 256, 14, 14)  0           block_3a_bn_1[0][0]              
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D)        (None, 256, 14, 14)  589824      block_3a_relu_1[0][0]            
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 14, 14)  32768       block_2b_relu[0][0]              
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 14, 14)  1024        block_3a_conv_2[0][0]            
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 14, 14)  1024        block_3a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_5 (Add)                     (None, 256, 14, 14)  0           block_3a_bn_2[0][0]              
                                                                 block_3a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3a_relu (Activation)      (None, 256, 14, 14)  0           add_5[0][0]                      
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D)        (None, 256, 14, 14)  589824      block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 14, 14)  1024        block_3b_conv_1[0][0]            
__________________________________________________________________________________________________
block_3b_relu_1 (Activation)    (None, 256, 14, 14)  0           block_3b_bn_1[0][0]              
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D)        (None, 256, 14, 14)  589824      block_3b_relu_1[0][0]            
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (None, 256, 14, 14)  65536       block_3a_relu[0][0]              
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 14, 14)  1024        block_3b_conv_2[0][0]            
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (None, 256, 14, 14)  1024        block_3b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_6 (Add)                     (None, 256, 14, 14)  0           block_3b_bn_2[0][0]              
                                                                 block_3b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_3b_relu (Activation)      (None, 256, 14, 14)  0           add_6[0][0]                      
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D)        (None, 512, 14, 14)  1179648     block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 14, 14)  2048        block_4a_conv_1[0][0]            
__________________________________________________________________________________________________
block_4a_relu_1 (Activation)    (None, 512, 14, 14)  0           block_4a_bn_1[0][0]              
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D)        (None, 512, 14, 14)  2359296     block_4a_relu_1[0][0]            
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 14, 14)  131072      block_3b_relu[0][0]              
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 14, 14)  2048        block_4a_conv_2[0][0]            
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 14, 14)  2048        block_4a_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_7 (Add)                     (None, 512, 14, 14)  0           block_4a_bn_2[0][0]              
                                                                 block_4a_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4a_relu (Activation)      (None, 512, 14, 14)  0           add_7[0][0]                      
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D)        (None, 512, 14, 14)  2359296     block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 14, 14)  2048        block_4b_conv_1[0][0]            
__________________________________________________________________________________________________
block_4b_relu_1 (Activation)    (None, 512, 14, 14)  0           block_4b_bn_1[0][0]              
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D)        (None, 512, 14, 14)  2359296     block_4b_relu_1[0][0]            
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (None, 512, 14, 14)  262144      block_4a_relu[0][0]              
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 14, 14)  2048        block_4b_conv_2[0][0]            
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (None, 512, 14, 14)  2048        block_4b_conv_shortcut[0][0]     
__________________________________________________________________________________________________
add_8 (Add)                     (None, 512, 14, 14)  0           block_4b_bn_2[0][0]              
                                                                 block_4b_bn_shortcut[0][0]       
__________________________________________________________________________________________________
block_4b_relu (Activation)      (None, 512, 14, 14)  0           add_8[0][0]                      
__________________________________________________________________________________________________
avg_pool (AveragePooling2D)     (None, 512, 1, 1)    0           block_4b_relu[0][0]              
__________________________________________________________________________________________________
flatten (Flatten)               (None, 512)          0           avg_pool[0][0]                   
__________________________________________________________________________________________________
predictions (Dense)             (None, 176)          90288       flatten[0][0]                    
==================================================================================================
Total params: 11,632,752
Trainable params: 11,621,104
Non-trainable params: 11,648
__________________________________________________________________________________________________
Found 266 images belonging to 2 classes.
Traceback (most recent call last):
  File "</usr/local/lib/python3.6/dist-packages/iva/makenet/scripts/evaluate.py>", line 3, in <module>
  File "<frozen iva.makenet.scripts.evaluate>", line 308, in <module>
  File "<frozen iva.common.utils>", line 707, in return_func
  File "<frozen iva.common.utils>", line 695, in return_func
  File "<frozen iva.makenet.scripts.evaluate>", line 304, in main
  File "<frozen iva.makenet.scripts.evaluate>", line 255, in run_evaluate
AssertionError: The number of classes of the loaded model doesn't match the          number of classes in the evaluation dataset.
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL

EOF
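
The log above reports a `predictions` layer with 176 outputs, while the evaluation loader found only 2 classes, which is what triggers the `AssertionError`. A minimal sketch for checking this before submitting the evaluate job, assuming the dataset uses one subfolder per class and that `classmap.json` is a JSON object with one key per class name (both are assumptions, and the function names are hypothetical, not part of TAO):

```python
import json
from pathlib import Path


def count_dataset_classes(dataset_dir):
    """Count class subfolders, assuming one subfolder per class."""
    return sum(1 for entry in Path(dataset_dir).iterdir() if entry.is_dir())


def count_classmap_classes(classmap_path):
    """Count entries in classmap.json, assuming a JSON object keyed by class name."""
    with open(classmap_path, encoding="utf-8") as handle:
        return len(json.load(handle))


def classes_match(classmap_path, dataset_dir):
    """Return (model_classes, dataset_classes, whether they are equal)."""
    model_classes = count_classmap_classes(classmap_path)
    dataset_classes = count_dataset_classes(dataset_dir)
    return model_classes, dataset_classes, model_classes == dataset_classes
```

Calling `classes_match(...)` with the `classmap.json` of the trained experiment and the evaluation dataset folder should show whether the two counts disagree in the same way the log does.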

This is the eval spec file

train_config {
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 256
  n_epochs: 80
  n_workers: 2
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 5e-05
  }
  lr_config {
    soft_anneal {
      learning_rate: 0.05
      soft_start: 0.056
      annealing_divider: 10.0
      annealing_points: 0.3
      annealing_points: 0.6
      annealing_points: 0.8
    }
  }
  random_seed: 42
  preprocess_mode: "caffe"
  train_dataset_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/6732c051-53df-4416-b325-df341ae4e38c/images_train"
  val_dataset_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/75fd1ba2-0b07-4494-91ed-8df7a6de9761/images_val"
}
eval_config {
  top_k: 3
  batch_size: 256
  n_workers: 2
  enable_center_crop: True
  model_path: "/shared/users/00000000-0000-0000-0000-000000000000/models/daf291d5-00a9-41f3-9ec0-d6b16ee445ed/pretrained_classification_vresnet18/resnet_18.hdf5"
  eval_dataset_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/8f3316a0-7362-47a1-ab5d-66229ac09470/images_test"
}
model_config {
  arch: "resnet"
  input_image_size: "3,224,224"
  n_layers: 18
  retain_head: False
  use_batch_norm: True
  all_projections: True
  dropout: 0.001
}
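
Note that `model_path` in this spec points into a `pretrained_classification_vresnet18` folder rather than at a checkpoint produced by the AutoML job. A rough sketch for pulling the quoted path fields out of a spec like this one so they can be inspected; it uses a plain regular expression instead of TAO's own spec parser, and `looks_pretrained` is only a naming heuristic I made up for illustration:

```python
import re

# Matches lines such as:   model_path: "/some/path"
_STRING_FIELD = re.compile(r'^\s*(\w+):\s*"([^"]*)"\s*$', re.MULTILINE)


def string_fields(spec_text):
    """Return {field_name: [values]} for every quoted string field in the spec."""
    fields = {}
    for name, value in _STRING_FIELD.findall(spec_text):
        fields.setdefault(name, []).append(value)
    return fields


def looks_pretrained(model_path):
    """Heuristic only: True when the path mentions a 'pretrained' folder."""
    return "pretrained" in model_path.lower()
```

For the spec above, `string_fields(spec_text)["model_path"]` would return the `resnet_18.hdf5` path, and `looks_pretrained` would flag it.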

These are the logs of the workflow pod

NGC CLI 3.10.0
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_0.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_0 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_0/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_0/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_1.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_1 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_1/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_1/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_2.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_2 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_2/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_2/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_3.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_3 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_3/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_3/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_4.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_4 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_4/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_4/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_5.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_5 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_5/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_5/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_6.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_6 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_6/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_6/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_7.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_7 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_7/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_7/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_0.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_0 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_0/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_0/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_1.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_1 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_1/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_1/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_2.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_2 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_2/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_2/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_3.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_3 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_3/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_3/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_4.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_4 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_4/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_4/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_5.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_5 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_5/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_5/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_6.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_6 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_6/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_6/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_7.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_7 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_7/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_7/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_8.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_8 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_8/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_8/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_9.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_9 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_9/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_9/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_10.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_10 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_10/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_10/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_11.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_11 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_11/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_11/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_12.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_12 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_12/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_12/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_13.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_13 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_13/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_13/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_14.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_14 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_14/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_14/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_15.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_15 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_15/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_15/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_16.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_16 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_16/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_16/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_17.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_17 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_17/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_17/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_18.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_18 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_18/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_18/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS  -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_19.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_19 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_19/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_19/log.txt
AutoML pipeline done
Loaded specs
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
Loaded dataset
classification_tf1 evaluate --experiment_spec_file /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/specs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.yaml --key nvidia_tlt --results_dir /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a  > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/logs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/logs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt; find /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a -type d | xargs chmod 777; find /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a -type f | xargs chmod 666 /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a/status.json
nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
Job created
Post running
Job Done: c51a92ea-6f38-41e4-8b1b-a9e889446c6a Final status: Done
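
The workflow log is long, so it can help to summarise which experiments were launched for which job folder. In the log above, job `ffb2c7b5-...` ran experiments 0 to 7 and job `0d9e13a8-...` ran experiments 0 to 19. A small sketch that extracts this from the `-r <results dir>` argument of each train command, assuming the results directory always ends in `<job id>/experiment_<N>` as it does here (the function name is hypothetical):

```python
import re
from collections import defaultdict

# Captures the job folder (a 36-character UUID) and the experiment number
# from the "-r <results dir>" argument of a train command.
_RESULTS_DIR = re.compile(r"-r \S*/([0-9a-f-]{36})/experiment_(\d+)\b")


def experiments_per_job(log_text):
    """Map each job id to the sorted experiment numbers launched for it."""
    jobs = defaultdict(set)
    for line in log_text.splitlines():
        if " train " not in line:
            continue  # skip status lines and evaluate commands
        match = _RESULTS_DIR.search(line)
        if match:
            jobs[match.group(1)].add(int(match.group(2)))
    return {job: sorted(numbers) for job, numbers in jobs.items()}
```

Passing the full pod log as one string returns a dictionary keyed by job id, which makes it easier to see which job folder the evaluation should point at.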

Could you please share the folder structure of your <model id> folder, using the command tree <model id>?

b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4
├── 0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e
│   ├── automl_metadata.json
│   ├── best_model
│   │   ├── classmap.json
│   │   ├── controller.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101031.95d68c1e-f04d-4743-8d26-29f9b77711a7-2gs2t
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── recommendation_5.kitti
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   │       └── resnet_080.tlt
│   ├── brain.json
│   ├── controller.json
│   ├── controller.log
│   ├── current_rec.json
│   ├── experiment_0
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681099901.52349577-5f28-437b-910d-e335b34e5acd-jsqss
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_1
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681100127.4e6009f2-a270-4852-9d98-f71d1c5bdf50-w4jf2
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_10
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681102163.426cb298-2b40-4961-9e19-4e484ced1d22-pr8vg
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_11
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681102391.68abf403-66a0-4bf0-913d-386f1e17fc25-6njzk
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_12
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681102618.5582d2c0-a46d-4f3b-8624-62b2719ed4a3-nsr4w
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_13
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681102845.18bc3e39-284c-476d-ae65-5035a4d2f9d2-q4skh
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_14
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103072.6d4b14a0-d0b4-40ba-acd6-54b754eb17d9-46fb6
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_15
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103299.ee82b78b-e364-4ae1-92a0-27823682f2b8-js8zt
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_16
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103526.8520792d-dc9c-40be-8382-dfa18bd5868a-xttfn
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_17
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103752.e9f4ad9a-b2b1-4ed6-84c4-eb2646f93da1-mnxns
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_18
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103978.90cd6118-39b5-4414-a485-a65986427ddd-2pk6h
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_19
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681104203.bac5f9ea-ac83-4ab8-bb00-7d9178a3f00a-n78ws
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_2
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681100353.61c80a64-dba4-4ea6-a8a6-c40ccd97a8f1-6nnp9
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_3
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681100579.fabb7b4a-8db9-4e80-88b1-7621339bb990-ctgqm
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_4
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681100805.9998aa9a-4d4c-4c4e-a600-d21b174f3f2e-jkdbk
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_5
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101031.95d68c1e-f04d-4743-8d26-29f9b77711a7-2gs2t
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_6
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101258.7073ba23-62eb-4210-9392-60e47476a8da-65s4k
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_7
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101485.d9fb78b2-69d7-4f8c-9ec7-6d72b2d09bf7-58bv5
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_8
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101711.923f6ca2-8c5d-410f-bcf2-2bc70e466883-d8fz2
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_9
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101937.38258963-8fe8-4247-b3f8-f62660c657f6-ggn4r
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── recommendation_0.kitti
│   ├── recommendation_1.kitti
│   ├── recommendation_10.kitti
│   ├── recommendation_11.kitti
│   ├── recommendation_12.kitti
│   ├── recommendation_13.kitti
│   ├── recommendation_14.kitti
│   ├── recommendation_15.kitti
│   ├── recommendation_16.kitti
│   ├── recommendation_17.kitti
│   ├── recommendation_18.kitti
│   ├── recommendation_19.kitti
│   ├── recommendation_2.kitti
│   ├── recommendation_3.kitti
│   ├── recommendation_4.kitti
│   ├── recommendation_5.kitti
│   ├── recommendation_6.kitti
│   ├── recommendation_7.kitti
│   ├── recommendation_8.kitti
│   ├── recommendation_9.kitti
│   └── weights -> best_model/weights/
├── 0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e.tar.gz
├── c51a92ea-6f38-41e4-8b1b-a9e889446c6a
│   └── status.json
├── jobs.yaml
├── jobs_metadata
│   ├── 0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e.json
│   ├── a3f610d5-8c9a-4187-8590-3b22783e568a.json
│   └── c51a92ea-6f38-41e4-8b1b-a9e889446c6a.json
├── logs
│   └── c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt
├── metadata.json
└── specs
    ├── c51a92ea-6f38-41e4-8b1b-a9e889446c6a.yaml
    ├── evaluate.json
    └── train.json

69 directories, 165 files
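
In the tree above, only best_model/weights contains a checkpoint (resnet_080.tlt); every experiment_N/weights folder is listed without files. A minimal Python sketch for checking this without reading the whole tree, assuming checkpoints use the .tlt extension as in this listing. The helper name find_checkpoints is hypothetical, and the demo runs on a throwaway folder rather than the real job folder:

```python
import tempfile
from pathlib import Path


def find_checkpoints(job_dir, suffix=".tlt"):
    """Map each sub-folder of job_dir to the checkpoint files in its weights/ folder."""
    found = {}
    for weights_dir in sorted(Path(job_dir).glob("*/weights")):
        if not weights_dir.is_dir():
            continue
        found[weights_dir.parent.name] = sorted(
            p.name for p in weights_dir.iterdir() if p.suffix == suffix
        )
    return found


# Demo on a temporary folder that mimics a small part of the layout above.
with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp) / "job"
    for name in ("best_model", "experiment_0", "experiment_5"):
        (job / name / "weights").mkdir(parents=True)
    (job / "best_model" / "weights" / "resnet_080.tlt").touch()

    checkpoints = find_checkpoints(job)
    for folder, files in checkpoints.items():
        print(folder, files or "(empty)")
```

To use it on a real run, pass the path of the <job id> folder instead of the temporary one.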

Excuse me, @Bin_Zhao_NV. I found that the predictions layer of my classification model trained by AutoML does not have 2 neurons.

However, the dataset I uploaded has 2 classes, and I followed the steps in the multi-class classification notebook to train the model.

Why did the predictions layer change from 2 neurons to 176?

Hi @swka1043338 ,

The number of neurons in the predictions layer differs from your class count because the eval process didn't find the right weights. Please follow these steps:

  1. cd to your <job_id> folder, such as /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-<pvc_id>/users/<usr_id>/models/<model_id>/<job_id>/
  2. Create a symbolic link using a relative path, such as ln -s best_model/weights/
  3. After step 2, your <job_id> folder should look like this:
total 56
drwxrwxrwx  6 www-data     www-data     4096 Apr 11 06:50 ./
drwxrwxrwx 10 www-data     www-data     4096 Apr 11 06:51 ../
-rw-rw-rw-  1 nobody       nogroup       238 Apr 11 06:07 automl_metadata.json
drwxrwxrwx  4 nobody       nogroup      4096 Apr 11 06:07 best_model/
-rw-rw-rw-  1 nobody       nogroup       639 Apr 11 06:06 brain.json
-rw-rw-rw-  1 nobody       nogroup      1504 Apr 11 06:07 controller.json
-rw-rw-rw-  1 nobody       nogroup       296 Apr 11 06:07 controller.log
-rw-rw-rw-  1 nobody       nogroup         1 Apr 11 06:06 current_rec.json
drwxrwxrwx  4 nobody       nogroup      4096 Apr 11 06:05 experiment_0/
drwxrwxrwx  4 nobody       nogroup      4096 Apr 11 06:06 experiment_1/
drwxrwxrwx  4 nobody       nogroup      4096 Apr 11 06:07 experiment_2/
-rw-rw-rw-  1 nobody       nogroup      1465 Apr 11 06:04 recommendation_0.kitti
-rw-rw-rw-  1 nobody       nogroup      1464 Apr 11 06:05 recommendation_1.kitti
-rw-rw-rw-  1 nobody       nogroup      1465 Apr 11 06:06 recommendation_2.kitti
lrwxrwxrwx  1 local-bizhao local-bizhao   19 Apr 11 06:50 weights -> best_model/weights//
  4. Run the eval cells. You should then see that in /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-<pvc_id>/users/<usr_id>/models/<model_id>/specs/<eval_job_id>.yaml, the model_path setting points to your trained weights rather than the default PTM.
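
The steps above can also be sketched in Python, for example from a notebook cell that has access to the shared folder. This is only a sketch under assumptions: the helper names link_best_weights and model_path_lines are hypothetical, the spec file is scanned as plain text because its exact layout is not shown in this thread, and the demo uses a temporary folder with placeholder spec content rather than a real job.

```python
import os
import tempfile
from pathlib import Path


def link_best_weights(job_dir):
    """Create <job_dir>/weights as a relative symlink to best_model/weights."""
    link = Path(job_dir) / "weights"
    if not (link.is_symlink() or link.exists()):
        # Relative target, like running `ln -s best_model/weights/` inside job_dir.
        os.symlink(os.path.join("best_model", "weights"), link, target_is_directory=True)
    return link


def model_path_lines(spec_file):
    """Return the lines of a spec file that mention model_path (plain text scan)."""
    with open(spec_file, encoding="utf-8") as handle:
        return [line.strip() for line in handle if "model_path" in line]


# Demo on a throwaway folder; replace `job` and `spec` with your real paths.
with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp) / "job"
    (job / "best_model" / "weights").mkdir(parents=True)
    (job / "best_model" / "weights" / "resnet_080.tlt").touch()

    link = link_best_weights(job)
    link_target = os.readlink(link)
    linked_files = sorted(p.name for p in link.iterdir())
    print(link_target, linked_files)

    # Placeholder spec content, NOT the real spec layout.
    spec = Path(tmp) / "eval_spec.yaml"
    spec.write_text('model_path: "placeholder/weights/resnet_080.tlt"\n', encoding="utf-8")
    spec_hits = model_path_lines(spec)
    print(spec_hits)
```

After running the eval cells on a real job, calling model_path_lines on specs/<eval_job_id>.yaml is a quick way to see whether model_path refers to the trained weights or to the default PTM.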