In the Tao-Toolkit-API AutoML multi-class classification notebook, the models have a metric named KPI.
What does it mean? Does it mean classification accuracy?
Which cell are you referring to?
Besides, I found that I could not evaluate the best model trained by the AutoML training job.
As shown in the second picture below, metrics such as kpi and cur_epoch are empty lists or None.
What should I do to evaluate the best model?
It is just "started".
Please check again after it has completed.
Excuse me, @Morganh. Does the metric mean validation classification accuracy? The notebook I ran is the Tao-Toolkit-API AutoML multi-class classification notebook.
Excuse me, @Morganh. After I ran the training phase of AutoML, I ran the following four steps to evaluate the best model.
However, when the last cell completed, the status was still "started" instead of "done".
What should I do to evaluate the best model generated by AutoML?
Here is training phase
Here is evaluation step 1
Here is evaluation step 2
Here is evaluation step 3
Here is evaluation step 4
Hi @swka1043338,
If you run the AutoML notebook and set automl_algorithm to Bayesian, then this KPI is the accuracy on your eval dataset; if you set it to HyperBand, then this KPI is the loss.
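As a small illustration of the answer above, the mapping from the AutoML algorithm to the meaning of the KPI column could be written down as follows. This is only a sketch: the function name and the returned dictionary are hypothetical and not part of the TAO Toolkit API, and the case-insensitive matching of the algorithm name is an assumption.

```python
def describe_automl_kpi(automl_algorithm: str) -> dict:
    """Describe the KPI column for a given automl_algorithm setting.

    Hypothetical helper based only on the explanation in this thread:
    Bayesian -> accuracy on the eval dataset, HyperBand -> loss.
    """
    meanings = {
        "bayesian": {"metric": "evaluation accuracy", "higher_is_better": True},
        "hyperband": {"metric": "loss", "higher_is_better": False},
    }
    key = automl_algorithm.strip().lower()  # assumption: name is case-insensitive
    if key not in meanings:
        raise ValueError(f"Unknown automl_algorithm: {automl_algorithm!r}")
    return meanings[key]


print(describe_automl_kpi("Bayesian"))
print(describe_automl_kpi("HyperBand"))
```

The practical consequence is that KPI values from a Bayesian run and a HyperBand run are not directly comparable: a larger value is better in the first case and worse in the second.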
Hi @swka1043338,
To evaluate the AutoML best model, you can follow these steps:
First, cd to your train job folder, which is usually /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/<job id>/.
Second, create a symbolic link: ln -s best_model/weights/ ./
Third, rerun the 4 evaluation steps you ran before.
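The second step can also be done from Python, for example from a notebook cell that can see the shared volume. The sketch below assumes the job folder contains best_model/weights as described above; the function name link_best_weights is hypothetical. It creates the same relative link that ln -s best_model/weights/ ./ would create when run inside the job folder.

```python
import os
from pathlib import Path


def link_best_weights(job_dir):
    """Create <job_dir>/weights as a symlink to <job_dir>/best_model/weights."""
    job_dir = Path(job_dir)
    target = job_dir / "best_model" / "weights"
    link = job_dir / "weights"
    if not target.is_dir():
        raise FileNotFoundError(f"No best-model weights folder at {target}")
    if link.is_symlink() or link.exists():
        # Leave an existing link or folder untouched instead of overwriting it.
        return link
    # A relative target keeps the link valid if the volume is mounted elsewhere
    # (for example /mnt/nfs_share/... on the host and /shared/... in the pod).
    os.symlink(Path("best_model") / "weights", link, target_is_directory=True)
    return link
```

Calling link_best_weights with the full path of the <job id> folder is enough; the current working directory of the notebook does not matter.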
Excuse me, @Bin_Zhao_NV. Does ./ mean the directory where the notebook runs?
Should I change ./ if I run the notebook from the directory ~/tao-getting-started_v4.0.0/notebooks/tao_api_starter_kit/api/automl?
No, it means your <job id> folder. You can use this command:
ln -s /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/<job id>/best_model/weights/ /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/<job id>/
After I ran the command ln -s /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-60cbfd8d-bd6c-487d-b156-8345d46ac2c7/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/best_model/weights/ /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-60cbfd8d-bd6c-487d-b156-8345d46ac2c7/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/
in the terminal, I reran the 4 evaluation steps.
However, the evaluation result showed that the evaluation job_id was still in "STARTED" status.
This is the description of the evaluation job's pod:
Name: c51a92ea-6f38-41e4-8b1b-a9e889446c6a-jk87m
Namespace: default
Priority: 0
Node: admin-ops01/192.168.101.8
Start Time: Mon, 10 Apr 2023 05:39:21 +0000
Labels: controller-uid=daf787ab-b955-4870-83d1-16b0fea99beb
job-name=c51a92ea-6f38-41e4-8b1b-a9e889446c6a
purpose=tao-toolkit-job
Annotations: cni.projectcalico.org/containerID: b3ce723fca00d03d30d06b77d944e0f7fb955a0a6f0ec61655794ffad0c07d8c
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
Status: Succeeded
IP: 192.168.33.74
IPs:
IP: 192.168.33.74
Controlled By: Job/c51a92ea-6f38-41e4-8b1b-a9e889446c6a
Containers:
container:
Container ID: containerd://52016c9269d1734b900c116ddaa0dcf73d6b0d5af5c6cae50227db30799ef7ed
Image: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
Image ID: nvcr.io/nvidia/tao/tao-toolkit@sha256:6282b5b09220942e321a452109ad40cde47e5e490480c405c92b930fff2b0574
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
umask 0 && classification_tf1 evaluate --experiment_spec_file /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/specs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.yaml --key nvidia_tlt --results_dir /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/logs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/logs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt; find /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a -type d | xargs chmod 777; find /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a -type f | xargs chmod 666
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 10 Apr 2023 05:39:21 +0000
Finished: Mon, 10 Apr 2023 05:39:34 +0000
Ready: False
Restart Count: 0
Limits:
nvidia.com/gpu: 1
Requests:
nvidia.com/gpu: 1
Environment:
NUM_GPUS: 1
TELEMETRY_OPT_OUT: no
WANDB_API_KEY:
CLEARML_WEB_HOST: https://app.clear.ml
CLEARML_API_HOST: https://api.clear.ml
CLEARML_FILES_HOST: https://files.clear.ml
CLEARML_API_ACCESS_KEY:
CLEARML_API_SECRET_KEY:
Mounts:
/dev/shm from dshm (rw)
/shared from shared-data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qc54b (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
shared-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: tao-toolkit-api-pvc
ReadOnly: false
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
kube-api-access-qc54b:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 73s default-scheduler Successfully assigned default/c51a92ea-6f38-41e4-8b1b-a9e889446c6a-jk87m to admin-ops01
Normal Pulled 73s kubelet Container image "nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5" already present on machine
Normal Created 73s kubelet Created container container
Normal Started 73s kubelet Started container container
Could you please upload the <model id> folder so that I can find more information?
The folder is too big to upload to the forum.
Could I share screenshots of the terminal instead?
Yes, please share the evaluation log, which should be at /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/logs/<eval job id>.txt
and the eval spec file, which should be at /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-0e6322b2-5611-435b-9dd6-2f96b8885e8d/users/<your id>/models/<model id>/specs/<eval job id>.yaml.
Besides, I'd like to see the logs of the workflow pod. You can use this command to dump the pod logs: kubectl logs tao-toolkit-api-workflow-pod-6d76b5dcf8-4vc4q
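The two file locations follow the same layout under the model folder, so they can be derived from the IDs. The following is a minimal sketch of that layout only; the function name and its parameters are hypothetical, and pvc_root stands for the mount point of the shared volume (the /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-... folder on the host).

```python
from pathlib import PurePosixPath


def eval_artifact_paths(pvc_root, user_id, model_id, eval_job_id):
    """Build the expected log and spec paths for one evaluation job."""
    model_dir = PurePosixPath(pvc_root) / "users" / user_id / "models" / model_id
    return {
        "log": model_dir / "logs" / f"{eval_job_id}.txt",
        "spec": model_dir / "specs" / f"{eval_job_id}.yaml",
    }


paths = eval_artifact_paths("/mnt/nfs_share/<pvc folder>", "<your id>", "<model id>", "<eval job id>")
print(paths["log"])
print(paths["spec"])
```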
The model I trained with AutoML is a binary classification model, and I followed the dataset structure to build my custom dataset. The dataset format I set was default.
This is the evaluation log, c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt (eval_job_id.txt):
2023-04-10 05:39:22.271320: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
2023-04-10 05:39:25.146383: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2294745000 Hz
2023-04-10 05:39:25.147052: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2a8db10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-04-10 05:39:25.147081: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2023-04-10 05:39:25.148957: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-04-10 05:39:25.254396: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.254822: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x552e810 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-10 05:39:25.254861: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla P100-SXM2-16GB, Compute Capability 6.0
2023-04-10 05:39:25.255137: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.255383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties:
name: Tesla P100-SXM2-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.4805
pciBusID: 0000:00:08.0
2023-04-10 05:39:25.255428: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-04-10 05:39:25.281075: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2023-04-10 05:39:25.284427: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2023-04-10 05:39:25.284821: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2023-04-10 05:39:25.285555: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2023-04-10 05:39:25.286573: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2023-04-10 05:39:25.286769: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2023-04-10 05:39:25.286917: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.287219: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.287418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0
2023-04-10 05:39:25.287447: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-04-10 05:39:25.663574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-04-10 05:39:25.663622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0
2023-04-10 05:39:25.663630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N
2023-04-10 05:39:25.663904: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.664222: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-04-10 05:39:25.664462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15236 MB memory) -> physical GPU (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:00:08.0, compute capability: 6.0)
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
_init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
INFO: Starting evaluation.
INFO: Loading experiment spec at /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/specs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.yaml.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:245: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:245: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:133: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/third_party/keras/tensorflow_backend.py:187: The name tf.nn.avg_pool is deprecated. Please use tf.nn.avg_pool2d instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.
WARNING: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.
INFO: Processing dataset (evaluation): /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/8f3316a0-7362-47a1-ab5d-66229ac09470/images_test
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 3, 224, 224) 0
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 64, 112, 112) 9408 input_1[0][0]
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 64, 112, 112) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation) (None, 64, 112, 112) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
block_1a_conv_1 (Conv2D) (None, 64, 56, 56) 36864 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_1 (BatchNormalizati (None, 64, 56, 56) 256 block_1a_conv_1[0][0]
__________________________________________________________________________________________________
block_1a_relu_1 (Activation) (None, 64, 56, 56) 0 block_1a_bn_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_2 (Conv2D) (None, 64, 56, 56) 36864 block_1a_relu_1[0][0]
__________________________________________________________________________________________________
block_1a_conv_shortcut (Conv2D) (None, 64, 56, 56) 4096 activation_1[0][0]
__________________________________________________________________________________________________
block_1a_bn_2 (BatchNormalizati (None, 64, 56, 56) 256 block_1a_conv_2[0][0]
__________________________________________________________________________________________________
block_1a_bn_shortcut (BatchNorm (None, 64, 56, 56) 256 block_1a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_1 (Add) (None, 64, 56, 56) 0 block_1a_bn_2[0][0]
block_1a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1a_relu (Activation) (None, 64, 56, 56) 0 add_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_1 (Conv2D) (None, 64, 56, 56) 36864 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_bn_1 (BatchNormalizati (None, 64, 56, 56) 256 block_1b_conv_1[0][0]
__________________________________________________________________________________________________
block_1b_relu_1 (Activation) (None, 64, 56, 56) 0 block_1b_bn_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_2 (Conv2D) (None, 64, 56, 56) 36864 block_1b_relu_1[0][0]
__________________________________________________________________________________________________
block_1b_conv_shortcut (Conv2D) (None, 64, 56, 56) 4096 block_1a_relu[0][0]
__________________________________________________________________________________________________
block_1b_bn_2 (BatchNormalizati (None, 64, 56, 56) 256 block_1b_conv_2[0][0]
__________________________________________________________________________________________________
block_1b_bn_shortcut (BatchNorm (None, 64, 56, 56) 256 block_1b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_2 (Add) (None, 64, 56, 56) 0 block_1b_bn_2[0][0]
block_1b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_1b_relu (Activation) (None, 64, 56, 56) 0 add_2[0][0]
__________________________________________________________________________________________________
block_2a_conv_1 (Conv2D) (None, 128, 28, 28) 73728 block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_1 (BatchNormalizati (None, 128, 28, 28) 512 block_2a_conv_1[0][0]
__________________________________________________________________________________________________
block_2a_relu_1 (Activation) (None, 128, 28, 28) 0 block_2a_bn_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_2 (Conv2D) (None, 128, 28, 28) 147456 block_2a_relu_1[0][0]
__________________________________________________________________________________________________
block_2a_conv_shortcut (Conv2D) (None, 128, 28, 28) 8192 block_1b_relu[0][0]
__________________________________________________________________________________________________
block_2a_bn_2 (BatchNormalizati (None, 128, 28, 28) 512 block_2a_conv_2[0][0]
__________________________________________________________________________________________________
block_2a_bn_shortcut (BatchNorm (None, 128, 28, 28) 512 block_2a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_3 (Add) (None, 128, 28, 28) 0 block_2a_bn_2[0][0]
block_2a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2a_relu (Activation) (None, 128, 28, 28) 0 add_3[0][0]
__________________________________________________________________________________________________
block_2b_conv_1 (Conv2D) (None, 128, 28, 28) 147456 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_bn_1 (BatchNormalizati (None, 128, 28, 28) 512 block_2b_conv_1[0][0]
__________________________________________________________________________________________________
block_2b_relu_1 (Activation) (None, 128, 28, 28) 0 block_2b_bn_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_2 (Conv2D) (None, 128, 28, 28) 147456 block_2b_relu_1[0][0]
__________________________________________________________________________________________________
block_2b_conv_shortcut (Conv2D) (None, 128, 28, 28) 16384 block_2a_relu[0][0]
__________________________________________________________________________________________________
block_2b_bn_2 (BatchNormalizati (None, 128, 28, 28) 512 block_2b_conv_2[0][0]
__________________________________________________________________________________________________
block_2b_bn_shortcut (BatchNorm (None, 128, 28, 28) 512 block_2b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_4 (Add) (None, 128, 28, 28) 0 block_2b_bn_2[0][0]
block_2b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_2b_relu (Activation) (None, 128, 28, 28) 0 add_4[0][0]
__________________________________________________________________________________________________
block_3a_conv_1 (Conv2D) (None, 256, 14, 14) 294912 block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_1 (BatchNormalizati (None, 256, 14, 14) 1024 block_3a_conv_1[0][0]
__________________________________________________________________________________________________
block_3a_relu_1 (Activation) (None, 256, 14, 14) 0 block_3a_bn_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_2 (Conv2D) (None, 256, 14, 14) 589824 block_3a_relu_1[0][0]
__________________________________________________________________________________________________
block_3a_conv_shortcut (Conv2D) (None, 256, 14, 14) 32768 block_2b_relu[0][0]
__________________________________________________________________________________________________
block_3a_bn_2 (BatchNormalizati (None, 256, 14, 14) 1024 block_3a_conv_2[0][0]
__________________________________________________________________________________________________
block_3a_bn_shortcut (BatchNorm (None, 256, 14, 14) 1024 block_3a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_5 (Add) (None, 256, 14, 14) 0 block_3a_bn_2[0][0]
block_3a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3a_relu (Activation) (None, 256, 14, 14) 0 add_5[0][0]
__________________________________________________________________________________________________
block_3b_conv_1 (Conv2D) (None, 256, 14, 14) 589824 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_bn_1 (BatchNormalizati (None, 256, 14, 14) 1024 block_3b_conv_1[0][0]
__________________________________________________________________________________________________
block_3b_relu_1 (Activation) (None, 256, 14, 14) 0 block_3b_bn_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_2 (Conv2D) (None, 256, 14, 14) 589824 block_3b_relu_1[0][0]
__________________________________________________________________________________________________
block_3b_conv_shortcut (Conv2D) (None, 256, 14, 14) 65536 block_3a_relu[0][0]
__________________________________________________________________________________________________
block_3b_bn_2 (BatchNormalizati (None, 256, 14, 14) 1024 block_3b_conv_2[0][0]
__________________________________________________________________________________________________
block_3b_bn_shortcut (BatchNorm (None, 256, 14, 14) 1024 block_3b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_6 (Add) (None, 256, 14, 14) 0 block_3b_bn_2[0][0]
block_3b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_3b_relu (Activation) (None, 256, 14, 14) 0 add_6[0][0]
__________________________________________________________________________________________________
block_4a_conv_1 (Conv2D) (None, 512, 14, 14) 1179648 block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_1 (BatchNormalizati (None, 512, 14, 14) 2048 block_4a_conv_1[0][0]
__________________________________________________________________________________________________
block_4a_relu_1 (Activation) (None, 512, 14, 14) 0 block_4a_bn_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_2 (Conv2D) (None, 512, 14, 14) 2359296 block_4a_relu_1[0][0]
__________________________________________________________________________________________________
block_4a_conv_shortcut (Conv2D) (None, 512, 14, 14) 131072 block_3b_relu[0][0]
__________________________________________________________________________________________________
block_4a_bn_2 (BatchNormalizati (None, 512, 14, 14) 2048 block_4a_conv_2[0][0]
__________________________________________________________________________________________________
block_4a_bn_shortcut (BatchNorm (None, 512, 14, 14) 2048 block_4a_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_7 (Add) (None, 512, 14, 14) 0 block_4a_bn_2[0][0]
block_4a_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4a_relu (Activation) (None, 512, 14, 14) 0 add_7[0][0]
__________________________________________________________________________________________________
block_4b_conv_1 (Conv2D) (None, 512, 14, 14) 2359296 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_bn_1 (BatchNormalizati (None, 512, 14, 14) 2048 block_4b_conv_1[0][0]
__________________________________________________________________________________________________
block_4b_relu_1 (Activation) (None, 512, 14, 14) 0 block_4b_bn_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_2 (Conv2D) (None, 512, 14, 14) 2359296 block_4b_relu_1[0][0]
__________________________________________________________________________________________________
block_4b_conv_shortcut (Conv2D) (None, 512, 14, 14) 262144 block_4a_relu[0][0]
__________________________________________________________________________________________________
block_4b_bn_2 (BatchNormalizati (None, 512, 14, 14) 2048 block_4b_conv_2[0][0]
__________________________________________________________________________________________________
block_4b_bn_shortcut (BatchNorm (None, 512, 14, 14) 2048 block_4b_conv_shortcut[0][0]
__________________________________________________________________________________________________
add_8 (Add) (None, 512, 14, 14) 0 block_4b_bn_2[0][0]
block_4b_bn_shortcut[0][0]
__________________________________________________________________________________________________
block_4b_relu (Activation) (None, 512, 14, 14) 0 add_8[0][0]
__________________________________________________________________________________________________
avg_pool (AveragePooling2D) (None, 512, 1, 1) 0 block_4b_relu[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 512) 0 avg_pool[0][0]
__________________________________________________________________________________________________
predictions (Dense) (None, 176) 90288 flatten[0][0]
==================================================================================================
Total params: 11,632,752
Trainable params: 11,621,104
Non-trainable params: 11,648
__________________________________________________________________________________________________
Found 266 images belonging to 2 classes.
Traceback (most recent call last):
File "</usr/local/lib/python3.6/dist-packages/iva/makenet/scripts/evaluate.py>", line 3, in <module>
File "<frozen iva.makenet.scripts.evaluate>", line 308, in <module>
File "<frozen iva.common.utils>", line 707, in return_func
File "<frozen iva.common.utils>", line 695, in return_func
File "<frozen iva.makenet.scripts.evaluate>", line 304, in main
File "<frozen iva.makenet.scripts.evaluate>", line 255, in run_evaluate
AssertionError: The number of classes of the loaded model doesn't match the number of classes in the evaluation dataset.
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL
EOF
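The assertion above compares the model's output layer (176 units in the summary) with the number of class folders found in the evaluation dataset (2). Below is a minimal sketch for checking that agreement before submitting an evaluate job. It assumes the dataset has one subdirectory per class and that `classmap.json` holds one entry per class; the function names are hypothetical helpers, not part of the TAO API.

```python
import json
import tempfile
from pathlib import Path


def count_dataset_classes(dataset_dir):
    """Count class folders, assuming one immediate subdirectory per class."""
    return sum(1 for entry in Path(dataset_dir).iterdir() if entry.is_dir())


def count_classmap_classes(classmap_path):
    """Count entries in classmap.json (assumed: one entry per class)."""
    with open(classmap_path) as handle:
        return len(json.load(handle))


def classes_match(classmap_path, dataset_dir):
    """True when the trained model and the dataset agree on the class count."""
    return count_classmap_classes(classmap_path) == count_dataset_classes(dataset_dir)


def demo():
    """Build a throwaway 2-class dataset and a 3-class classmap, then compare."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("cat", "dog"):
            (root / "images_test" / name).mkdir(parents=True)
        classmap = root / "classmap.json"
        classmap.write_text(json.dumps({"cat": 0, "dog": 1, "bird": 2}))
        return classes_match(classmap, root / "images_test")


if __name__ == "__main__":
    print("class counts match:", demo())
```

In practice, `classmap_path` would be the `classmap.json` written next to the trained weights and `dataset_dir` the folder given as `eval_dataset_path`.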
This is the eval spec file
train_config {
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 256
  n_epochs: 80
  n_workers: 2
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 5e-05
  }
  lr_config {
    soft_anneal {
      learning_rate: 0.05
      soft_start: 0.056
      annealing_divider: 10.0
      annealing_points: 0.3
      annealing_points: 0.6
      annealing_points: 0.8
    }
  }
  random_seed: 42
  preprocess_mode: "caffe"
  train_dataset_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/6732c051-53df-4416-b325-df341ae4e38c/images_train"
  val_dataset_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/75fd1ba2-0b07-4494-91ed-8df7a6de9761/images_val"
}
eval_config {
  top_k: 3
  batch_size: 256
  n_workers: 2
  enable_center_crop: True
  model_path: "/shared/users/00000000-0000-0000-0000-000000000000/models/daf291d5-00a9-41f3-9ec0-d6b16ee445ed/pretrained_classification_vresnet18/resnet_18.hdf5"
  eval_dataset_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/8f3316a0-7362-47a1-ab5d-66229ac09470/images_test"
}
model_config {
  arch: "resnet"
  input_image_size: "3,224,224"
  n_layers: 18
  retain_head: False
  use_batch_norm: True
  all_projections: True
  dropout: 0.001
}
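In the spec above, `model_path` points at the pretrained `resnet_18.hdf5` rather than at a checkpoint under the AutoML job folder, which may be related to the class-count mismatch. Here is a small sketch for pulling the `*_path` fields out of a spec and checking whether `model_path` sits under a given job folder. The helper names are hypothetical, and the job ID used in the demo is only an example taken from the pod logs in this thread.

```python
import re

# Shortened copy of the eval_config section from the spec in this thread.
SAMPLE_SPEC = '''
eval_config {
  top_k: 3
  model_path: "/shared/users/00000000-0000-0000-0000-000000000000/models/daf291d5-00a9-41f3-9ec0-d6b16ee445ed/pretrained_classification_vresnet18/resnet_18.hdf5"
  eval_dataset_path: "/shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/datasets/8f3316a0-7362-47a1-ab5d-66229ac09470/images_test"
}
'''


def spec_paths(spec_text):
    """Return {field_name: value} for every quoted *_path field in the spec text."""
    pattern = r'^\s*(\w+_path)\s*:\s*"([^"]*)"'
    return dict(re.findall(pattern, spec_text, flags=re.MULTILINE))


def uses_job_checkpoint(spec_text, job_id):
    """True when model_path contains the given job ID as a path component."""
    model_path = spec_paths(spec_text).get("model_path", "")
    return job_id in model_path.split("/")


if __name__ == "__main__":
    job_id = "0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e"  # example AutoML job folder
    for name, value in spec_paths(SAMPLE_SPEC).items():
        print(f"{name}: {value}")
    print("model_path under job folder:", uses_job_checkpoint(SAMPLE_SPEC, job_id))
```

The check only inspects the text of the spec. It does not confirm that the referenced file exists or that it is the checkpoint the API actually loads.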
These are the logs of the workflow pod
NGC CLI 3.10.0
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_0.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_0 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_0/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_0/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_1.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_1 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_1/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_1/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_2.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_2 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_2/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_2/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_3.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_3 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_3/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_3/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_4.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_4 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_4/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_4/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_5.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_5 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_5/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_5/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_6.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_6 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_6/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_6/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/recommendation_7.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_7 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_7/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/a3853f13-0f97-47fd-b3ac-03e6278bce9a/ffb2c7b5-d505-4a3c-8dc5-0f0358cdec31/experiment_7/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_0.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_0 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_0/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_0/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_1.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_1 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_1/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_1/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_2.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_2 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_2/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_2/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_3.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_3 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_3/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_3/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_4.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_4 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_4/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_4/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_5.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_5 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_5/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_5/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_6.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_6 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_6/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_6/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_7.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_7 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_7/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_7/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_8.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_8 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_8/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_8/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_9.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_9 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_9/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_9/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_10.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_10 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_10/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_10/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_11.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_11 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_11/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_11/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_12.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_12 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_12/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_12/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_13.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_13 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_13/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_13/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_14.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_14 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_14/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_14/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_15.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_15 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_15/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_15/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_16.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_16 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_16/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_16/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_17.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_17 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_17/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_17/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_18.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_18 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_18/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_18/log.txt
AutoML pipeline done
AutoML pipeline
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
classification_tf1 train --gpus $NUM_GPUS -e /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/recommendation_19.kitti -r /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_19 -k nvidia_tlt > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_19/log.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e/experiment_19/log.txt
AutoML pipeline done
Loaded specs
Warning: Classification supports only one train dataset
Warning: Train, eval datasets are both required to run Classification actions - train, evaluate, retrain, inference
Loaded dataset
classification_tf1 evaluate --experiment_spec_file /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/specs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.yaml --key nvidia_tlt --results_dir /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a > /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/logs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt 2>&1 >> /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/logs/c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt; find /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a -type d | xargs chmod 777; find /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a -type f | xargs chmod 666 /shared/users/80ab3db1-baf9-5608-8a94-f5b86a8cbd59/models/b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4/c51a92ea-6f38-41e4-8b1b-a9e889446c6a/status.json
nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
Job created
Post running
Job Done: c51a92ea-6f38-41e4-8b1b-a9e889446c6a Final status: Done
Could you please share the folder structure of your <model id>, using the command tree <model id>?
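If the `tree` command is not installed on the machine that mounts the share, a rough equivalent can be sketched with Python's standard library. The functions below are hypothetical helpers that only approximate the layout of `tree` output.

```python
import sys
from pathlib import Path


def tree_lines(root, prefix=""):
    """Yield one line per entry under root, drawn in the style of `tree`."""
    entries = sorted(Path(root).iterdir(), key=lambda p: p.name)
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        yield prefix + ("└── " if last else "├── ") + entry.name
        if entry.is_dir():
            # Children of the last entry are indented with spaces, others with a bar.
            yield from tree_lines(entry, prefix + ("    " if last else "│   "))


def render_tree(root):
    """Return the directory name followed by its rendered contents."""
    return "\n".join([Path(root).name, *tree_lines(root)])


if __name__ == "__main__":
    # Usage: python tree_sketch.py <path to the model id folder>
    print(render_tree(sys.argv[1] if len(sys.argv) > 1 else "."))
```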
b2ab9987-7c68-4d0b-aede-36cf3a6b7bd4
├── 0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e
│   ├── automl_metadata.json
│   ├── best_model
│   │   ├── classmap.json
│   │   ├── controller.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101031.95d68c1e-f04d-4743-8d26-29f9b77711a7-2gs2t
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── recommendation_5.kitti
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   │       └── resnet_080.tlt
│   ├── brain.json
│   ├── controller.json
│   ├── controller.log
│   ├── current_rec.json
│   ├── experiment_0
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681099901.52349577-5f28-437b-910d-e335b34e5acd-jsqss
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_1
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681100127.4e6009f2-a270-4852-9d98-f71d1c5bdf50-w4jf2
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_10
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681102163.426cb298-2b40-4961-9e19-4e484ced1d22-pr8vg
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_11
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681102391.68abf403-66a0-4bf0-913d-386f1e17fc25-6njzk
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_12
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681102618.5582d2c0-a46d-4f3b-8624-62b2719ed4a3-nsr4w
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_13
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681102845.18bc3e39-284c-476d-ae65-5035a4d2f9d2-q4skh
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_14
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103072.6d4b14a0-d0b4-40ba-acd6-54b754eb17d9-46fb6
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_15
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103299.ee82b78b-e364-4ae1-92a0-27823682f2b8-js8zt
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_16
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103526.8520792d-dc9c-40be-8382-dfa18bd5868a-xttfn
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_17
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103752.e9f4ad9a-b2b1-4ed6-84c4-eb2646f93da1-mnxns
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_18
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681103978.90cd6118-39b5-4414-a485-a65986427ddd-2pk6h
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_19
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681104203.bac5f9ea-ac83-4ab8-bb00-7d9178a3f00a-n78ws
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_2
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681100353.61c80a64-dba4-4ea6-a8a6-c40ccd97a8f1-6nnp9
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_3
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681100579.fabb7b4a-8db9-4e80-88b1-7621339bb990-ctgqm
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_4
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681100805.9998aa9a-4d4c-4c4e-a600-d21b174f3f2e-jkdbk
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_5
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101031.95d68c1e-f04d-4743-8d26-29f9b77711a7-2gs2t
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_6
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101258.7073ba23-62eb-4210-9392-60e47476a8da-65s4k
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_7
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101485.d9fb78b2-69d7-4f8c-9ec7-6d72b2d09bf7-58bv5
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_8
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101711.923f6ca2-8c5d-410f-bcf2-2bc70e466883-d8fz2
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── experiment_9
│   │   ├── classmap.json
│   │   ├── events
│   │   │   └── events.out.tfevents.1681101937.38258963-8fe8-4247-b3f8-f62660c657f6-ggn4r
│   │   ├── log.txt
│   │   ├── primary_checkpoint_name.json
│   │   ├── status.json
│   │   ├── training.csv
│   │   └── weights
│   ├── recommendation_0.kitti
│   ├── recommendation_1.kitti
│   ├── recommendation_10.kitti
│   ├── recommendation_11.kitti
│   ├── recommendation_12.kitti
│   ├── recommendation_13.kitti
│   ├── recommendation_14.kitti
│   ├── recommendation_15.kitti
│   ├── recommendation_16.kitti
│   ├── recommendation_17.kitti
│   ├── recommendation_18.kitti
│   ├── recommendation_19.kitti
│   ├── recommendation_2.kitti
│   ├── recommendation_3.kitti
│   ├── recommendation_4.kitti
│   ├── recommendation_5.kitti
│   ├── recommendation_6.kitti
│   ├── recommendation_7.kitti
β βββ recommendation_8.kitti
β βββ recommendation_9.kitti
β βββ weights -> best_model/weights/
βββ 0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e.tar.gz
βββ c51a92ea-6f38-41e4-8b1b-a9e889446c6a
β βββ status.json
βββ jobs.yaml
βββ jobs_metadata
β βββ 0d9e13a8-14fd-4e8c-b9ae-b65f75eac77e.json
β βββ a3f610d5-8c9a-4187-8590-3b22783e568a.json
β βββ c51a92ea-6f38-41e4-8b1b-a9e889446c6a.json
βββ logs
β βββ c51a92ea-6f38-41e4-8b1b-a9e889446c6a.txt
βββ metadata.json
βββ specs
βββ c51a92ea-6f38-41e4-8b1b-a9e889446c6a.yaml
βββ evaluate.json
βββ train.json
69 directories, 165 files
Excuse me, @Bin_Zhao_NV. I found that the predictions layer of the classification model trained by AutoML does not have 2 neurons.
However, the dataset I uploaded has 2 classes, and I followed the steps in the multi-class classification notebook to train the model.
Why did the predictions layer change from 2 neurons to 176?
Hi @swka1043338,
The number of neurons in the predictions layer differs from your class count because the eval process didn't find the right weights. Please follow these steps:
cd to your <job_id>
folder, such as /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-<pvc_id>/users/<usr_id>/models/<model_id>/<job_id>/
Create a symbolic link there with ln -s best_model/weights/
After that, the <job_id>
folder should look like this:
total 56
drwxrwxrwx 6 www-data www-data 4096 Apr 11 06:50 ./
drwxrwxrwx 10 www-data www-data 4096 Apr 11 06:51 ../
-rw-rw-rw- 1 nobody nogroup 238 Apr 11 06:07 automl_metadata.json
drwxrwxrwx 4 nobody nogroup 4096 Apr 11 06:07 best_model/
-rw-rw-rw- 1 nobody nogroup 639 Apr 11 06:06 brain.json
-rw-rw-rw- 1 nobody nogroup 1504 Apr 11 06:07 controller.json
-rw-rw-rw- 1 nobody nogroup 296 Apr 11 06:07 controller.log
-rw-rw-rw- 1 nobody nogroup 1 Apr 11 06:06 current_rec.json
drwxrwxrwx 4 nobody nogroup 4096 Apr 11 06:05 experiment_0/
drwxrwxrwx 4 nobody nogroup 4096 Apr 11 06:06 experiment_1/
drwxrwxrwx 4 nobody nogroup 4096 Apr 11 06:07 experiment_2/
-rw-rw-rw- 1 nobody nogroup 1465 Apr 11 06:04 recommendation_0.kitti
-rw-rw-rw- 1 nobody nogroup 1464 Apr 11 06:05 recommendation_1.kitti
-rw-rw-rw- 1 nobody nogroup 1465 Apr 11 06:06 recommendation_2.kitti
lrwxrwxrwx 1 local-bizhao local-bizhao 19 Apr 11 06:50 weights -> best_model/weights//
Then check the eval spec /mnt/nfs_share/default-tao-toolkit-api-pvc-pvc-<pvc_id>/users/<usr_id>/models/<model_id>/specs/<eval_job_id>.yaml
and make sure the config model_path
points to your training weights rather than the default PTM.
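For anyone who prefers to do this from the notebook, here is a minimal Python sketch of the symlink and spec-check steps above. It is only an illustration: `link_best_weights` and `model_path_lines` are hypothetical helper names, not part of the TAO Toolkit API, and the spec is scanned as plain text because its exact layout is not shown in this thread. It assumes the notebook process can read and write the NFS share.

```python
import os
from pathlib import Path


def link_best_weights(job_dir):
    """Create <job_dir>/weights -> best_model/weights if it is missing."""
    job_dir = Path(job_dir)
    target = job_dir / "best_model" / "weights"
    if not target.is_dir():
        raise FileNotFoundError(f"no best_model/weights under {job_dir}")
    link = job_dir / "weights"
    if link.is_symlink() or link.exists():
        return link  # leave an existing link or folder untouched
    # Relative target, like running `ln -s best_model/weights/` inside job_dir.
    os.symlink(os.path.join("best_model", "weights"), link,
               target_is_directory=True)
    return link


def model_path_lines(spec_file):
    """Return the lines of the eval spec that mention model_path."""
    text = Path(spec_file).read_text()
    return [line.strip() for line in text.splitlines() if "model_path" in line]
```

Call `link_best_weights` with your own `<job_id>` folder, then print `model_path_lines` for your `<eval_job_id>.yaml` after creating the evaluate job and confirm that the path shown is your training weights rather than the default PTM.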