Hey all,
I'm trying to train yolo_v4_tiny on a custom dataset using the provided sample Jupyter notebook.
In the '# If you use your own dataset, you will need to run the code below to generate the best anchor shape' section, I'm running the following command:
!tao yolo_v4_tiny kmeans -l tao-experiments/ir_training/labels \
-i tao-experiments/ir_training/images \
-n 9 \
-x 640 \
-y 512
where the image size is 640 x 512, and I receive the following error:
2022-01-05 13:10:43,632 [INFO] root: Registry: ['nvcr.io']
2022-01-05 13:10:43,710 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-05 13:10:43,725 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/kmeans.py", line 14, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 201, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 169, in kmeans
AssertionError: Must have more boxes than clusters
2022-01-05 13:10:49,163 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
What could be causing this problem?
I'm not sure what you meant by "bboxes may be less than 9 in the labels", since the labels directory contains a .txt file with the bounding boxes for each image.
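One way to check the hint is to count the boxes that are actually present, optionally ignoring tiny ones. If the tool filters small boxes (the --min_x/--min_y flags suggest it can), boxes stored as 0-to-1 fractions are all under one pixel wide and could be discarded. A minimal sketch, assuming KITTI label files where fields 4 to 7 hold x1 y1 x2 y2; the function name and the threshold defaults are hypothetical, not TAO's own:

```python
from pathlib import Path


def count_kitti_boxes(label_dir, min_w=0.0, min_h=0.0):
    """Return (total boxes, boxes wider than min_w and taller than min_h).

    KITTI label lines store the bbox as x1 y1 x2 y2 in fields 4-7.
    min_w/min_h are illustrative thresholds, loosely analogous to the
    kmeans --min_x/--min_y flags; they are not TAO's defaults.
    """
    total = 0
    kept = 0
    for path in sorted(Path(label_dir).glob("*.txt")):
        for line in path.read_text().splitlines():
            fields = line.split()
            if len(fields) < 8:
                continue  # skip blank or malformed lines
            x1, y1, x2, y2 = map(float, fields[4:8])
            total += 1
            if (x2 - x1) > min_w and (y2 - y1) > min_h:
                kept += 1
    return total, kept
```

For example, `count_kitti_boxes("ir_training/labels", min_w=1.0, min_h=1.0)` reports how many boxes are at least one pixel in each dimension; if that second number is not above the `-n` value, the clustering has too little to work with.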
OK, thanks!
On another topic: I downloaded a pretrained model for YOLOv4-tiny from the NGC repository "nvidia/tao/pretrained_object_detection:cspdarknet_tiny" using the command
!ngc registry model download-version nvidia/tao/pretrained_object_detection:cspdarknet_tiny \
--dest $LOCAL_EXPERIMENT_DIR/pretrained_cspdarknet_tiny
since the nvstaging/tao/pretrained_object_detection repository was returning a 403 error. When I then tried to train the model using
!tao yolo_v4_tiny train -e $SPECS_DIR/yolo_v4_tiny_train_chimera_seq.txt \
-r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
-k $KEY \
--gpus 1
I received the following error:
iles/ai_infra/iva/yolo_v4/models/yolov4_model.py", line 595, in build_savers
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/dev/tao_experiment/yolo_v4_tiny/experiment_dir_unpruned/weights'
Where should I download the weights from in order to train the model?
I looked at the kmeans usage to see whether there is an option for normalized bboxes and didn't find one:
tao yolo_v4 kmeans [-h] -l <label_folders>
-i <image_folders>
-x <network base input width>
-y <network base input height>
[-n <num_clusters>]
[--max_steps <kmeans max steps>]
[--min_x <ignore boxes with width less than this value>]
[--min_y <ignore boxes with height less than this value>]
Can I use the normalized bbox coordinates, or should I convert them back to actual pixel values?
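For intuition on why the coordinate units matter, here is a minimal sketch of the IoU-distance k-means commonly used to pick YOLO anchors (the approach popularized by YOLOv2). It is an illustration, not TAO's kmeans.py, and the function names are hypothetical. It clusters only box widths and heights, so boxes given as 0-to-1 fractions would produce anchors in those same fractional units rather than pixels. The guard at the top mirrors the assertion message in the traceback; TAO's exact condition may differ.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes described only by (width, height), aligned at one corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, max_steps=1000, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; return anchors sorted by area."""
    if len(boxes) <= k:
        raise ValueError("Must have more boxes than clusters")
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initialize from k distinct input boxes
    for _ in range(max_steps):
        # Assign each box to the anchor with the highest IoU (smallest 1 - IoU).
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, anchors[i]))
            groups[best].append(b)
        # Move each anchor to the mean (w, h) of its group; empty groups stay put.
        new = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else a
            for g, a in zip(groups, anchors)
        ]
        if new == anchors:
            break  # assignments are stable, so the anchors have converged
        anchors = new
    return sorted(anchors, key=lambda a: a[0] * a[1])
```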
Hey Morganh,
thank you for the advice, but unfortunately it didn't solve my issue.
I changed the label files to KITTI format (my bboxes are normalised to 0…1 coordinate values), but I still get the same error:
2022-01-06 14:00:38,926 [INFO] root: Registry: ['nvcr.io']
2022-01-06 14:00:39,008 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 14:00:39,023 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/kmeans.py", line 14, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 201, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 169, in kmeans
AssertionError: Must have more boxes than clusters
2022-01-06 14:00:44,341 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
2022-01-06 14:14:59,154 [INFO] root: Registry: ['nvcr.io']
2022-01-06 14:14:59,236 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 14:14:59,252 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-01-06 14:14:59,790 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
90
OK, after checking: the bbox values (x1, y1, x2, y2) should be actual pixel values. Could you select several label files, modify their x1, y1, x2, y2, and try again?
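A minimal sketch of that conversion for one KITTI label line, assuming the normalized values are fractions of the original image width and height (640 x 512 in this thread). The function name is hypothetical; apply it to every line of every label file, ideally writing to a copy of the labels directory first.

```python
def denormalize_kitti_line(line, img_w, img_h):
    """Rescale the bbox fields (4-7: x1 y1 x2 y2) of a KITTI line from 0..1 to pixels."""
    fields = line.split()
    x1, y1, x2, y2 = map(float, fields[4:8])
    # x coordinates scale with the image width, y coordinates with the height.
    fields[4:8] = [
        f"{x1 * img_w:.2f}",
        f"{y1 * img_h:.2f}",
        f"{x2 * img_w:.2f}",
        f"{y2 * img_h:.2f}",
    ]
    return " ".join(fields)
```

All other KITTI fields (class, truncation, occlusion, 3D values) are passed through unchanged.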
2022-01-06 14:44:13,329 [INFO] root: Registry: ['nvcr.io']
2022-01-06 14:44:13,418 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 14:44:13,433 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/kmeans.py", line 14, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 201, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 169, in kmeans
AssertionError: Must have more boxes than clusters
2022-01-06 14:44:18,593 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
My bad, I had confused it with the original YOLO annotation format [xmin, ymin, width, height]. I fixed it, and the kmeans optimisation works just fine:
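Since mixing up box layouts is easy here, a small sketch of the two conversions that usually come up. The first handles the [xmin, ymin, width, height] layout described above; the second handles the Darknet YOLO .txt layout (class x_center y_center width height, all normalized to the image size), in case that is the real source format. Both return KITTI-style corners (x1, y1, x2, y2); the function names are hypothetical.

```python
def xywh_to_xyxy(xmin, ymin, width, height):
    """Top-left corner plus size -> (x1, y1, x2, y2), in the same units as the input."""
    return xmin, ymin, xmin + width, ymin + height


def darknet_to_xyxy(xc, yc, w, h, img_w, img_h):
    """Darknet YOLO (x_center, y_center, width, height), normalized -> pixel corners."""
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    return x1, y1, x1 + w * img_w, y1 + h * img_h
```

If the [xmin, ymin, width, height] values are also normalized, scale x values by the image width and y values by the image height after converting.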
2022-01-06 15:27:58,728 [INFO] root: Registry: ['nvcr.io']
2022-01-06 15:27:58,812 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 15:27:58,827 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Start optimization iteration: 1
Please use following anchor sizes in YOLO config:
(8.00, 30.67)
(10.40, 45.33)
(13.60, 48.00)
(20.00, 33.33)
(26.40, 36.00)
(19.20, 58.67)
(24.00, 90.67)
(40.80, 54.67)
(44.80, 117.33)
2022-01-06 15:28:04,057 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
Should I replace the anchor values in both the train and retrain spec files?
Thanks so much for your help.
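The kmeans output above is listed from smallest to largest box area (about 245 px² for (8.00, 30.67) up to about 5256 px² for (44.80, 117.33)), so it can be split into equal groups of three. A minimal sketch that formats those groups as spec lines, assuming the spec's yolov4_config uses small_anchor_shape / mid_anchor_shape / big_anchor_shape fields as in the TAO YOLOv4 sample specs. The tiny variant's spec may use only some of these, so pass the names that actually appear in your spec file; the function name is hypothetical.

```python
def anchor_spec_lines(anchors, names=("small_anchor_shape", "mid_anchor_shape", "big_anchor_shape")):
    """Split area-sorted (w, h) anchors into equal groups, one spec line per field name."""
    if len(anchors) % len(names):
        raise ValueError("anchor count must be divisible by the number of groups")
    per_group = len(anchors) // len(names)
    lines = []
    for i, name in enumerate(names):
        group = anchors[i * per_group:(i + 1) * per_group]
        shapes = ", ".join(f"({w:.2f}, {h:.2f})" for w, h in group)
        lines.append(f'{name}: "[{shapes}]"')
    return lines


kmeans_output = [
    (8.00, 30.67), (10.40, 45.33), (13.60, 48.00),
    (20.00, 33.33), (26.40, 36.00), (19.20, 58.67),
    (24.00, 90.67), (40.80, 54.67), (44.80, 117.33),
]
for spec_line in anchor_spec_lines(kmeans_output):
    print(spec_line)
```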
Something else I ran into:
!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned
!tao yolo_v4_tiny run ls $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned
2022-01-06 16:07:36,888 [INFO] root: Registry: ['nvcr.io']
2022-01-06 16:07:36,968 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 16:07:36,983 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
ls: cannot access '/home/ubuntu/dev/tao_experiment/yolo_v4_tiny/experiment_dir_unpruned': No such file or directory
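The launcher runs each command inside the container, so a host path only exists there if it falls under one of the "Mounts" entries in ~/.tao_mounts.json (the file named in the warning), and inside the container it appears under the mapped "destination" rather than the host path. A minimal sketch, assuming the Mounts / DockerOptions layout of the TAO launcher config, that writes such a file and translates a host path to its container path. The helper names and the /workspace/tao-experiments destination are hypothetical example values.

```python
import json
from pathlib import PurePosixPath


def write_tao_mounts(path, mounts, uid, gid):
    """Write a launcher config mapping host dirs (source) to container dirs (destination)."""
    config = {
        "Mounts": [{"source": src, "destination": dst} for src, dst in mounts],
        "DockerOptions": {"user": f"{uid}:{gid}"},  # run as the host user, not root
    }
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return config


def to_container_path(host_path, mounts):
    """Translate a host path to its in-container path using the most specific mount."""
    host = PurePosixPath(host_path)
    best = None
    for src, dst in mounts:
        src_p = PurePosixPath(src)
        if host == src_p or src_p in host.parents:
            if best is None or len(src_p.parts) > len(PurePosixPath(best[0]).parts):
                best = (src, dst)
    if best is None:
        raise ValueError(f"{host_path} is not under any mounted directory")
    return str(PurePosixPath(best[1]) / host.relative_to(best[0]))
```

With `mounts = [("/home/ubuntu/dev/tao_experiment", "/workspace/tao-experiments")]`, the host directory from the `ls` error maps to `/workspace/tao-experiments/yolo_v4_tiny/experiment_dir_unpruned`, which is the form a command running in the container would need. The UID and GID come from `id -u` and `id -g`, as the warning says.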
Thanks @Morganh
Finally I was able to run the training phase, but I have a few questions:
I didn't find any configuration for the image width and height (which is 640x480) in the train and retrain spec files; they only appear in the kmeans anchor calculation. I assume that is why the first model output is "No training configuration found". Are these values necessary for training, and how should I initialize them?
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Input (InputLayer) (None, 3, None, None 0
__________________________________________________________________________________________________
Input_qdq (QDQ) (None, 3, None, None 1 Input[0][0]
__________________________________________________________________________________________________
conv_0 (QuantizedConv2D) (None, 32, None, Non 864 Input_qdq[0][0]