TAO pre-trained yolo_v4_tiny - AssertionError: Must have more boxes than clusters

Hey all,
I'm trying to train yolo_v4_tiny on a custom dataset using the provided Jupyter notebook.
At the '# If you use your own dataset, you will need to run the code below to generate the best anchor shape' section, I'm running the following command:
!tao yolo_v4_tiny kmeans -l tao-experiments/ir_training/labels \
                         -i tao-experiments/ir_training/images \
                         -n 9 \
                         -x 640 \
                         -y 512
The image size is 640 x 512, and I receive the following error:

2022-01-05 13:10:43,632 [INFO] root: Registry: ['nvcr.io']
2022-01-05 13:10:43,710 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-05 13:10:43,725 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/kmeans.py", line 14, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 201, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 169, in kmeans
AssertionError: Must have more boxes than clusters
2022-01-05 13:10:49,163 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

What could be causing this error?

There may be fewer than 9 valid bboxes in tao-experiments/ir_training/labels. Please check.

I'm not sure what you meant by "bboxes may be less than 9 in the labels", since the labels directory contains txt files with the bounding boxes for each image:

!cat $LOCAL_DATA_DIR/ir_training/labels/11_10_22_887.txt

> car 0.7390625 0.07604166666666666 0.04375 0.05625
> car 0.76484375 0.20520833333333333 0.0453125 0.08125
> car 0.740625 0.13541666666666666 0.03125 0.05
> car 0.85546875 0.3 0.0484375 0.03333333333333333
> car 0.85234375 0.45729166666666665 0.0546875 0.04375
> car 0.70546875 0.6239583333333333 0.0421875 0.04375
> car 0.8359375 0.734375 0.059375 0.04791666666666667

Could you please explain what I'm actually missing here?

The label format is not the expected one. See "Data Annotation Format" in the TAO Toolkit 3.22.05 documentation.

Examples:

car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cyclist 0.00 0 0.00 665.45 160.00 717.93 217.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00
pedestrian 0.00 0 0.00 423.17 173.67 433.17 224.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
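
As a quick sanity check, the sketch below tests whether a label line has the 15 space-separated fields of the layout shown in the examples above, with the bbox (x1, y1, x2, y2) in fields 5 to 8. The helper name `parse_kitti_bbox` is made up for illustration and is not part of TAO.

```python
from typing import Optional, Tuple

KITTI_FIELD_COUNT = 15  # class name + 14 numeric fields, as in the examples above


def parse_kitti_bbox(line: str) -> Optional[Tuple[str, float, float, float, float]]:
    """Return (class, x1, y1, x2, y2), or None if the line is not KITTI-shaped."""
    fields = line.split()
    if len(fields) != KITTI_FIELD_COUNT:
        return None
    x1, y1, x2, y2 = (float(v) for v in fields[4:8])
    return fields[0], x1, y1, x2, y2


kitti_line = "car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
yolo_line = "car 0.7390625 0.07604166666666666 0.04375 0.05625"

print(parse_kitti_bbox(kitti_line))  # ('car', 587.01, 173.33, 614.12, 200.12)
print(parse_kitti_bbox(yolo_line))   # None: only 5 fields, so not in the expected layout
```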

OK, thanks!
On another topic, I downloaded a pre-trained model for YOLOv4-tiny from the NGC repository "nvidia/tao/pretrained_object_detection:cspdarknet_tiny"
using the command

!ngc registry model download-version nvidia/tao/pretrained_object_detection:cspdarknet_tiny \
    --dest $LOCAL_EXPERIMENT_DIR/pretrained_cspdarknet_tiny

I used this repository because the nvstaging/tao/pretrained_object_detection repository is not responding (403).
When I tried to train the model using

!tao yolo_v4_tiny train -e $SPECS_DIR/yolo_v4_tiny_train_chimera_seq.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -k $KEY \
                        --gpus 1

I received the following error:

iles/ai_infra/iva/yolo_v4/models/yolov4_model.py", line 595, in build_savers
FileNotFoundError: [Errno 2] No such file or directory: ‘/home/ubuntu/dev/tao_experiment/yolo_v4_tiny/experiment_dir_unpruned/weights’

Where should I download the weights from in order to train the model?

You just need to run mkdir $USER_EXPERIMENT_DIR/experiment_dir_unpruned


I looked at the k-means API to see whether there is an option for normalized bboxes and didn't find one.

tao yolo_v4 kmeans [-h] -l <label_folders>
                        -i <image_folders>
                        -x <network base input width>
                        -y <network base input height>
                        [-n <num_clusters>]
                        [--max_steps <kmeans max steps>]
                        [--min_x <ignore boxes with width less than this value>]
                        [--min_y <ignore boxes with height less than this value>]

Can I use the normalized bbox coordinates, or should I convert them back to actual values?

The kmeans command computes the anchors from the values as they appear in the label files.

Hey Morganh,
thank you for the advice, but unfortunately it didn't solve my issue.
I changed the label files to KITTI format (my bboxes are normalised to coordinate values between 0 and 1), and they look like this:

car 0.00 0 0.00 0.58125 0.3302083333333333 0.125 0.16041666666666668 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 0.690625 0.5041666666666667 0.13125 0.15833333333333333 0.00 0.00 0.00 0.00 0.00 0.00 0.00

When I run kmeans to calculate the best anchor shapes

!tao yolo_v4_tiny kmeans -l tao-experiments/data/chimera_ir_training/labels \
                         -i tao-experiments/data/chimera_ir_training/images \
                         -n 6 \
                         -x 640 \
                         -y 512

I get the same exception again:

2022-01-06 14:00:38,926 [INFO] root: Registry: ['nvcr.io']
2022-01-06 14:00:39,008 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 14:00:39,023 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/kmeans.py", line 14, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 201, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 169, in kmeans
AssertionError: Must have more boxes than clusters
2022-01-06 14:00:44,341 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

What could be the problem?
Best regards!

Can you run the command below and share the log?
! tao yolo_v4_tiny run ls tao-experiments/data/chimera_ir_training/labels |wc -l

2022-01-06 14:14:59,154 [INFO] root: Registry: ['nvcr.io']
2022-01-06 14:14:59,236 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 14:14:59,252 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-01-06 14:14:59,790 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
90

OK, after checking: the bbox (x1, y1, x2, y2) values should be the actual values, not normalized ones. Could you pick several label files, modify their x1, y1, x2, y2 values, and try again?

Thanks for the reply. I converted the coordinates to their actual values:

!cat $LOCAL_DATA_DIR/chimera_ir_training/labels/11_10_22_887.txt
car 0.00 0 0.00 378.4 48.666666666666664 22.4 36.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 391.6 131.33333333333331 23.2 52.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 379.2 86.66666666666666 16.0 32.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 438.0 192.0 24.8 21.333333333333332 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 436.4 292.66666666666663 28.0 28.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 361.2 399.3333333333333 21.6 28.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 428.0 470.0 30.4 30.666666666666668 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 390.0 520.0 16.8 48.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 414.4 534.6666666666667 22.4 26.666666666666664 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 417.2 592.6666666666666 29.6 30.666666666666668 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 350.4 600.0 41.6 26.666666666666664 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 348.8 571.3333333333334 41.6 20.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00

but I am still receiving the same error:

2022-01-06 14:44:13,329 [INFO] root: Registry: ['nvcr.io']
2022-01-06 14:44:13,418 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 14:44:13,433 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/kmeans.py", line 14, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 201, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 169, in kmeans
AssertionError: Must have more boxes than clusters
2022-01-06 14:44:18,593 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

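Why are x1 > x2 and y1 > y2 in your labels? The bbox fields are (x1, y1, x2, y2), so x1 < x2 and y1 < y2 are expected.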
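
A hedged sketch of the check behind this question: it counts how many label lines describe a box with positive width and height when fields 5 to 8 are read as (x1, y1, x2, y2). How TAO's kmeans filters boxes internally is not shown in this thread, so treating boxes with x2 <= x1 or y2 <= y1 as unusable is an assumption, and `split_boxes` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def split_boxes(lines: Iterable[str]) -> Tuple[List[Box], List[Box]]:
    """Split KITTI-style label lines into well-ordered and degenerate boxes."""
    good: List[Box] = []
    bad: List[Box] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # too short to carry a bbox in fields 5 to 8
        x1, y1, x2, y2 = (float(v) for v in fields[4:8])
        # Assumption: a box is only usable if it has positive width and height.
        (good if x2 > x1 and y2 > y1 else bad).append((x1, y1, x2, y2))
    return good, bad


labels = [
    # Width/height written where x2/y2 belong, as in the failing label file.
    "car 0.00 0 0.00 378.4 48.666666666666664 22.4 36.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
    # Proper corner coordinates.
    "car 0.00 0 0.00 367 30 389 66 0.00 0.00 0.00 0.00 0.00 0.00 0.00",
]

good, bad = split_boxes(labels)
print(f"{len(good)} usable, {len(bad)} degenerate")  # 1 usable, 1 degenerate
```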

My bad, I confused this with the original YOLO annotation format, which uses an [x_center, y_center, width, height] structure. I fixed it, and the kmeans optimisation now works just fine:

!cat $LOCAL_DATA_DIR/chimera_ir_training/labels/1_10_22_887.txt
car 0.00 0 0.00 367 30 389 66 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 380 105 403 157 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 371 70 387 102 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 425 181 450 202 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 422 278 450 306 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 350 385 372 413 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 412 454 443 485 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 381 496 398 544 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 403 521 425 548 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 402 577 432 608 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 329 586 371 613 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 328 561 369 581 0.00 0.00 0.00 0.00 0.00 0.00 0.00

2022-01-06 15:27:58,728 [INFO] root: Registry: ['nvcr.io']
2022-01-06 15:27:58,812 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 15:27:58,827 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Start optimization iteration: 1
Please use following anchor sizes in YOLO config:
(8.00, 30.67)
(10.40, 45.33)
(13.60, 48.00)
(20.00, 33.33)
(26.40, 36.00)
(19.20, 58.67)
(24.00, 90.67)
(40.80, 54.67)
(44.80, 117.33)
2022-01-06 15:28:04,057 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Should I replace the anchor values in both the train and retrain seq spec files?

Yes, modify the anchor shapes at the head of the spec files.
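
To avoid copying the anchors by hand, a small sketch like the one below can pull the `(w, h)` pairs out of the kmeans output and format them as bracketed lists. Splitting them into groups of three in the order printed is an assumption; how many anchor groups the spec expects, and what the fields are called, should be checked against your own spec file. All helper names are hypothetical.

```python
import re
from typing import List, Tuple

Anchor = Tuple[float, float]
ANCHOR_RE = re.compile(r"^\((\d+(?:\.\d+)?),\s*(\d+(?:\.\d+)?)\)$")


def parse_anchors(log_text: str) -> List[Anchor]:
    """Collect every line of the form '(w, h)' from the kmeans output."""
    anchors: List[Anchor] = []
    for line in log_text.splitlines():
        match = ANCHOR_RE.match(line.strip())
        if match:
            anchors.append((float(match.group(1)), float(match.group(2))))
    return anchors


def chunk(anchors: List[Anchor], size: int = 3) -> List[List[Anchor]]:
    """Group anchors in the order they were printed (assumed grouping)."""
    return [anchors[i:i + size] for i in range(0, len(anchors), size)]


def format_shape(group: List[Anchor]) -> str:
    return "[" + ", ".join(f"({w:.2f}, {h:.2f})" for w, h in group) + "]"


log_text = """Please use following anchor sizes in YOLO config:
(8.00, 30.67)
(10.40, 45.33)
(13.60, 48.00)
(20.00, 33.33)
(26.40, 36.00)
(19.20, 58.67)
(24.00, 90.67)
(40.80, 54.67)
(44.80, 117.33)
"""

anchors = parse_anchors(log_text)
groups = chunk(anchors)
for group in groups:
    print(format_shape(group))
```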

Thanks so much for your help.
Something else I have run into:

!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned
!tao yolo_v4_tiny run ls $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned

2022-01-06 16:07:36,888 [INFO] root: Registry: ['nvcr.io']
2022-01-06 16:07:36,968 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 16:07:36,983 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
ls: cannot access '/home/ubuntu/dev/tao_experiment/yolo_v4_tiny/experiment_dir_unpruned': No such file or directory

What could be causing this?

All paths after "tao yolo_v4_tiny" should be paths inside the Docker container, not paths on the host. You can check your tao_mounts.json for the mapping.

For more info, see "TAO Toolkit Launcher" in the TAO Toolkit 3.22.05 documentation.
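
To illustrate the host-to-container mapping, here is a minimal sketch that translates a host path using a mounts configuration. It assumes the file holds a "Mounts" list of "source"/"destination" pairs; the destination `/workspace/tao-experiments` below is an invented example, so substitute the values from your own ~/.tao_mounts.json. `host_to_container` is a hypothetical helper, not a TAO function.

```python
import json
from pathlib import PurePosixPath
from typing import Optional


def host_to_container(host_path: str, mounts_config: dict) -> Optional[str]:
    """Map a host path to its in-container path, or None if it is not mounted."""
    path = PurePosixPath(host_path)
    for mount in mounts_config.get("Mounts", []):
        source = PurePosixPath(mount["source"])
        try:
            relative = path.relative_to(source)
        except ValueError:
            continue  # this mount does not cover the path
        return str(PurePosixPath(mount["destination"]) / relative)
    return None


# Example configuration; the destination is an assumed value for illustration.
example_config = json.loads("""
{
  "Mounts": [
    {"source": "/home/ubuntu/dev/tao_experiment",
     "destination": "/workspace/tao-experiments"}
  ]
}
""")

host_dir = "/home/ubuntu/dev/tao_experiment/yolo_v4_tiny/experiment_dir_unpruned"
print(host_to_container(host_dir, example_config))
# /workspace/tao-experiments/yolo_v4_tiny/experiment_dir_unpruned
print(host_to_container("/tmp/elsewhere", example_config))  # None
```

With a real configuration, load it with `json.load(open(os.path.expanduser("~/.tao_mounts.json")))` and pass the translated path to the `tao` command.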

Thanks @Morganh
I was finally able to run the training phase, but I have a few questions:

  1. I didn't find any settings for the image width and height (which is 640 x 480) in the train and retrain spec files; they only appear in the k-means anchor calculation. I assume that is why the first model output is "No training configuration found". Are these values necessary for training? If so, how should I set them?
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
Input (InputLayer)              (None, 3, None, None 0                                            
__________________________________________________________________________________________________
Input_qdq (QDQ)                 (None, 3, None, None 1           Input[0][0]                      
__________________________________________________________________________________________________
conv_0 (QuantizedConv2D)        (None, 32, None, Non 864         Input_qdq[0][0]

This log message can be ignored.

Could you please elaborate more?