Tao pre-trained yolo4tiny - AssertionError: Must have more boxes than clusters

user14171 · January 5, 2022, 1:20pm

Hey all,
I'm trying to train yolo_v4_tiny on a custom dataset using the provided Jupyter notebook.
At the ‘# If you use your own dataset, you will need to run the code below to generate the best anchor shape’ section I'm running the following command:
!tao yolo_v4_tiny kmeans -l tao-experiments/ir_training/labels \
    -i tao-experiments/ir_training/images \
    -n 9 \
    -x 640 \
    -y 512
The image size is 640 x 512, and I receive the following error:

2022-01-05 13:10:43,632 [INFO] root: Registry: ['nvcr.io']
2022-01-05 13:10:43,710 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-05 13:10:43,725 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/kmeans.py", line 14, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 201, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 169, in kmeans
AssertionError: Must have more boxes than clusters
2022-01-05 13:10:49,163 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

What could be causing this error?

Morganh · January 5, 2022, 2:29pm

The valid bboxes may be less than 9 in tao-experiments/ir_training/labels. Please check.

user14171 · January 5, 2022, 2:48pm

I'm not sure what you mean by “bboxes may be less than 9 in the labels”, since the labels directory contains a txt file with the bounding boxes for each image:

!cat $LOCAL_DATA_DIR/ir_training/labels/11_10_22_887.txt

> car 0.7390625 0.07604166666666666 0.04375 0.05625
> car 0.76484375 0.20520833333333333 0.0453125 0.08125
> car 0.740625 0.13541666666666666 0.03125 0.05
> car 0.85546875 0.3 0.0484375 0.03333333333333333
> car 0.85234375 0.45729166666666665 0.0546875 0.04375
> car 0.70546875 0.6239583333333333 0.0421875 0.04375
> car 0.8359375 0.734375 0.059375 0.04791666666666667

Could you please explain what I'm missing here?

Morganh · January 5, 2022, 2:51pm

The labels are not in the expected format. See the "Data Annotation Format" page of the TAO Toolkit 3.22.05 documentation.

Examples:

car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cyclist 0.00 0 0.00 665.45 160.00 717.93 217.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00
pedestrian 0.00 0 0.00 423.17 173.67 433.17 224.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
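
For labels that start out as normalized YOLO text files, the conversion to lines like the ones above can be sketched in Python. This is a minimal sketch and not part of TAO: it assumes each source line is `class x_center y_center width height` normalized to 0..1, and the function name `yolo_to_kitti_line` and the image size passed in are hypothetical. Only the four bbox fields carry information; the rest are written as zeros, as in the examples.

```python
def yolo_to_kitti_line(yolo_line, img_w, img_h):
    """Convert one normalized YOLO label line to a KITTI-style label line.

    Assumes the YOLO fields are: class x_center y_center width height,
    all normalized to the range 0..1 (an assumption about the source labels).
    """
    cls, cx, cy, w, h = yolo_line.split()
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    # KITTI-style boxes use pixel corner coordinates with x1 < x2 and y1 < y2.
    x1, y1 = cx - w / 2, cy - h / 2
    x2, y2 = cx + w / 2, cy + h / 2
    # Clamp to the image so a box never extends outside it.
    x1, y1 = max(0.0, x1), max(0.0, y1)
    x2, y2 = min(float(img_w), x2), min(float(img_h), y2)
    zeros = " ".join(["0.00"] * 7)
    return f"{cls} 0.00 0 0.00 {x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} {zeros}"


print(yolo_to_kitti_line("car 0.5 0.5 0.25 0.5", img_w=640, img_h=480))
# car 0.00 0 0.00 240.00 120.00 400.00 360.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
```

The image width and height must be the real pixel size of the image each label belongs to, otherwise the boxes end up in the wrong place.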

user14171 · January 5, 2022, 3:02pm

OK, thanks!
On another topic: I downloaded a pre-trained model for yolo_v4_tiny, “nvidia/tao/pretrained_object_detection:cspdarknet_tiny”, from the NGC registry using the command

!ngc registry model download-version nvidia/tao/pretrained_object_detection:cspdarknet_tiny \
    --dest $LOCAL_EXPERIMENT_DIR/pretrained_cspdarknet_tiny

(I used this one because the nvstaging/tao/pretrained_object_detection repository responds with 403.)
When I then tried to train the model using

!tao yolo_v4_tiny train -e $SPECS_DIR/yolo_v4_tiny_train_chimera_seq.txt \
    -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
    -k $KEY \
    --gpus 1
I received the following error:

iles/ai_infra/iva/yolo_v4/models/yolov4_model.py", line 595, in build_savers
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/dev/tao_experiment/yolo_v4_tiny/experiment_dir_unpruned/weights'

Where should I download the weights from in order to train the model?

Morganh · January 5, 2022, 3:17pm

You just need to create the directory: mkdir $USER_EXPERIMENT_DIR/experiment_dir_unpruned

user14171 · January 5, 2022, 4:30pm

I looked at the kmeans usage to see whether there is an option for normalized bboxes and didn't find one.

tao yolo_v4 kmeans [-h] -l <label_folders>
                        -i <image_folders>
                        -x <network base input width>
                        -y <network base input height>
                        [-n <num_clusters>]
                        [--max_steps <kmeans max steps>]
                        [--min_x <ignore boxes with width less than this value>]
                        [--min_y <ignore boxes with height less than this value>]

Can I use the normalized bbox coordinates, or should I convert them back to actual values?

Morganh · January 6, 2022, 1:43am

The tool calculates the anchors from the values in the label files.

user14171 · January 6, 2022, 2:05pm

Hey Morganh,
thank you for the advice, but unfortunately it didn't solve my issue.
I changed the label files to KITTI format (my bboxes are normalized to coordinate values between 0 and 1) and they look like this:

car 0.00 0 0.00 0.58125 0.3302083333333333 0.125 0.16041666666666668 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 0.690625 0.5041666666666667 0.13125 0.15833333333333333 0.00 0.00 0.00 0.00 0.00 0.00 0.00

When I run kmeans to calculate the best anchor shapes

!tao yolo_v4_tiny kmeans -l tao-experiments/data/chimera_ir_training/labels \
    -i tao-experiments/data/chimera_ir_training/images \
    -n 6 \
    -x 640 \
    -y 512

I get the same exception again:

2022-01-06 14:00:38,926 [INFO] root: Registry: ['nvcr.io']
2022-01-06 14:00:39,008 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 14:00:39,023 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/kmeans.py", line 14, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 201, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 169, in kmeans
AssertionError: Must have more boxes than clusters
2022-01-06 14:00:44,341 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

What could be the problem?
Best regards!

Morganh · January 6, 2022, 2:11pm

Can you run the command below and share the log?
!tao yolo_v4_tiny run ls tao-experiments/data/chimera_ir_training/labels | wc -l

user14171 · January 6, 2022, 2:15pm

2022-01-06 14:14:59,154 [INFO] root: Registry: ['nvcr.io']
2022-01-06 14:14:59,236 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 14:14:59,252 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-01-06 14:14:59,790 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
90

Morganh · January 6, 2022, 2:20pm

OK, after checking: the bbox values (x1, y1, x2, y2) should be the actual values, not normalized ones. Could you select several label files, modify their x1, y1, x2, y2, and try again?

user14171 · January 6, 2022, 2:48pm

Thanks for the reply. I converted the coordinates to their actual values:

!cat $LOCAL_DATA_DIR/chimera_ir_training/labels/11_10_22_887.txt
car 0.00 0 0.00 378.4 48.666666666666664 22.4 36.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 391.6 131.33333333333331 23.2 52.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 379.2 86.66666666666666 16.0 32.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 438.0 192.0 24.8 21.333333333333332 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 436.4 292.66666666666663 28.0 28.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 361.2 399.3333333333333 21.6 28.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 428.0 470.0 30.4 30.666666666666668 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 390.0 520.0 16.8 48.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 414.4 534.6666666666667 22.4 26.666666666666664 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 417.2 592.6666666666666 29.6 30.666666666666668 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 350.4 600.0 41.6 26.666666666666664 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 348.8 571.3333333333334 41.6 20.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00

but I still receive the same error:

2022-01-06 14:44:13,329 [INFO] root: Registry: ['nvcr.io']
2022-01-06 14:44:13,418 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 14:44:13,433 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/kmeans.py", line 14, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 201, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/scripts/kmeans.py", line 169, in kmeans
AssertionError: Must have more boxes than clusters
2022-01-06 14:44:18,593 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Morganh · January 6, 2022, 3:06pm

Why are x1 > x2 and y1 > y2 in these labels? A valid bbox needs x1 < x2 and y1 < y2.
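
A quick way to spot this kind of problem is to count how many boxes in a label folder are well-formed before running kmeans. The following is a minimal sketch, assuming KITTI-style lines with 15 fields where fields 5 to 8 are x1, y1, x2, y2. The helper names are hypothetical, and whether the TAO kmeans script applies exactly this filter is an assumption based on the "valid bboxes" remark earlier in the thread.

```python
from pathlib import Path


def is_valid_kitti_box(line):
    """Return True if the line has 15 fields and a box with x1 < x2 and y1 < y2."""
    parts = line.split()
    if len(parts) != 15:
        return False
    try:
        x1, y1, x2, y2 = (float(v) for v in parts[4:8])
    except ValueError:
        return False
    return x1 < x2 and y1 < y2


def count_valid_boxes(label_dir):
    """Count well-formed boxes across all .txt files in label_dir."""
    total = 0
    for path in Path(label_dir).glob("*.txt"):
        for line in path.read_text().splitlines():
            if line.strip() and is_valid_kitti_box(line):
                total += 1
    return total


# Width/height written into the x2/y2 slots fails the check:
print(is_valid_kitti_box(
    "car 0.00 0 0.00 378.4 48.67 22.4 36.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00"))  # False
# Proper corner coordinates pass:
print(is_valid_kitti_box(
    "car 0.00 0 0.00 367 30 389 66 0.00 0.00 0.00 0.00 0.00 0.00 0.00"))  # True
```

If `count_valid_boxes` on the label folder returns a number below the `-n` value passed to kmeans, the "Must have more boxes than clusters" assertion is the likely outcome.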

user14171 · January 6, 2022, 3:35pm

My bad, I confused it with the original YOLO annotation format, [x_center, y_center, width, height]. I fixed it and the kmeans optimization works just fine:

!cat $LOCAL_DATA_DIR/chimera_ir_training/labels/1_10_22_887.txt
car 0.00 0 0.00 367 30 389 66 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 380 105 403 157 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 371 70 387 102 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 425 181 450 202 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 422 278 450 306 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 350 385 372 413 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 412 454 443 485 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 381 496 398 544 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 403 521 425 548 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 402 577 432 608 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 329 586 371 613 0.00 0.00 0.00 0.00 0.00 0.00 0.00
car 0.00 0 0.00 328 561 369 581 0.00 0.00 0.00 0.00 0.00 0.00 0.00

2022-01-06 15:27:58,728 [INFO] root: Registry: ['nvcr.io']
2022-01-06 15:27:58,812 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 15:27:58,827 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
Start optimization iteration: 1
Please use following anchor sizes in YOLO config:
(8.00, 30.67)
(10.40, 45.33)
(13.60, 48.00)
(20.00, 33.33)
(26.40, 36.00)
(19.20, 58.67)
(24.00, 90.67)
(40.80, 54.67)
(44.80, 117.33)
2022-01-06 15:28:04,057 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Should I replace the anchor values in both the train and retrain seq files?

Morganh · January 6, 2022, 3:38pm

Yes, modify the anchor shapes at the top of the spec files.

user14171 · January 6, 2022, 4:29pm

Thanks so much for your help.
Something else I have run into:

!mkdir -p $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned
!tao yolo_v4_tiny run ls $LOCAL_EXPERIMENT_DIR/experiment_dir_unpruned

2022-01-06 16:07:36,888 [INFO] root: Registry: ['nvcr.io']
2022-01-06 16:07:36,968 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-06 16:07:36,983 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
ls: cannot access '/home/ubuntu/dev/tao_experiment/yolo_v4_tiny/experiment_dir_unpruned': No such file or directory

What could be causing this?

Morganh · January 6, 2022, 4:33pm

All paths after “tao yolo_v4_tiny” must be paths inside the Docker container, not host paths. You can check the mappings in your tao_mounts.json.

For more info, see the "TAO Toolkit Launcher" page of the TAO Toolkit 3.22.05 documentation.
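
To see which container path a host directory corresponds to, the mount table can be applied by hand. The sketch below is illustrative only: it assumes the mounts file has a top-level "Mounts" list whose entries carry "source" (host) and "destination" (container) keys, and the helper `host_to_container` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def host_to_container(host_path, mounts):
    """Map a host path to its in-container path using a list of mount entries.

    Each entry is assumed to look like
    {"source": <host dir>, "destination": <container dir>}.
    Returns None when the path is not under any mounted source directory.
    """
    host = PurePosixPath(host_path)
    for m in mounts:
        src = PurePosixPath(m["source"])
        try:
            rel = host.relative_to(src)
        except ValueError:
            continue  # host_path is not under this mount
        return str(PurePosixPath(m["destination"]) / rel)
    return None


# Hypothetical mount table, shaped like the assumed "Mounts" section.
mounts = [
    {"source": "/home/ubuntu/dev/tao_experiment",
     "destination": "/workspace/tao-experiments"},
]
print(host_to_container(
    "/home/ubuntu/dev/tao_experiment/yolo_v4_tiny/experiment_dir_unpruned", mounts))
# /workspace/tao-experiments/yolo_v4_tiny/experiment_dir_unpruned
print(host_to_container("/tmp/other", mounts))
# None
```

A result of `None` means the directory is not visible inside the container at all, which would match an `ls` that fails even though the directory exists on the host.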

user14171 · January 12, 2022, 2:25pm

Thanks @Morganh
I was finally able to run the training phase, but I have a few questions:

In the train and retrain spec files I didn't find any configuration for the image width and height (which is 640x480); it only appears in the kmeans anchor calculation. I assume that is why the first output for the model is "No training configuration found". Are these values necessary for training, and how should I set them?

/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
  warnings.warn('No training configuration found in save file: '
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
Input (InputLayer)              (None, 3, None, None 0                                            
__________________________________________________________________________________________________
Input_qdq (QDQ)                 (None, 3, None, None 1           Input[0][0]                      
__________________________________________________________________________________________________
conv_0 (QuantizedConv2D)        (None, 32, None, Non 864         Input_qdq[0][0]

Morganh · January 12, 2022, 2:27pm

The log can be ignored.

Could you please elaborate more?
