TAO 5.3, YOLOv4 : Retrieving kmeans 'anchor shapes' through API call

Hi,

  • I’m using TAO API 5.3 on AWS EKS, T4 GPU hardware.
  • I’m training YOLOv4 on a custom kitti image dataset.
  • I’ve successfully run the dataset_convert action on my train and val sets.
  • To obtain YOLOv4 anchors, I’ve modified the kmeans default spec to include size_x and size_y, and successfully run the kmeans action over my dataset. The job runs successfully, but the output does not contain anchor shapes that I need for yolov4_config in the training spec file.

Question: Via the API, how do I retrieve anchor shapes that result from running kmeans action?

Specifically referring to this step. Output is expected to be 3 tuples, something like this:

Code:

import json
import requests

## Default specs; `response` is the result of a GET on {base_url}/datasets/{dataset_id}/specs/kmeans/schema
specs = response.json()["default"]
specs["size_x"] = 512
specs["size_y"] = 512

## Kick off the kmeans job
parent = parent_id
action = "kmeans"
data = json.dumps({"parent_job_id": parent, "action": action, "specs": specs})
endpoint = f"{base_url}/datasets/{dataset_id}/jobs"
response = requests.post(endpoint, data=data, headers=headers, verify=False)

print(response)

Output (notice how job status=SUCCESS, but anchor shapes are not present):

{'action': 'kmeans',
 'created_on': '2024-10-23T01:45:46.834550',
 'dataset_id': 'ccfaa892-c6bc-4d4d-a87e-9f3cc7fe1195',
 'description': '',
 'id': '5adfacfe-8432-442f-90e0-d40feb3045c6',
 'job_tar_stats': {
   'file_size': 625,
   'sha256_digest': 'af7be32bec080af4fa4b04ba9429ce5a5215749b41cf090047164c6a556672ce'
 },
 'last_modified': '2024-10-23T01:47:22.136608',
 'name': '',
 'parent_id': 'ec204a96-5f08-4fdf-8b32-4cb828d60cc6',
 'result': {
   'categorical': [],
   'cur_iter': None,
   'detailed_status': {
     'date': '10/23/2024',
     'message': 'K-means finished successfully.',
     'status': 'SUCCESS',
     'time': '1:47:8'
   },
   'epoch': None,
   'eta': None,
   'graphical': [],
   'key_metric': 0.0,
   'kpi': [],
   'max_epoch': None,
   'time_per_epoch': None,
   'time_per_iter': None
 },
 'specs': {
   'max_steps': 10000,
   'min_x': 0,
   'min_y': 0,
   'num_clusters': 9,
   'size_x': 512,
   'size_y': 512
 },
 'status': 'Done'
}
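
A quick way to confirm that the anchors are really absent from the job metadata, rather than buried in a nested field, is to walk the returned dict and list every key that mentions "anchor". This is only a minimal sketch: `find_keys` is a hypothetical helper (not part of the TAO API), and `job` is a trimmed copy of the response shown above.

```python
def find_keys(obj, needle, path=""):
    """Return dotted paths of all dict keys containing `needle` (case-insensitive)."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            here = f"{path}.{key}" if path else str(key)
            if needle.lower() in str(key).lower():
                hits.append(here)
            # Recurse into the value to catch nested dicts and lists.
            hits.extend(find_keys(value, needle, here))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            hits.extend(find_keys(value, needle, f"{path}[{i}]"))
    return hits


# Trimmed copy of the job metadata from the response above.
job = {
    "action": "kmeans",
    "result": {
        "detailed_status": {
            "message": "K-means finished successfully.",
            "status": "SUCCESS",
        },
        "kpi": [],
    },
    "specs": {"num_clusters": 9, "size_x": 512, "size_y": 512},
    "status": "Done",
}

anchor_keys = find_keys(job, "anchor")  # no key mentions anchors
size_keys = find_keys(job, "size")      # sanity check that the search works
print(anchor_keys, size_keys)
```

Running the same helper on the full response would show whether any field at any depth carries the anchors.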

Please check the logs of the relevant pods for more information.
For example:
$ kubectl get pods
$ kubectl logs -f tao-toolkit-api-workflow-xxx
$ kubectl logs -f tao-toolkit-api-app-pod-xxx

Thank you, I checked logs while running the kmeans action and found a couple of useful pieces of info in the “workflow” pod logs.

Attaching the log file.

tao-api-workflow-pod-759754cdfc-dvk8h.txt (2.1 KB)

Now I have some follow-up questions based on the logs:

  • The kmeans call appears to write to a results_dir :
    • --results_dir=/shared/users/<user_ID>/datasets/<dataset_ID>/<kmeans_job_ID>/
  • I checked that directory and see 2 files:
>>> ls /shared/users/94eaf12c-1c71-5507-ae4c-873e9f488a98/datasets/ccfaa892-c6bc-4d4d-a87e-9f3cc7fe1195/aa730fe1-d26f-4439-b6f7-6ca29a9e5164/
>>> logs_from_toolkit.txt  status.json

These files contain the anchors info I need, so my question is – how can I access these data via API?
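
For anyone who does have access to the results directory, the anchors could be pulled out of the log text programmatically rather than by eye. The sketch below assumes the anchors appear as parenthesised `(width, height)` pairs of decimal numbers, one pair per line. That format, the `parse_anchor_pairs` helper, and the sample text are all assumptions made for illustration; they are not the actual contents of logs_from_toolkit.txt or status.json.

```python
import re

# Matches a line that consists only of "(number, number)".
PAIR = re.compile(r"^\s*\(\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)\s*$")


def parse_anchor_pairs(text):
    """Return a list of (width, height) floats found on their own lines in `text`."""
    anchors = []
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:
            anchors.append((float(match.group(1)), float(match.group(2))))
    return anchors


# Made-up sample text; the real log layout may differ.
sample = """\
some log line
(12.50, 20.00)
(33.10, 41.75)
another log line (not an anchor)
"""

anchors = parse_anchor_pairs(sample)
print(anchors)
```

If the real files use a different layout (for example JSON in status.json), the regular expression would need to be replaced with the appropriate parser.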

Please try using the notebook’s cells to override the anchor shapes in the spec file.

Right – I’m able to do that just fine, and POST the modified spec file with new anchors to the TAO server when starting the train action.
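
For reference, the override step might look roughly like this. The sketch assumes that `specs` is the train spec as a dict, that it has a `yolov4_config` section with `small_anchor_shape`, `mid_anchor_shape` and `big_anchor_shape` string fields as in the YOLOv4 spec file, and that kmeans was run with 9 clusters. `format_shapes` and `apply_anchors` are hypothetical helpers, and the anchor values are placeholders rather than real kmeans output.

```python
def format_shapes(pairs):
    """Render anchor pairs as a string such as "[(10.00, 12.00), (16.00, 30.00)]"."""
    return "[" + ", ".join(f"({w:.2f}, {h:.2f})" for w, h in pairs) + "]"


def apply_anchors(specs, anchors):
    """Sort 9 anchors by area and write 3 per scale into specs['yolov4_config']."""
    if len(anchors) != 9:
        raise ValueError("expected 9 anchors (3 per detection scale)")
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    cfg = specs.setdefault("yolov4_config", {})
    cfg["small_anchor_shape"] = format_shapes(ordered[0:3])
    cfg["mid_anchor_shape"] = format_shapes(ordered[3:6])
    cfg["big_anchor_shape"] = format_shapes(ordered[6:9])
    return specs


# Placeholder anchors, deliberately unsorted.
anchors = [(10, 12), (200, 180), (30, 25), (60, 90), (16, 30),
           (120, 100), (300, 260), (45, 40), (80, 60)]
specs = apply_anchors({"yolov4_config": {}}, anchors)
print(specs["yolov4_config"])
```

Sorting by area before splitting keeps the smallest anchors on the small scale and the largest on the big scale, whatever order the anchors arrive in.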

I was just wondering if there’s a way to retrieve the anchor shapes from the finished kmeans job via API, instead of inspecting the results_dir files and editing the fields manually.

Reason for doing it this way is that our frontend client wouldn’t have kubectl access nor would be able to see raw files on the server where TAO is deployed, so we need a way to retrieve the anchor shapes via an API call.

Looks like a new feature request for YOLO in TAO-API. May I know which notebook you are running? I will sync with internal team.

Thank you! We’re using the object detection workflow notebook but YOLOv4 instead of detectnet_v2.

That would be a nice feature to have for YOLOv3 / YOLOv4 / YOLOv4-tiny.

Given that the point of kmeans in this context is to compute the anchor shapes, I was a bit surprised that querying a kmeans job endpoint gives a bunch of information, but not the actual anchors themselves.

Thanks again for checking on this.

@Morganh one more follow-up question … since TAO is open-source, could we extend the functionality by altering the source code to define custom endpoints + actions?

In this case, I suppose we’d implement our own kmeans function that computes and returns anchors as JSON. But we’d need a way to define a custom kmeans_custom endpoint within TAO that triggers this action from client-side, and returns the anchors as a response.
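
To make the idea concrete, here is a rough sketch of what such a function could compute. It clusters box (width, height) pairs with k-means, using IoU as the similarity measure, which is a common choice for YOLO anchors but not necessarily what TAO's kmeans action does internally. The names `iou_wh`, `kmeans_anchors` and `anchors_as_json` are hypothetical, the box sizes are toy values, and wiring this into a TAO endpoint is not shown.

```python
import json


def iou_wh(box, anchor):
    """IoU of two boxes sharing a corner, each given as (width, height)."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, max_steps=100):
    """Cluster (w, h) boxes into k anchors; returns anchors sorted by area."""
    if k < 1 or not boxes:
        raise ValueError("need at least one box and k >= 1")
    # Deterministic init: evenly spaced picks from the boxes sorted by area.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centers = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(max_steps):
        # Assign each box to the center it overlaps most (highest IoU).
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            clusters[best].append(box)
        # Move each center to the mean width/height of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(b[0] for b in members) / len(members)
                mean_h = sum(b[1] for b in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


def anchors_as_json(boxes, k):
    """Serialize the anchors the way a JSON endpoint might return them."""
    anchors = kmeans_anchors(boxes, k)
    return json.dumps({"anchors": [[round(w, 2), round(h, 2)] for w, h in anchors]})


# Toy box sizes (width, height) in pixels.
boxes = [(10, 12), (12, 10), (11, 11), (100, 90), (90, 100), (95, 95)]
payload = anchors_as_json(boxes, 2)
print(payload)
```

A client could then read the `anchors` list straight from the response body instead of needing access to the results directory.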

Is there a recommended way to do something like this?

Yes, you can. The TAO source code repositories are listed at the bottom of the NVIDIA Corporation GitHub page. The YOLO code is under tao_tensorflow1_backend/nvidia_tao_tf1/cv on the main branch of the NVIDIA/tao_tensorflow1_backend repository on GitHub.

The 5.0-tf1 Docker image is available on NGC (TAO Toolkit | NVIDIA NGC): nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5. You can log in to the container, modify the corresponding code, and then docker commit the new container.
Then take the 5.0 helm chart and update it with your new 5.0-tf1 image name.
