Using wandb (Weights & Biases) with the TAO API the same way we use ClearML

TAO 4.0.0

Hi, I’m using TAO with a local k8s cluster. For telemetry I use ClearML (super easy to set up because we can include all the credentials in the values.yaml of the Helm chart), like this:

# Optional MLOPS setting for ClearML
clearMlApiAccessKey: <ACCESS-KEY>
clearMlApiSecretKey: <SECRET-KEY>

and by configuring the spec

# get default specification schema for training.
endpoint = f"{base_url}/model/{model_ID}/specs/train/schema"
response = requests.get(endpoint, headers=headers, verify=rootca)
specs = response.json()["default"]
specs["training_config"]["visualizer"]["clearml_config"] = {
    "project": "my_project",
    "tags": ["training", "tao_toolkit"],
    "task": "training_experiment_1",
}

This works!
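For completeness, the updated specs dict then has to be written back to the API before launching the job. A minimal sketch: the `build_clearml_specs` helper is mine, and the POST endpoint in the comment is an assumption about TAO 4.0.0-style REST paths, so check it against your deployment.

```python
def build_clearml_specs(default_specs, project, tags, task):
    """Return a copy of the default train specs with ClearML telemetry enabled.

    `default_specs` is the dict returned by the /specs/train/schema call
    (response.json()["default"] in the snippet above).
    """
    specs = dict(default_specs)
    visualizer = specs.setdefault("training_config", {}).setdefault("visualizer", {})
    visualizer["enabled"] = True
    visualizer["clearml_config"] = {
        "project": project,
        "tags": list(tags),
        "task": task,
    }
    return specs

# Usage sketch -- base_url, model_ID, headers, rootca as in the snippets above.
# The POST path below is an assumption; verify it for your TAO version:
# import requests
# specs = build_clearml_specs(response.json()["default"], "my_project",
#                             ["training", "tao_toolkit"], "training_experiment_1")
# requests.post(f"{base_url}/model/{model_ID}/specs/train",
#               json=specs, headers=headers, verify=rootca)
```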

But I’d like to try wandb in a similar manner

I have a wandb server deployed in the same local k8s cluster, and to stream telemetry I currently have to forward log entries by hand: repeatedly run REST API calls to retrieve data from TAO and then call

wandb.log({"key1": value1, "key2": value2, ... })

to send data to wandb local server.
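That manual bridge can at least be factored so the fragile parts stay in one place. A sketch, assuming the TAO job-status payload contains numeric metric entries at the top level; the `extract_metrics` helper, the status endpoint path, and the polling interval are my assumptions, not the documented API:

```python
def extract_metrics(status_json):
    """Pull numeric entries out of a TAO job-status payload (shape assumed)."""
    metrics = {}
    for key, value in status_json.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            metrics[key] = value
    return metrics

# Polling-loop sketch -- the requests/wandb calls are left as comments so
# only the shape of the bridge is shown (endpoint path is an assumption):
# import time, requests, wandb
# while job_running:
#     resp = requests.get(f"{base_url}/model/{model_ID}/job/{job_ID}",
#                         headers=headers, verify=rootca)
#     wandb.log(extract_metrics(resp.json()))
#     time.sleep(30)
```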

Is there a way to run

wandb.login(key="local-API-KEY", host="http://my.local.domain:<port>", relogin=True)

command from within the client that runs in the k8s pod? Then all I would have to do is include

specs["training_config"]["num_epochs"] = num_epochs
specs["training_config"]["visualizer"]["enabled"] = True

# add the wandb_config section
specs["training_config"]["visualizer"]["wandb_config"] = {
    "project": "my_net",
    "tags": ["training", "tao_toolkit"],
    "notes": "training_experiment_1",
}
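If the training container can be given environment variables (e.g. via the Helm chart or the pod spec), wandb can pick up credentials without any explicit `wandb.login()` call inside the client. `WANDB_API_KEY` and `WANDB_BASE_URL` are real wandb environment variables; whether TAO 4.0.0 exposes a hook to inject them into the training pod is exactly the open question here, so treat this as a sketch:

```python
import os

# Hypothetical: these would be injected into the training pod by the cluster
# config; the values shown are placeholders, not working credentials.
os.environ["WANDB_API_KEY"] = "local-API-KEY"
os.environ["WANDB_BASE_URL"] = "http://my.local.domain:8080"

# The wandb client reads both variables automatically, so inside the pod
# a plain wandb.init()/wandb.login() would talk to the local server:
# import wandb
# wandb.login(relogin=True)  # equivalent to the explicit key=/host= form
```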


Sorry for the late reply. I will check internally further.

Cheers!! To be honest it is not a big problem (only a nice-to-have at the moment). I am happy with ClearML.

On a more critical front:
I found that the API goes unresponsive when training on datasets that are not too small (around 25 GB). I suspect this is because I’m using local volume mounts for PVs rather than storage provisioners and network storage (we have some on the way); if that turns out to be the problem I will let you know. (Apologies for the out-of-context question; I will delete it and ask again once I confirm, as I don’t want to use this forum for k8s issues.) I was expecting to plug our large datasets into AutoML with the TAO API kit, and I need to handle at least around 25 GB, so that concern takes priority over this issue.