Please provide the following information when requesting support.
• Network Type
Detectnet_v2/Yolo_v3
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
5.0.0-tf2.11.0
I am using the TAO Toolkit for transfer learning and W&B to track the training process. I tested with the latest version of the Jupyter notebook and have two questions.
1. For both YOLO and RetinaNet, there are panels showing ‘histogram/beta’ and ‘histogram/gamma’ for different layers. Could you confirm what these metrics mean?
2. In the training_config file under visualizer, there are a few options to configure the wandb run, including project, name, and tags. Beyond those options, is it possible to configure which metrics are tracked or ignored in the experiment? For example, images with bounding boxes are shown under the Media panel in W&B with RetinaNet, but not with YOLOv3. Is this something we can configure and customize?
Thanks for your reply. I should have phrased the second question better. I am not trying to disable or remove wandb; I am trying to configure which metrics are shown in wandb. How can I inspect and modify the variables passed to wandb.log for different models when using the TAO Toolkit CLI, so that I can choose the metrics sent to the W&B server?
Thanks. I did refer to the documentation listed here. It seems that the only configurable elements are project, entity, etc., which attach a name and tags to the experiments.
If we need to specify the variables/metrics sent to the W&B server before training starts for a particular model, my understanding is that we would have to reverse engineer the tao_tensorflow1_backend repository. Is that correct?
A simple way is to start a shell in the Docker container, modify the .py files you want to change, and then run docker commit to save your changes.
Steps:
$ docker run --runtime=nvidia -it --rm <tao_5.0_docker>
Inside the container, find the original file. For example, to modify train.py:
$ find /usr |grep train.py
Back it up, then copy the modified version of train.py over the original.
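The backup-and-replace step could be sketched as a small shell helper. The function name replace_with_backup is made up for illustration, and the paths passed to it are whatever the find command above reports plus the location of your edited copy; nothing here is part of TAO itself.

```shell
# replace_with_backup ORIGINAL MODIFIED   (hypothetical helper, not a TAO command)
# Keeps a copy of ORIGINAL as ORIGINAL.bak, then overwrites ORIGINAL with MODIFIED.
replace_with_backup() {
    original="$1"
    modified="$2"
    # Refuse to overwrite an existing backup so the pristine file is never lost.
    if [ -e "${original}.bak" ]; then
        echo "backup already exists: ${original}.bak" >&2
        return 1
    fi
    # -p preserves mode and timestamps on the backup copy.
    cp -p "$original" "${original}.bak" && cp "$modified" "$original"
}
```

Calling it a second time on the same file fails on purpose, so the first backup always holds the file as shipped in the image.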
Open another terminal on the host and run docker commit to generate a new “modified” version of the Docker image.
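As a rough sketch of the commit step: because the container above was started with --rm, it is deleted as soon as it exits, so the commit has to happen from the second terminal while the container is still running. The helper name save_container_as_image and the tag tao-modified:5.0.0 are assumptions chosen for illustration; the container ID comes from docker ps.

```shell
# save_container_as_image CONTAINER_ID NEW_TAG   (hypothetical helper)
# Snapshots the running container, edited files included, into a new image,
# then lists that image so you can confirm it was created.
save_container_as_image() {
    container_id="$1"
    new_tag="$2"
    docker commit "$container_id" "$new_tag" && docker images "$new_tag"
}

# Usage on the host, in a second terminal:
#   docker ps        # note the CONTAINER ID of the running TAO container
#   save_container_as_image "$CONTAINER_ID" tao-modified:5.0.0   # tag is a made-up example
```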
One follow-up question is:
If we run the TAO Toolkit by invoking the containers directly, we can specify which Docker image to pull and make the changes you suggested. If we use the CLI for training, where the exact container is not specified, is there a way to modify the code? Thanks.
You can still modify the code inside the Docker container.
For example,
$ tao model ssd run /bin/bash
Then modify the code inside the container. The code is usually located under /usr.
I tried running tao model ssd run /bin/bash, and yes, I was able to modify the code in the container and save the modified container as a new image. If we now want to use tao model ssd train for training, how do we make it use the modified image? It seems that the CLI offers no way to specify which Docker image to pull, unlike running the container directly.
I think it is clear now. We can use either the CLI or docker run to get into the container, but to run training or other tasks with the modified image, we can only use docker run, since that is where we can specify the image. Does that sound right?
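For the docker run route, starting the committed image follows the same pattern as the docker run command in the steps above; only the image name changes. A minimal sketch, assuming the image was committed under the made-up tag tao-modified:5.0.0 (run_modified_image is likewise a hypothetical helper name):

```shell
# run_modified_image IMAGE [COMMAND...]   (hypothetical helper)
# Starts the given image with the same flags as the original docker run command
# and passes any remaining arguments through as the command to execute.
run_modified_image() {
    image="$1"
    shift
    docker run --runtime=nvidia -it --rm "$image" "$@"
}

# Usage: open a shell in the modified image (tag is a made-up example)
#   run_modified_image tao-modified:5.0.0 /bin/bash
```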