Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Classification TF2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) v5.0.0
• Training spec file (if you have one, please share it here) classification_tf2/tao_byom/specs/spec.yaml
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
How do I train my custom Keras model using NVIDIA TAO 5.0?
The BYOM model converter only converts models from torchvision and timm, whereas I have defined my own custom model. I couldn’t see a way to convert a .hdf5 file directly to .tltb using the converter.
I would prefer not to go via ONNX because it’s unnecessary, but if there is a way to convert a .hdf5 file such that it can be accepted by TAO, please let me know.
This is the pipeline I was hoping to follow:
Write my own model.py using Keras, then compile and save it as model.hdf5 [none of this is in TAO]. I have attached the model here: my_model.hdf5 (15.2 MB)
Using this model.hdf5 and encode.eff.py, I will convert the model into model.tltb or model.tlt [I’m not sure what the difference is].
I will then follow the BYOM notebook and hopefully my model is correctly loaded and trained.
But that’s not happening; the training step fails with an error. Here’s the error:
In TAO 5.0.0, BYOM with TF1 (Classification and UNet) has been deprecated because the source code of TAO Toolkit is now fully open-sourced. To use BYOM with TF1, you will need to continue using TAO 4.0.
Classification TF2 still supports BYOM with the same workflow as TAO 4.0. If you wish to bring your own model weights in TAO 5.0.0, you can directly modify the source code to load the weights.
BYOM is a Python-based package that converts any open-source ONNX model to a TAO-compatible model.
Converting a .hdf5 file to .tltb is not supported.
We suggest you directly modify the source code to load your own model weights.
Just so I understand correctly, that would mean a lot of changes, right? Add a backbone, change the list of accepted model architectures, and then add a function to load the model.
Is there a plan to add a low-code way to do this Keras-to-.tltb conversion, or do I have to rely on changing the source code throughout?
How do I run training and evaluation if I pull the git repository locally and make changes to the source code? Do I instantiate the docker container and then run the Jupyter notebook inside it?
OR
Can I just launch the notebooks from my conda env without the docker? My issue is: how do I make sure tao train uses the local version of the files available in tao_tensorflow2_backend?
You can log in to the docker container via
$ docker run --runtime=nvidia -it --rm docker_name /bin/bash
Then find the original file. For example, if you are going to modify a train.py:
$ find /usr | grep train.py
Back it up, then copy the modified version of train.py over it.
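The backup-and-replace step above can be sketched as follows. Note this is only an illustration run in a temporary directory: the real train.py lives somewhere under /usr inside the TAO container, and its exact path varies by release, so locate it with the find command first and substitute that path.

```shell
# Illustrative backup-and-replace workflow (run inside the container).
# $workdir stands in for the directory that `find /usr | grep train.py` reports.
workdir=$(mktemp -d)
echo "original train loop" > "$workdir/train.py"    # stand-in for the shipped train.py

cp "$workdir/train.py" "$workdir/train.py.bak"      # keep a backup of the original
echo "modified train loop" > "$workdir/train.py"    # overwrite with your edited version

cat "$workdir/train.py"                             # verify the replacement took effect
```

Keeping the `.bak` copy lets you restore the stock behavior without re-pulling the container, since changes made this way are lost when the container is removed (`--rm`) unless you commit the image.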
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.