TAO train finishes but nothing is generated

Please provide the following information when requesting support.

• Hardware (RTX3080)
• Network Type (Detectnet_v2)
• TLT Version (tao toolkit 4.0)
• Training spec file (6cfb5018-6d3f-444d-aa15-a5e3d96eb23c.json, 240 Bytes)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I ran the DetectNet_v2 train action following the cv_samples_v1.4.1 TAO API notebook. The request returns HTTP status 201, but no model files are generated in the folder.

The result is as shown below






What might I be doing wrong?

The training job is showing as pending. Can you double-check whether it is actually running?

What can I check to make sure the TAO API is working properly?

Can you run this cell to check the job status?

Also, the log shows that you set 300 epochs. Please try a lower number, for example 3 epochs, to narrow the problem down. Then run the cell mentioned above to check the “status”.
Also check the result via the step shown in the attached image.

I tried resetting k8s and the tao-api-toolkit-api container, but the cell still responds with HTTP status 201 and the train status is still pending:
{‘action’: ‘train’, ‘status’: ‘Pending’, ‘result’: {}}
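For anyone debugging the same symptom, here is a minimal sketch of polling the job status instead of re-running the cell by hand. The `wait_for_job` and `diagnose` helpers are hypothetical and not part of the TAO API. Only the `Pending` value and the `action`/`status`/`result` keys come from the response quoted above; `Running`, `Done`, and `Error` are assumed status values. The status fetcher is passed in as a callable, so the sketch makes no assumption about the endpoint URL.

```python
import time
from typing import Any, Callable, Dict

# Only "Pending" appears in this thread; the other values are assumptions.
TERMINAL_STATUSES = {"Done", "Error"}


def diagnose(payload: Dict[str, Any]) -> str:
    """Turn a job status payload into a short human-readable hint."""
    status = payload.get("status")
    if status == "Pending":
        return "Pending: the job was accepted (HTTP 201) but has not started."
    if status == "Running":
        return "Running: training is in progress."
    if status == "Done":
        return "Done: check the job folder for generated model files."
    if status == "Error":
        return "Error: inspect the job log for the failure."
    return f"Unknown status: {status!r}"


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_seconds: float = 15.0,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until a terminal status appears or polls run out."""
    payload: Dict[str, Any] = {}
    for attempt in range(max_polls):
        payload = fetch_status()
        print(f"poll {attempt + 1}: {diagnose(payload)}")
        if payload.get("status") in TERMINAL_STATUSES:
            break
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return payload
```

In the notebook, `fetch_status` could be a small lambda wrapping the same `requests.get(...).json()` call that the status cell already makes. If the last payload is still `Pending` after many polls, the job was never scheduled, which points at the cluster setup rather than the training spec.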



What else do I need to check to make sure the container is working properly?
Or are there more detailed instructions?

Can you follow https://developer.nvidia.com/blog/training-like-an-ai-pro-using-tao-automl/ to check the environment again?

How do I set up the Ansible hosts file? Is there a sample?


I followed https://developer.nvidia.com/blog/training-like-an-ai-pro-using-tao-automl/ with yolov4.ipynb.
The training status is still pending.
The attached files were generated after calling the model API. Can you help me identify the problem? I can’t find a way to solve it.
jobs.yaml (422 Bytes)

metadata.json (1.1 KB)
444603fa-0607-4c61-a824-e5e68a705e18.json (240 Bytes)
train.json (3.3 KB)
After some time, I called the API with Postman to get the result.

Can you share

  • the hosts file
  • The log when you run
    $ bash setup.sh check-inventory.yml
    $ bash setup.sh install


hosts (588 Bytes)

There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one. Thanks

You are running on only one machine.
You can set the password in the hosts file as below instead of using a .pem file.

[master]
172.23.70.118 ansible_ssh_user='aetina' ansible_ssh_pass=<yourpasswd>

Then re-run
$ bash setup.sh check-inventory.yml
$ bash setup.sh install
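Before re-running the setup, it may help to sanity-check the hosts file. The sketch below is a hypothetical helper, not part of the TAO setup scripts. It assumes the simple INI-style inventory shown above (one host per line under `[master]`, with `key=value` variables) and reports hosts that have no SSH user, or have neither `ansible_ssh_pass` nor `ansible_ssh_private_key_file`.

```python
import shlex
from typing import Dict, List


def master_hosts(inventory_text: str) -> Dict[str, Dict[str, str]]:
    """Parse the [master] section into {host: {variable: value}}."""
    hosts: Dict[str, Dict[str, str]] = {}
    section = None
    for raw in inventory_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "master":
            continue
        # shlex.split removes the quotes around values such as 'aetina'.
        host, *pairs = shlex.split(line)
        hosts[host] = dict(p.split("=", 1) for p in pairs if "=" in p)
    return hosts


def missing_credentials(hosts: Dict[str, Dict[str, str]]) -> List[str]:
    """Return hosts lacking an SSH user or any password/key setting."""
    bad = []
    for host, variables in hosts.items():
        has_user = "ansible_ssh_user" in variables
        has_secret = (
            "ansible_ssh_pass" in variables
            or "ansible_ssh_private_key_file" in variables
        )
        if not (has_user and has_secret):
            bad.append(host)
    return bad
```

For example, `missing_credentials(master_hosts(open("hosts").read()))` returns an empty list when every host under `[master]` has a user plus a password or key file.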
