Hi, I'm using AutoML, which was introduced in the latest TAO Toolkit release, v4.0.
The problem is that when I set up the TAO Toolkit API service on a bare-metal server, I got this issue.
From my web searches, this seems to be a common failure message.
Could you share more details about how you set up the server? Please refer to the "Setup" page of the TAO Toolkit 4.0 documentation.
Could you check whether the blog post https://developer.nvidia.com/blog/training-like-an-ai-pro-using-tao-automl/ helps?
Please note that the "one-click deploy tar" file is actually the directory shown below. It is already available after you run: ngc registry resource download-version "nvidia/tao/tao-getting-started:4.0.0"
$ ls tao-toolkit-api-bare-metal
check-inventory.yml cnc hosts install-tao-toolkit-api.yml requirements.txt setup.sh tao-toolkit-api-ansible-values.yml tao-toolkit-api-helm-values.yml uninstall-nvidia-drivers.yml
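To confirm that the download produced a complete bundle, a small check like the one below can help. It is only a sketch: the expected names are copied from the ls listing above for the 4.0.0 bundle, and the function name missing_bundle_files is hypothetical, not part of TAO Toolkit.

```python
from pathlib import Path

# Entry names copied from the "ls tao-toolkit-api-bare-metal" listing above
# (tao-getting-started 4.0.0). Other versions may ship a different set.
EXPECTED_FILES = [
    "check-inventory.yml",
    "cnc",
    "hosts",
    "install-tao-toolkit-api.yml",
    "requirements.txt",
    "setup.sh",
    "tao-toolkit-api-ansible-values.yml",
    "tao-toolkit-api-helm-values.yml",
    "uninstall-nvidia-drivers.yml",
]


def missing_bundle_files(bundle_dir):
    """Hypothetical helper: return the expected entries absent from bundle_dir.

    exists() is used rather than is_file() because "cnc" may be a directory.
    An empty list means every expected entry is present.
    """
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).exists()]
```

For example, missing_bundle_files("tao-toolkit-api-bare-metal") returns an empty list when the bundle matches the listing.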
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
Hi,
The error below comes from the "check sufficient memory" task in "check-inventory.yml".
Could you set "no_log" to false for that task to get the full log?
Also, this task checks whether "ansible_memtotal_mb >= 32000" (see the setting in check-inventory.yml).
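To see what that check would decide on a given server without running Ansible, the condition can be approximated as below. This is a sketch under an assumption: that ansible_memtotal_mb comes from the MemTotal line of /proc/meminfo (reported in kB) divided by 1024 with integer division. The function names are hypothetical.

```python
# Threshold quoted from the "check sufficient memory" task in check-inventory.yml.
MIN_MEMTOTAL_MB = 32000


def memtotal_mb(meminfo_text):
    """Return MemTotal in MB from /proc/meminfo-style text.

    Assumption: mirrors how the ansible_memtotal_mb fact is derived
    (kB value // 1024); the real fact-gathering code may differ.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])  # e.g. "MemTotal:  32778164 kB"
            return kb // 1024
    raise ValueError("MemTotal not found in meminfo text")


def has_sufficient_memory(meminfo_text, minimum_mb=MIN_MEMTOTAL_MB):
    """Approximate the task's condition: ansible_memtotal_mb >= minimum_mb."""
    return memtotal_mb(meminfo_text) >= minimum_mb
```

On the server itself, pass the contents of /proc/meminfo, e.g. has_sufficient_memory(open("/proc/meminfo").read()). Note that a machine sold as "32 GB" can report somewhat less usable memory in MemTotal, so it is worth checking the actual number.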
So the error may be caused by either of the following:
The ansible_memtotal_mb fact is not accessible.
Or the server does not have 32000 MB of memory.
So the solution could be:
If ansible_memtotal_mb is not accessible, find ansible.cfg (typically ~/.ansible.cfg or /etc/ansible/ansible.cfg) and check whether the gather_subset parameter contains !hardware. If it does, remove it. Refer to "Ansible memtotal_mb fact is undefined" on Stack Overflow.
If the server does not have 32000 MB of memory, lower the memory threshold in check-inventory.yml.
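Both fixes can be sketched in Python for illustration. Assumptions: gather_subset sits in the [defaults] section of ansible.cfg as a comma-separated list, and check-inventory.yml contains the literal expression "ansible_memtotal_mb >= 32000". All function names are hypothetical. The helpers work on text and return results rather than rewriting files, so you can review changes before saving them.

```python
import configparser
import re


def _split_subset(value):
    """Split a comma-separated gather_subset value into trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def gather_subset_excludes_hardware(cfg_text):
    """Return True if [defaults] gather_subset in ansible.cfg text contains !hardware."""
    # interpolation=None so literal '%' characters elsewhere in the file are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    value = parser.get("defaults", "gather_subset", fallback="")
    return "!hardware" in _split_subset(value)


def gather_subset_without_hardware_exclusion(value):
    """Return the gather_subset value with the !hardware entry dropped."""
    return ",".join(item for item in _split_subset(value) if item != "!hardware")


def lower_memory_threshold(yml_text, new_mb):
    """Replace N in 'ansible_memtotal_mb >= N' with new_mb (assumed expression form)."""
    pattern = re.compile(r"(ansible_memtotal_mb\s*>=\s*)\d+")
    new_text, count = pattern.subn(lambda m: m.group(1) + str(new_mb), yml_text)
    if count == 0:
        raise ValueError("no 'ansible_memtotal_mb >= N' expression found")
    return new_text
```

For example, gather_subset_without_hardware_exclusion("!hardware,network") gives "network", which is the value to put back into ansible.cfg by hand. Editing by hand also keeps the file's comments, which configparser would drop on write-back.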