I have tested the SSD and it is OK. Everything is normal when TLT is not training the network, but a short while after training starts in the Jupyter notebook, something strange happens: I can't open folders or other apps right away, and gradually everything becomes stuck.
I have tried many times, but the problem persists.
Is this a bug, or is there some way to solve it? Thanks
What are the details of your host PC? For example: CPU info, GPU info, memory info, etc.
The PC info: i7-6700 CPU, 16 GB RAM, TITAN Xp, 512 GB SSD
batch_size: 4
root partition: 100 GB
home partition: 300+ GB
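For anyone else gathering the same host details to post here, a minimal sketch of the usual commands (the nvidia-smi query assumes the NVIDIA driver is already installed):

```shell
# Collect the host specs asked about above: CPU, RAM, disk partitions, GPU.
lscpu | grep 'Model name'          # CPU model, e.g. i7-6700
free -h | grep Mem                 # total memory, e.g. 16G
df -h / /home                      # sizes of the root and home partitions
# GPU name and memory; skipped gracefully if no NVIDIA driver is present
command -v nvidia-smi >/dev/null \
  && nvidia-smi --query-gpu=name,memory.total --format=csv \
  || echo "nvidia-smi not found (NVIDIA driver not installed?)"
```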
It seems the PC meets the requirements.
Actually, this is the first time an end user has reported this error.
I suggest you try to narrow it down.
Is it reproducible in all the notebooks, for example YOLO?
Also, please check the software requirements.
See https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#requirements
Software Requirements
- Ubuntu 18.04 LTS
- NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
- docker-ce installed, https://docs.docker.com/install/linux/docker-ce/ubuntu/
- nvidia-docker2 installed, instructions: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
- NVIDIA GPU driver v410.xx or above
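A quick way to verify most of these requirements from a terminal; this is a sketch, and the exact output will vary by setup (docker and the NVIDIA driver are checked only if installed):

```shell
# Verify the software requirements listed above, one by one.
grep PRETTY_NAME /etc/os-release                     # expect Ubuntu 18.04 LTS
command -v docker >/dev/null \
  && docker --version \
  || echo "docker-ce not found"                      # docker-ce requirement
command -v nvidia-smi >/dev/null \
  && nvidia-smi --query-gpu=driver_version --format=csv,noheader \
  || echo "NVIDIA driver not found"                  # expect 410.xx or above
# nvidia-docker2 registers an "nvidia" runtime, visible in docker info
command -v docker >/dev/null \
  && docker info 2>/dev/null | grep -i runtime \
  || true
```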
OK. Later I will check the requirements and test whether the problem can be reproduced with other networks. Thanks @Morganh
Sorry, I have been too busy to reply recently. I was training in Firefox before. In Firefox, even with batch_size 1, the training process would get stuck and the experience was bad, although the model would still be saved in the end. I finally switched to Chrome, and I am very pleased to see the process run smoothly. Thanks very much.
Thanks for the info. Glad to see it works.