Hello AI World Training Cat/Dog

I have a problem: when I try to train the model, it crashes and gets stuck. I tried workers=1 and batch-size=4 with epochs=1, but it still crashes and reboots the Jetson.

The next time I tried, I got an error saying that the torch module couldn’t be found.

Thank you!

Hi,

Do you have PyTorch installed?

You can find the instructions below:

Thanks.

Hello, yes, I do have PyTorch installed, but it still gives the same output.


May I know what the problem is?


I tried to reinstall, but it still cannot launch. May I know why?

Hi @n.syafiqahme, since the error is about setuptools being missing, you can try apt-get install python3-setuptools
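To confirm which modules the interpreter can actually see, a minimal sketch like the one below may help. It assumes you run it with the same python3 that launches the training script. The helper name missing_modules is made up for illustration and is not part of PyTorch or jetson-inference.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "torch" and "setuptools" are the two modules discussed in this thread.
for name in missing_modules(["torch", "setuptools"]):
    print(f"{name} is not visible to this interpreter")
```

If nothing is printed, both modules were found, and the problem is more likely elsewhere (for example, a different Python version being used for training).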

Also, if you are having a lot of trouble installing PyTorch, I recommend trying the jetson-inference docker container, which already has PyTorch etc. pre-installed in it:


It seems like the cat_dog training is not working, and the Jetson suddenly reboots. May I know why that is happening, even though I set up a swap file and have 80GB left?

Thanks

Can you keep an eye on the memory usage in another terminal window by running tegrastats? My guess is that it is running low on memory. Training takes a lot of memory and is a stretch to get working in 4GB of memory, so close those Chrome tabs and everything else. Ideally you would disable the Jetson’s desktop entirely for this step to save additional memory and processor utilization, and SSH into it from a PC.
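tegrastats is the tool to use here. As a rough complement, this sketch logs available RAM and free swap from Python, assuming the standard Linux /proc/meminfo file is present. The function names parse_meminfo and watch_memory are hypothetical helpers written for this example. Since the Jetson shares its memory between CPU and GPU, a low "RAM available" figure during training would be consistent with the out-of-memory guess above.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def watch_memory(interval_s=1.0, samples=10, path="/proc/meminfo"):
    """Print available RAM and free swap in MB, once per interval."""
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        ram_mb = info.get("MemAvailable", 0) // 1024  # values are in kB
        swap_mb = info.get("SwapFree", 0) // 1024
        print(f"RAM available: {ram_mb} MB, swap free: {swap_mb} MB")
        time.sleep(interval_s)
```

Calling watch_memory(samples=600) in a second terminal (or over SSH) while training runs gives about ten minutes of readings, so you can see whether memory drops toward zero just before the reboot.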
