Hello AI World Training Cat/Dog

n.syafiqahme · March 21, 2024, 9:48am

I have a problem: when I try to train the model, the Jetson hangs and then crashes. I tried workers=1, batch-size=4, and epochs=1, but it still crashes and reboots the Jetson.

When I tried again, I got an error saying that the torch module could not be found.

Thank you!

AastaLLL · March 22, 2024, 4:00am

Hi,

Do you have PyTorch installed?

You can find the instructions below:

Thanks.

n.syafiqahme · March 25, 2024, 3:15am

Hello, yes, I do have PyTorch installed, but it still gives the same output.

May I know what the problem is?

n.syafiqahme · March 25, 2024, 7:45am

I tried to reinstall, but it still cannot launch. May I know why?

dusty_nv · March 26, 2024, 1:11pm

Hi @n.syafiqahme, because of the error about setuptools being missing, you can try apt-get install python3-setuptools
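To see which module is actually missing before reinstalling anything, a quick check along these lines may help. It is only a minimal sketch using the standard library; the helper name `missing_modules` is made up for illustration, and it should be run with the same `python3` interpreter that runs the training script.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torch", "setuptools"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("torch and setuptools are both importable")
```

If `torch` shows up as missing even after installing it, the install may have gone to a different Python interpreter than the one being used for training.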

Also, if you are having a lot of trouble installing PyTorch, I recommend trying the jetson-inference Docker container, which already has PyTorch etc. pre-installed:

github.com

dusty-nv/jetson-inference/blob/master/docs/aux-docker.md

<img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/deep-vision-header.jpg" width="100%">
<p align="right"><sup><a href="jetpack-setup-2.md">Back</a> | <a href="building-repo-2.md">Next</a> | </sup><a href="../README.md#hello-ai-world"><sup>Contents</sup></a>
<br/>
<sup>System Setup</sup></p>  

# Running the Docker Container

Pre-built Docker container images for this project are hosted on [DockerHub](https://hub.docker.com/r/dustynv/jetson-inference/tags).  Alternatively, you can [Build the Project](building-repo-2.md) from source.

Below are the currently available container tags:

| Container Tag                                                                           | L4T version |          JetPack version         |
|-----------------------------------------------------------------------------------------|:-----------:|:--------------------------------:|
| [`dustynv/jetson-inference:r35.3.1`](https://hub.docker.com/r/dustynv/jetson-inference/tags) | L4T R35.3.1 | JetPack 5.1.1 |
| [`dustynv/jetson-inference:r35.2.1`](https://hub.docker.com/r/dustynv/jetson-inference/tags) | L4T R35.2.1 | JetPack 5.1 |
| [`dustynv/jetson-inference:r35.1.0`](https://hub.docker.com/r/dustynv/jetson-inference/tags) | L4T R35.1.0 | JetPack 5.0.2 |
| [`dustynv/jetson-inference:r34.1.1`](https://hub.docker.com/r/dustynv/jetson-inference/tags) | L4T R34.1.1 | JetPack 5.0.1 |
| [`dustynv/jetson-inference:r32.7.1`](https://hub.docker.com/r/dustynv/jetson-inference/tags) | L4T R32.7.1 | JetPack 4.6.1 |
| [`dustynv/jetson-inference:r32.6.1`](https://hub.docker.com/r/dustynv/jetson-inference/tags) | L4T R32.6.1 | JetPack 4.6 |
| [`dustynv/jetson-inference:r32.5.0`](https://hub.docker.com/r/dustynv/jetson-inference/tags) | L4T R32.5.0 | JetPack 4.5 |
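As a rough illustration of how the table is meant to be read, the sketch below maps an L4T version to the matching container tag. The helper `container_tag_for` and the `L4T_VERSIONS` list are hypothetical names, not part of jetson-inference, and the list only covers the rows shown above, so newer tags may exist on DockerHub.

```python
# L4T versions copied from the table above (assumption: the list may be incomplete).
L4T_VERSIONS = ["35.3.1", "35.2.1", "35.1.0", "34.1.1", "32.7.1", "32.6.1", "32.5.0"]


def container_tag_for(l4t_version):
    """Return the container tag for an L4T version like 'R35.3.1', or None if not listed."""
    # Accept the version with or without a leading 'R'/'r'
    version = l4t_version.strip().lstrip("rR")
    if version in L4T_VERSIONS:
        return f"dustynv/jetson-inference:r{version}"
    return None
```

For example, `container_tag_for("R32.7.1")` gives the tag for JetPack 4.6.1, while a version that is not in the table returns `None` rather than guessing.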

This file has been truncated.

n.syafiqahme · March 28, 2024, 7:21pm

It seems like the cat_dog training is not working, and the Jetson suddenly reboots. May I know why that is happening, even though I set up a swap file and still have 80GB free?

Thanks

dusty_nv · March 29, 2024, 2:36am

Can you keep an eye on the memory usage in another terminal window by running tegrastats? My guess is that it is running low on memory. Training takes a lot of memory and is a stretch to get working in 4GB of memory, so close any Chrome tabs and other applications. Ideally you would disable the Jetson’s desktop entirely for this step to save additional memory and processor utilization, and SSH into it from a PC.
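If it is easier to log the numbers from a script, the same idea can be sketched in Python by reading `/proc/meminfo`, which standard Linux kernels expose. This is not a replacement for tegrastats, just a minimal illustration; the function names `parse_meminfo` and `available_mib` are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mib(text):
    """Return (available RAM, free swap) in MiB, using 0 for any missing field."""
    info = parse_meminfo(text)
    return info.get("MemAvailable", 0) // 1024, info.get("SwapFree", 0) // 1024
```

Calling `available_mib(open("/proc/meminfo").read())` every few seconds while training runs should show whether available RAM and free swap both drop toward zero just before the reboot.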

github.com

dusty-nv/jetson-inference/blob/master/docs/pytorch-transfer-learning.md#mounting-swap

<img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/deep-vision-header.jpg" width="100%">
<p align="right"><sup><a href="depthnet.md">Back</a> | <a href="pytorch-cat-dog.md">Next</a> | </sup><a href="../README.md#hello-ai-world"><sup>Contents</sup></a>
<br/>
<sup>Transfer Learning</sup></p>

# Transfer Learning with PyTorch

Transfer learning is a technique for re-training a DNN model on a new dataset, which takes less time than training a network from scratch.  With transfer learning, the weights of a pre-trained model are fine-tuned to classify a customized dataset.  In these examples, we'll be using the <a href="https://arxiv.org/abs/1512.03385">ResNet-18</a> and [SSD-Mobilenet](pytorch-ssd.md) networks, although you can experiment with other networks too.

<p align="center"><a href="https://arxiv.org/abs/1512.03385"><img src="https://github.com/dusty-nv/jetson-inference/raw/master/docs/images/pytorch-resnet-18.png" width="600"></a></p>

Although training is typically performed on a PC, server, or cloud instance with discrete GPU(s) due to the often large datasets used and the associated computational demands, by using transfer learning we're able to re-train various networks onboard Jetson to get started with training and deploying our own DNN models.  

<a href=https://pytorch.org/>PyTorch</a> is the machine learning framework that we'll be using, and example datasets along with training scripts are provided to use below, in addition to a camera-based tool for collecting and labeling your own training datasets.  

## Installing PyTorch

If you are [Running the Docker Container](aux-docker.md) or optionally chose to install PyTorch back when you [Built the Project](building-repo-2.md#installing-pytorch), it should already be installed on your Jetson to use.  Otherwise, if you aren't using the container and want to proceed with transfer learning, you can install it now:

This file has been truncated.

system · April 24, 2024, 6:26am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
