Jetson Nano 2GB Killed (Out Of Memory) During Re-Training

Hey guys!
I’m not sure if this is the right forum, but I just wanted to start off by saying I’m relatively new to the Linux community, and I’ve only had my Nano 2GB for 1 day. I already got the Hello AI World object detection going, updated the entire system (even overclocked it to 1.9 GHz), and switched to LXDE. During the retraining, I even disable the entire GUI and just use PuTTY.

I’m trying to do the tutorial, where they retrain the neural network to detect fruits (jetson-inference/pytorch-ssd.md at master · dusty-nv/jetson-inference · GitHub)

Upon running "python3 train_ssd.py --data=data/fruit --model-dir=models/fruit --batch-size=1 --workers=0 --epochs=2", the system tries to run it, and 99% of the time it freezes. Then 20 or 30 minutes later I get “Killed” in the log, which is caused by Out Of Memory.

Yes, I’m using a swap file (20 GB) on the fastest consumer microSD available (Sandisk Pro Plus C10, V30, A2)

Any ideas for running the command, or doing further optimization so I can actually do some training?

This was originally intended for the Nano 4GB or Xavier NX (8GB), so you might be out of luck with a Nano 2GB.

To my knowledge, the Jetson GPU cannot access virtual/swap memory.

The person in the tutorials is also using a 2GB Nano

And to my knowledge, the model training is CPU intensive, not GPU intensive (I’m at 0% GPU usage during this)

EDIT: Here’s a screenshot from a fresh run. You can see I’m overclocked, with low temps because of the fan, but memory usage is maxed out and it’s relying on swap space.

Then maybe ask this person which settings were used for the tutorial?

You are still running the Nano out of spec; disable the overclocking and reduce the swap space to a reasonable size (1x or 2x the physical memory)…

I would, if the comments weren’t disabled on the video.

Out of curiosity, are there any downsides to having the swap memory as large as it is?

There are different opinions on swap file size; you will find recommendations anywhere from 20% up to 200% of physical memory, depending on the amount of RAM you have. I have never seen a recommendation for a swap file 10x the RAM size, like you are using. Your system will permanently swap data in and out, making your app and system super slow, and eventually it will crash.
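If you want to actually watch that happening, here is a quick sketch of a memory monitor you could run in a second terminal while training (just a sketch, assuming the standard /proc/meminfo layout that L4T/Ubuntu uses):

#!/usr/bin/env python3
# rough memory/swap monitor - run in a second terminal while train_ssd.py is going
import time

def read_meminfo():
    # /proc/meminfo reports values in kB, e.g. "MemAvailable:  123456 kB"
    info = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, value = line.split(':')
            info[key] = int(value.split()[0])   # value in kB
    return info

while True:
    m = read_meminfo()
    print('RAM available: {:6.0f} MB   swap used: {:6.0f} MB'.format(
        m['MemAvailable'] / 1024.0,
        (m['SwapTotal'] - m['SwapFree']) / 1024.0))
    time.sleep(5)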

Try reducing batch size and number of training samples.

My batch size is 1, and I couldn’t figure out a way to get it lower (0.5 doesn’t work). I can only get about 400 samples reliably, but that’s not enough data to reliably train a model. What size do you recommend my swap file to be?

More specifically, what do you recommend my ZRAM to be, and my swap to be?

This recommends a 4GB swap file: jetson-inference/pytorch-transfer-learning.md at master · dusty-nv/jetson-inference · GitHub

Also check out number of workers/data loader threads: jetson-inference/pytorch-ssd.md at master · dusty-nv/jetson-inference · GitHub
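In case it helps to see why the worker count matters for memory: each data loader worker is a separate process that prefetches batches, so every extra worker holds its own copy of the loading pipeline in RAM. Here is a generic PyTorch sketch of the setting that --workers presumably maps onto (this is not the actual train_ssd.py code, just an illustration):

import torch
from torch.utils.data import DataLoader, TensorDataset

# tiny dummy dataset just to illustrate the flags
dataset = TensorDataset(torch.randn(400, 3, 32, 32),
                        torch.zeros(400, dtype=torch.long))

# num_workers=0 loads batches in the main process (lowest memory overhead,
# matching --workers=0), while each additional worker spawns another
# prefetching process that consumes extra RAM
loader = DataLoader(dataset, batch_size=1, num_workers=0, shuffle=True)

for images, labels in loader:
    pass   # the training step would go here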

If all of this still does not work: the main purpose of the Jetson Nano is inference, and the 2GB Nano model especially is too limited for serious training tasks…

Okay, so disable ZRAM and enable a 4GB swap file, got it, I’ll try that out.

Now, again, I’m super new and not used to the terminology, inference is just when the camera is running, and actually detecting objects, right?

So, if I want, I could train (models?) on my desktop, copy those to the Jetson, and just inference those models?

To make things simple, is it possible to run JetPack on my desktop so I have all those preinstalled packages and libraries?

Now, another question (hopefully I can explain it properly): in the Hello AI World tutorial (running SSD-MobileNet-V2), when I train models, is it just adding to that existing model? If not, how do I do that? I’ve only found SSD-MobileNet-V1 available for download in the .onnx / .py format.

Yea, I recall on the Nano 2GB when I made that video, it was necessary to disable ZRAM (because ZRAM actually consumes physical memory)

Yes you technically can, you just need to get the appropriate environment setup on your desktop. What I do is have an x86 machine that runs Ubuntu and has an NVIDIA GPU in it. Then I install the NVIDIA driver and the NVIDIA Container Runtime and run the NGC PyTorch container for x86. Then I am typically able to run the same PyTorch training scripts from Hello AI World. I have also seen people do it through Google Colab.

It starts with a pre-trained MobileNet classification model and essentially trains the SSD detector on the dataset that it is currently running on. So it will end up supporting the classes that are in the dataset you are currently training it on, not the classes that it was using before.
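If it helps to picture that in code, here is a very rough conceptual sketch of the idea using a torchvision MobileNet classifier (this is not what train_ssd.py actually does internally, just an illustration of reusing pre-trained weights and training a new head for your own classes):

import torch
import torch.nn as nn
from torchvision import models

# start from weights pre-trained on ImageNet
model = models.mobilenet_v2(pretrained=True)

# freeze the pre-trained backbone so only the new head gets trained
for param in model.features.parameters():
    param.requires_grad = False

# replace the classifier head with one sized for your own classes
num_classes = 8   # e.g. the number of classes in your fruit dataset
model.classifier[1] = nn.Linear(model.last_channel, num_classes)

# only the new head's parameters are optimized
optimizer = torch.optim.SGD(model.classifier.parameters(), lr=0.01, momentum=0.9)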

Thank you for the response dusty! I have a few more questions if you don’t mind answering them. 1.) Why are your comments disabled on the YouTube video(s)?
2.) Is there any benefit to re-enabling ZRAM on the 2GB model? 3.) I have a spare NVMe drive in my computer that I run PopOS on. PopOS is Ubuntu based, so I’m guessing that would also work as well? 4.) So when I train my own model (let’s use the fruits one for example), will it ONLY detect fruits? Or fruits + the previous objects that were pretrained with the model?

Sorry for the weird questions and wording, I’m still very new to all of this.

I think that is just the channel’s default setting, but in reality it’s probably a good thing because I’m not in the habit of checking the YouTube comments for questions (and don’t want to give the impression of them being ignored). Here on the forums and the GitHub repo are the best places to get help - welcome to the community!

The benefits of ZRAM (which is in-memory compressed RAM used as swap space) are that it can sometimes be a bit faster because it’s stored in-memory and not on physical storage (like SD card). So if you have less memory-intensive applications to run that don’t need so much swap space, there can be some benefit.

I haven’t tried PopOS, but if it is mostly the same as Ubuntu underneath, then it may be worth a shot.

That’s correct, it will only detect fruits (or whatever objects are in the current dataset that you are training on). You would need to expand your dataset to both fruits + previous objects if you wanted it to detect all of them.

That’s not to say that there aren’t some specialized methods in transfer learning or research papers that are able to train it like you ask, but it’s not a typical feature in the training scripts and isn’t how train_ssd.py works.

So if I wanted my own (better?) model, I would have to train a model that detects everything, such as fruits, humans, cars, keyboards, mice, monitors, etc., and use that as my model, without being able to modify it later (with the current train_ssd.py script)?

Also, I’m so glad you answered. Even with my desktop disabled, only going through PuTTY, I could not get through training an entire model without the process being killed. I’m going to try your recommendations and try again.

Yes, if you wanted a model that had all of those classes, you would need to combine them into one dataset and then train the model.

And actually, train_ssd.py does appear to support loading multiple datasets at once (saving you the step of merging them into one), but I haven’t tested this myself. https://github.com/dusty-nv/pytorch-ssd/blob/8ed842a408f8c4a8812f430cf8063e0b93a56803/train_ssd.py#L35

Thank you for the responses! Disabling ZRAM did fix my issues, and I was able to train using all ~6000 images for the fruit example.

Some more questions, if you don’t mind answering them of course:
So, when running SSD-MobileNet-V2 in the Hello AI World example, I get around 25-30 FPS. Is this a limitation of SSD-MobileNet, or of the hardware I’m running it on?

Do you have any personal pretrained models that you use that work great on the jetson, with lots of objects trained with good framerate?

What about facial recognition?

And last one (hopefully): could you provide me with a good source (for beginners) for code that actually does something when something specific is detected? Like maybe playing a sound or printing out text when a cup is detected?

It’s a combination of the computational complexity of the network and the performance of the Nano.

I’d say that the SSD-Mobilenet-v2 model provides the best balance between realtime framerate and lots of object classes (90 for that model). That said, the fewer classes in the model, the higher the performance will be.

Here is example pseudocode that does something when a person is detected - you can substitute other classes for ‘person’ below. You can edit detectnet.py to be like so:

# https://github.com/dusty-nv/jetson-inference/blob/9bf55495d6a4ff1946dddbb0e81102d2d05f4952/python/examples/detectnet.py#L71
# this should go in the main processing loop at the line linked to above
for detection in detections:
    print(detection)
    
    # get the class name string of the detected object
    class_name = net.GetClassDesc(detection.ClassID)

    # do something if a person is detected
    if class_name == 'person':
        print('detected a person at coordinate ' + str(detection.Center))

There are some rudimentary Python docs for detectNet and the detections struct here:
https://rawgit.com/dusty-nv/jetson-inference/dev/docs/html/python/jetson.inference.html#detectNet
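For context, this is roughly what the whole thing looks like as a standalone script once that snippet is in place (the camera and display URIs below are assumptions - adjust them to your setup):

import jetson.inference
import jetson.utils

# load the detection network and open the camera/display streams
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.videoSource("csi://0")        # or "/dev/video0" for a USB camera
display = jetson.utils.videoOutput("display://0")

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)

    for detection in detections:
        # get the class name string of the detected object
        class_name = net.GetClassDesc(detection.ClassID)

        # do something if a person is detected
        if class_name == 'person':
            print('detected a person at coordinate ' + str(detection.Center))

    display.Render(img)
    display.SetStatus("detectNet | {:.0f} FPS".format(net.GetNetworkFPS()))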

Again, thank you thank you thank you for all your help. You may have answered this, but is there any way for me to add more classes to the SSD-MobileNet-V2 model? I want to add more objects and classes of stuff in my room, such as my 3D printer, spools of filament, and PC parts (CPUs, GPUs, etc.)

Or do I have to start from scratch?

That model actually comes from the TensorFlow model zoo, from a time before we used PyTorch for training and before the PyTorch->ONNX->TensorRT workflow was working with SSD-Mobilenet. Regardless, with the current scripts you would need to retrain on all the classes that you wanted. The challenge is that those 90 classes from the original TensorFlow model come from the MS COCO dataset, and support for that dataset hasn’t been added to the pytorch-ssd code. However, the Open Images and Pascal VOC datasets, which are supported in pytorch-ssd, have many similar classes.

Okay, makes sense. So I couldn’t easily train a model using Open Images and combine it with photos that I take and add myself?

I guess a better description of what I’m trying to accomplish is that I want to:
1.) Download a pretrained (model?) online
2.) Add more images through the py_images_downloader
3.) Add my own class and images

whenever I want, so in case I forget something or want to add something later, I can, without creating a whole new model from scratch.

In theory you could do this; the challenge is that the format of the OpenImages dataset is rather complex IMO. But if you were to figure that part out, yes, you could do that. I find the Pascal VOC dataset format to be a lot more straightforward and easy to work with (which is why I use it for custom datasets)

However there do appear to be various converters out there for OpenImages → Pascal VOC format:
https://www.google.com/search?q=openimages+to+pascal+voc

So in theory you could download what you want from OpenImages, convert it to Pascal VOC, and then more easily add your own images in Pascal VOC format. I use the CVAT tool online for annotating images in Pascal VOC format.
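To give you an idea of why I find the VOC format straightforward: each image gets one small XML file with the class names and bounding boxes, which you can read with nothing more than the Python standard library. A minimal sketch (the file path is only an example):

import xml.etree.ElementTree as ET

# a Pascal VOC annotation is one XML file per image, e.g. Annotations/00001.xml
tree = ET.parse('Annotations/00001.xml')
root = tree.getroot()

print('image:', root.find('filename').text)

# each <object> element holds a class name and a pixel-coordinate bounding box
for obj in root.findall('object'):
    name = obj.find('name').text
    box = obj.find('bndbox')
    xmin = int(float(box.find('xmin').text))
    ymin = int(float(box.find('ymin').text))
    xmax = int(float(box.find('xmax').text))
    ymax = int(float(box.find('ymax').text))
    print('{}: ({}, {}) -> ({}, {})'.format(name, xmin, ymin, xmax, ymax))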