DLI Getting Started With AI on the Jetson Nano - no progress when training thumbs example

thomas.may · August 28, 2019, 5:31am

I am working through the DLI Intro course on Getting Started with AI on the Jetson Nano (classification_interactive.ipynb). I have successfully worked through all of the steps up to training the thumbs up / thumbs down version of ResNet. I have taken 30 images each for thumbs up and thumbs down and confirmed the files exist. When I select the train button, the train and evaluate buttons “grey out”. The progress bar advances only after a very long time (hours) and then stops at about 20%. The browser is Chrome running on an iMac. The Nano has the Intel M.2 WiFi card installed but is connected by wired Ethernet and is running in headless mode. I am using the Adafruit 5V 4A power supply. There is continuous Ethernet activity.

I have tried the following with no success:

updating all of the packages for the DLI Nano image
turning on the fan at full speed (sudo sh -c 'echo 255 > /sys/devices/pwm-fan/target_pwm' )
reducing the power profile ( sudo nvpmodel -m 1 )

At this point, I am stuck and would greatly appreciate any help.

d.a.mahabiersing · August 28, 2019, 5:48am

I had this issue too. You probably have some zero-sized images in your batch, and the training gets stuck on them. Open a console and check the size of all the images, e.g. with ls -la. Delete the empty ones and capture new images to replace them, so that you again have 30 images.
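For anyone who would rather run the size check from a notebook cell than from a console, here is a minimal Python sketch. The function name find_empty_files is made up for illustration and is not part of the DLI notebook; it only lists files and changes nothing.

```python
from pathlib import Path

def find_empty_files(root):
    """Return a sorted list of zero-byte regular files under root (recursive)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )

# Example call (the directory is whichever dataset folder you are using):
# for path in find_empty_files("/home/dlinano/nvdli-nano/classification/thumbs_A"):
#     print(path)
```

Any path it reports is a capture that needs to be deleted and retaken.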

thomas.may · August 28, 2019, 2:20pm

That was it - thank you very much!!!

I had checked the files through the JupyterLab interface and they were there, but I never thought to look at the file sizes. Six files in the thumbs down folder were zero bytes! I deleted and replaced those images, and the training started within 30 seconds and all 10 epochs completed in just a few minutes. Thumbs up and down detection is working very well, even with a cluttered background in the image.

ak-nv · August 28, 2019, 6:36pm

We are trying to figure out this issue. In the meantime, here is a quick solution:

Check whether you have empty files in the directory:

$ find <path_to_directory> -empty -type f

Delete the empty files in the directory:

$ find <path_to_directory> -empty -type f -delete
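If you would rather do the cleanup from Python (for example in a notebook cell), the following is a rough equivalent of the two find commands above. It is only a sketch: delete_empty_files and its dry_run parameter are hypothetical names, not part of any NVIDIA tooling. The default is a dry run, so nothing is removed until you have reviewed the list.

```python
from pathlib import Path

def delete_empty_files(root, dry_run=True):
    """Find zero-byte regular files under root and optionally delete them.

    Returns the list of matching paths. With dry_run=True (the default)
    nothing is removed, so the list can be reviewed first.
    """
    matches = []
    for path in sorted(Path(root).rglob("*")):
        # Skip symlinks so only real files are considered, like find -type f.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_size == 0:
            if not dry_run:
                path.unlink()
            matches.append(path)
    return matches

# Review first, then delete:
# print(delete_empty_files("<path_to_directory>"))
# delete_empty_files("<path_to_directory>", dry_run=False)
```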

agaytan · January 27, 2020, 12:36am

For the emotions part of the Project

Path name for the images is /home/dlinano/classification/emotions_A

Once in the directory:

type “find ./ -empty -type f” to find all the zero-size files
type “find ./ -empty -type f -delete” to delete all the zero-size files

Thanks everyone this worked for me.

BTW, if you want to start over with this project, you can delete the png files in the
emotions_A none, happy, sad, and angry directories.

rdlarson91 · March 5, 2020, 6:16pm

Hi, I have a related question. While training the thumbs up / thumbs down example, I mistakenly added a number of thumbs up images to the thumbs down data set. Now everything is classified as thumbs up in live mode. I did not save the model, but even if I close Jupyter, shut down the Nano, and bring it back up, the data is still there. I don’t see where the images are stored. What I would like to do is start over completely (and be more careful) and retrain with new data.

thomas.may · March 5, 2020, 6:57pm

Hi rdlarson91. It looks like you should be able to delete all of the images, or even just the wrong ones, in the photos folder. For the thumbs classification exercise, they should be in either:

/home/dlinano/nvdli-nano/classification/thumbs_A/thumbs_down
/home/dlinano/nvdli-nano/classification/thumbs_B/thumbs_down

depending on which run you were doing. Shut down the kernel for the classification_interactive notebook (if the notebook is running), delete the photos, and then restart the kernel or reopen the notebook. You can then retrain the model on the new / corrected images. Hope that helps.
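Before retraining, it may help to confirm how many usable images are left in each category folder. The sketch below assumes each dataset directory (for example thumbs_A) holds one subfolder per category, as the paths above suggest. count_images is a hypothetical helper, not something the notebook provides.

```python
from pathlib import Path

def count_images(dataset_dir):
    """Map each category subfolder name to its number of non-empty files."""
    counts = {}
    for category in sorted(Path(dataset_dir).iterdir()):
        # Ignore hidden folders such as .ipynb_checkpoints.
        if not category.is_dir() or category.name.startswith("."):
            continue
        counts[category.name] = sum(
            1 for f in category.iterdir()
            if f.is_file() and f.stat().st_size > 0
        )
    return counts

# Example call:
# print(count_images("/home/dlinano/nvdli-nano/classification/thumbs_A"))
```

Zero-byte files are not counted, so a category that comes up short shows where new captures are needed.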