UserWarning: This overload of nonzero is deprecated: (Extremely slow model training)

Hi, I’ve just started having an issue with the classification_interactive notebook in the DLI Nano course.
I tried to train a ResNet34 and an Inception_v3 there on 6 categories with 210 images each. I know we’re not really supposed to use the Nano 2GB for training, but I’ve trained a ResNet34 on it before with 300 images per category, got results, and the training went a lot faster.
This time, however, I get this warning regardless of how I run the notebook, and training is incredibly slow. With Inception_v3, it takes an hour just to get 10% of the way through a single epoch.
I’ve read that this is a PyTorch bug that comes up frequently, but I don’t quite understand some of the proposed solutions.
Some forums recommend doing this:

But I have no idea where I would implement this bug fix inside the DLI course notebook.
Please help me figure out what to do here.
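The snippet the other forums recommend is not shown above, so the following is only a hypothetical alternative, not that fix. Assuming the goal is just to hide this one deprecation message, Python’s standard `warnings.filterwarnings` can ignore it; a natural place would be the first cell of the notebook, before any training code runs. The helper name `silence_nonzero_deprecation` is made up for illustration. Note that this only hides the message and is unlikely to make training faster by itself.

```python
import warnings


def silence_nonzero_deprecation() -> None:
    """Hypothetical helper: ignore PyTorch's 'overload of nonzero is deprecated' UserWarning.

    `message` is a regex matched (case-insensitively) against the start of
    the warning text; other warnings are left untouched.
    """
    warnings.filterwarnings(
        "ignore",
        message=r"This overload of nonzero is deprecated",
        category=UserWarning,
    )


# Assumption: run this once, e.g. in the first notebook cell.
silence_nonzero_deprecation()
```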

Hi,

Since the Nano 2GB has limited resources, the slowness may come from a memory or storage shortage.
Would you mind monitoring the device with tegrastats first to check this?

$ sudo tegrastats
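To make the tegrastats output easier to judge, here is a small sketch that pulls RAM and swap usage out of one line. It assumes each line contains fields of the form `RAM <used>/<total>MB` and `SWAP <used>/<total>MB`; check this against what your device actually prints. The function names and the 90% threshold are hypothetical choices for illustration.

```python
import re

# Assumed tegrastats fields, e.g. "RAM 1850/1980MB (lfb ...) SWAP 700/4096MB (cached ...)".
_RAM_RE = re.compile(r"RAM (\d+)/(\d+)MB")
_SWAP_RE = re.compile(r"SWAP (\d+)/(\d+)MB")


def parse_tegrastats_line(line: str) -> dict:
    """Extract RAM and SWAP usage (in MB) from one tegrastats output line."""
    stats = {}
    ram = _RAM_RE.search(line)
    if ram:
        stats["ram_used_mb"] = int(ram.group(1))
        stats["ram_total_mb"] = int(ram.group(2))
    swap = _SWAP_RE.search(line)
    if swap:
        stats["swap_used_mb"] = int(swap.group(1))
        stats["swap_total_mb"] = int(swap.group(2))
    return stats


def looks_memory_starved(stats: dict, ram_threshold: float = 0.9) -> bool:
    """Heuristic: RAM nearly full while swap is in use suggests memory pressure."""
    if "ram_used_mb" not in stats or stats["ram_total_mb"] == 0:
        return False
    ram_fraction = stats["ram_used_mb"] / stats["ram_total_mb"]
    swapping = stats.get("swap_used_mb", 0) > 0
    return ram_fraction >= ram_threshold and swapping
```

Paste a line copied from the running `tegrastats` output into `parse_tegrastats_line` while the notebook is training; if `looks_memory_starved` returns `True`, the slowdown is more likely memory-related.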

Thanks.

Hello, I am facing the same issue with my Nano 4GB. Could it be an issue with the notebook?

Hi,

The notebook is trying to run a PyTorch model.
Have you checked the memory usage with tegrastats?

Sometimes the slowness comes from a memory shortage, since the device then needs to read/write data to storage frequently.
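Frequent reading and writing caused by a memory shortage usually shows up as swap usage. As a sketch, assuming a Linux system that exposes the standard `/proc/meminfo` file, the following reads it from inside the notebook (no `sudo` needed) and reports available memory and used swap. The helper names are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swap_used_mb(info: dict) -> float:
    """Swap currently in use, in MB (SwapTotal - SwapFree)."""
    return (info.get("SwapTotal", 0) - info.get("SwapFree", 0)) / 1024


def available_mb(info: dict) -> float:
    """Memory the kernel estimates is available for new allocations, in MB."""
    return info.get("MemAvailable", 0) / 1024


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling `read_meminfo()` in a cell during training and seeing low `available_mb` alongside growing `swap_used_mb` would point toward the memory shortage described above.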

Thanks.