Problems Training Models

I am having problems training my models on my Jetson. This is the error that I am getting:
(base) jetson@jetson-desktop:~/Downloads/jetson-inference/python/training/classification$ python3 train.py --model-dir=myModel ~/Downloads/jetson-inference/myTrain
Use GPU: 0 for training
=> dataset classes: 3 ['Black Nut', 'Leaf', 'Pecan']
=> using pre-trained model 'resnet18'
=> reshaped ResNet fully-connected layer with: Linear(in_features=512, out_features=3, bias=True)
Traceback (most recent call last):
  File "/home/jetson/Downloads/jetson-inference/python/training/classification/train.py", line 506, in <module>
    main()
  File "/home/jetson/Downloads/jetson-inference/python/training/classification/train.py", line 135, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "/home/jetson/Downloads/jetson-inference/python/training/classification/train.py", line 227, in main_worker
    torch.cuda.set_device(args.gpu)
  File "/home/jetson/mambaforge/lib/python3.9/site-packages/torch/cuda/__init__.py", line 311, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

I may not be giving enough information. Please help me if you can.
Thanks,
Brent

Hi,

May I know how you run the training job?
Thanks.

It also looks like you are using a different build of PyTorch built for Python 3.9 that was installed through Conda.

Instead, please install one of the PyTorch pip wheels from this post (these were built with CUDA/cuDNN enabled) or use the l4t-pytorch container.
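
A quick way to confirm which PyTorch build the interpreter is picking up is to print the interpreter path, the PyTorch version, and whether that build was compiled with CUDA. The following is only a minimal sketch: the helper name `describe_torch_build` is hypothetical, and it relies on nothing beyond `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import sys


def describe_torch_build():
    """Collect basic facts about the active Python and PyTorch install."""
    info = {
        "python": "%d.%d" % sys.version_info[:2],
        "executable": sys.executable,
        "torch": None,            # stays None if PyTorch cannot be imported
        "cuda_build": None,
        "cuda_available": False,
    }
    try:
        import torch
    except (ImportError, OSError):
        # Missing package or a build whose shared libraries fail to load.
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda   # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If `cuda_build` comes back as `None`, or the executable path points into a Conda directory such as mambaforge, the build in use is probably not one of the CUDA-enabled Jetson wheels.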

Should I uninstall PyTorch before reinstalling?

Thanks for your help,
Brent

I’m not very familiar with using Conda (especially on Jetson), but yes, I think so, so that the packages don’t get confused.

Everything works well until I get to here:

(base) jetson@jetson-desktop:~$ pip3 install numpy torch-1.8.0-cp36-cp36m-linux_aarch64.whl
ERROR: torch-1.8.0-cp36-cp36m-linux_aarch64.whl is not a supported wheel on this platform.

Brent

Tried this also:

(base) jetson@jetson-desktop:~$ pip3 install numpy torch-1.10.0-cp36-cp36m-linux_aarch64.whl
WARNING: Requirement ‘torch-1.10.0-cp36-cp36m-linux_aarch64.whl’ looks like a filename, but the file does not exist
ERROR: torch-1.10.0-cp36-cp36m-linux_aarch64.whl is not a supported wheel on this platform.

I am running L4T 32.6.1 (JetPack 4.6).

Brent

Hi @my65gto2, did you download the wheel?

What does your pip3 --version show? It should be for Python 3.6 because my PyTorch wheels are compiled for Python 3.6.
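
The "not a supported wheel on this platform" message fits a mismatch between the wheel's tags and the interpreter that pip belongs to. As a rough illustration, the tags can be read straight from the filename. This is a simplified sketch, not pip's actual compatibility logic, and the helper names `wheel_tags` and `matches_cpython` are hypothetical.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into its python, ABI and platform tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are always python-abi-platform.
    _, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    return python_tag, abi_tag, platform_tag


def matches_cpython(filename, version_info=None):
    """Simplified check: does the wheel's python tag match this CPython version?"""
    major, minor = (version_info or sys.version_info)[:2]
    python_tag, _, _ = wheel_tags(filename)
    return python_tag == "cp%d%d" % (major, minor)


wheel = "torch-1.8.0-cp36-cp36m-linux_aarch64.whl"
print(wheel_tags(wheel))               # ('cp36', 'cp36m', 'linux_aarch64')
print(matches_cpython(wheel, (3, 6)))  # True
print(matches_cpython(wheel, (3, 9)))  # False
```

A `cp36` wheel is built for CPython 3.6, so a pip that runs under Python 3.9 will refuse it.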

python3.9

Thank you!

Also, if I go back to Python 3.6, would I need to uninstall Python 3.9?
Brent

I don’t believe so. You may just need to run pip3.6 and python3.6, or restore the pip3/python3 symbolic links so that they point to the Python 3.6 versions instead.
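
To see which interpreters are installed and where the `python3` and `pip3` names currently resolve, something like the following can help. It is a small sketch using only the standard library; the helper name `locate_commands` is hypothetical, and the list of command names is just an example.

```python
import os
import shutil


def locate_commands(names):
    """Map each command name to its resolved path on PATH, or None if absent."""
    found = {}
    for name in names:
        path = shutil.which(name)
        # realpath follows symbolic links, showing what e.g. python3 points to
        found[name] = os.path.realpath(path) if path else None
    return found


for name, target in locate_commands(["python3", "python3.6", "pip3", "pip3.6"]).items():
    print(f"{name} -> {target}")
```

Note that `pip3` is often a small script rather than a symbolic link, in which case the shebang line at the top of that file determines which Python it runs under.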

Hi Dusty,
I did a complete reinstall of JetPack and now have PyTorch up and running. I am having a problem with a program that I am running. This is the line where I get the error:

net=jetson.inference.imageNet('alexnet',['--model= /home/jetson/Downloads/jetson-inference/python/training/classfication/myModel/resnet18.onnx','--input_blob=input_0','--output_blob=output_0','--labels= /home/jetson/Downloads/jetson-inference/myTrain/labels.txt'])

This is what I get when running the program:

jetson@jetson-desktop:~/Desktop/pyPro$ /usr/bin/python3 /home/jetson/Desktop/pyPro/NVIDIA/deepLearning-10.py
[ WARN:0] global /home/nvidia/host/build_opencv/nv_opencv/modules/videoio/src/cap_gstreamer.cpp (1757) handleMessage OpenCV | GStreamer warning: Embedded video playback halted; module source reported: Could not read from resource.
[ WARN:0] global /home/nvidia/host/build_opencv/nv_opencv/modules/videoio/src/cap_gstreamer.cpp (886) open OpenCV | GStreamer warning: unable to start pipeline
[ WARN:0] global /home/nvidia/host/build_opencv/nv_opencv/modules/videoio/src/cap_gstreamer.cpp (480) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created
jetson.inference -- imageNet loading network using argv command line params

imageNet -- loading classification network model from:
-- prototxt (null)
-- model /home/jetson/Downloads/jetson-inference/python/training/classfication/myModel/resnet18.onnx
-- class_labels /home/jetson/Downloads/jetson-inference/myTrain/labels.txt
-- input_blob 'input_0'
-- output_blob 'output_0'
-- batch_size 1

[TRT] TensorRT version 8.0.1
[TRT] loading NVIDIA plugins…
[TRT] Registered plugin creator - ::GridAnchor_TRT version 1
[TRT] Registered plugin creator - ::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - ::NMS_TRT version 1
[TRT] Registered plugin creator - ::Reorg_TRT version 1
[TRT] Registered plugin creator - ::Region_TRT version 1
[TRT] Registered plugin creator - ::Clip_TRT version 1
[TRT] Registered plugin creator - ::LReLU_TRT version 1
[TRT] Registered plugin creator - ::PriorBox_TRT version 1
[TRT] Registered plugin creator - ::Normalize_TRT version 1
[TRT] Registered plugin creator - ::ScatterND version 1
[TRT] Registered plugin creator - ::RPROI_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
[TRT] Could not register plugin creator - ::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - ::CropAndResize version 1
[TRT] Registered plugin creator - ::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - ::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - ::Proposal version 1
[TRT] Registered plugin creator - ::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - ::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - ::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - ::Split version 1
[TRT] Registered plugin creator - ::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - ::InstanceNormalization_TRT version 1
[TRT] detected model format - ONNX (extension '.onnx')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 234, GPU 3874 (MiB)
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file .1.1.8001.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU

error: model file ' /home/jetson/Downloads/jetson-inference/python/training/classfication/myModel/resnet18.onnx' was not found.
if loading a built-in model, maybe it wasn’t downloaded before.

Run the Model Downloader tool again and select it for download:

$ cd /tools
$ ./download-models.sh

[TRT] failed to load /home/jetson/Downloads/jetson-inference/python/training/classfication/myModel/resnet18.onnx
[TRT] imageNet -- failed to initialize.
jetson.inference -- imageNet failed to load built-in network 'alexnet'
Traceback (most recent call last):
  File "/home/jetson/Desktop/pyPro/NVIDIA/deepLearning-10.py", line 17, in <module>
    net=jetson.inference.imageNet('alexnet',['--model= /home/jetson/Downloads/jetson-inference/python/training/classfication/myModel/resnet18.onnx','--input_blob=input_0','--output_blob=output_0','--labels= /home/jetson/Downloads/jetson-inference/myTrain/labels.txt'])
Exception: jetson.inference -- imageNet failed to load network

jetson@jetson-desktop:~/Desktop/pyPro$

I have checked the path many times to see if I have it right, and it looks good. I am just trying to recognize one model. It seemed to train OK, and the conversion from PyTorch to ONNX with onnx_export.py seemed to run fine.
I am running PyTorch 1.10.0
L4T 32.6.1
TensorRT: 8.0.1.6

I hope that I have given you enough information to go on.

Thank you for your time & help,
Brent

Hi @my65gto2, the error reports that it can’t find this file, so I recommend double- or triple-checking the path to make sure that it’s correct. What happens if you do:

ls -ll /home/jetson/Downloads/jetson-inference/python/training/classfication/myModel/resnet18.onnx

This is what I get:

jetson@jetson-desktop:/$ ls -ll /home/jetson/Downloads/jetson-inference/python/training/classification/myModel
total 218528
-rw-rw-r-- 1 jetson jetson 89513359 Dec 16 21:56 checkpoint.pth.tar
-rw-rw-r-- 1 jetson jetson 89513359 Dec 16 21:50 model_best.pth.tar
-rw-rw-r-- 1 jetson jetson 44744191 Dec 17 11:05 resnet18.onnx

Brent

I think I just noticed a couple extra spaces in your arguments:

['--model= /home/jetson/Downloads/jetson-inference/python/training/classfication/myModel/resnet18.onnx','--input_blob=input_0','--output_blob=output_0','--labels= /home/jetson/Downloads/jetson-inference/myTrain/labels.txt']

(between --model= and the path, and also between --labels= and the path)

Can you try it with this instead:

['--model=/home/jetson/Downloads/jetson-inference/python/training/classfication/myModel/resnet18.onnx','--input_blob=input_0', '--output_blob=output_0', '--labels=/home/jetson/Downloads/jetson-inference/myTrain/labels.txt']

If that doesn’t work, can you try seeing if the imagenet program can load it like so:

imagenet --model=/home/jetson/Downloads/jetson-inference/python/training/classfication/myModel/resnet18.onnx --input_blob=input_0 --output_blob=output_0 --labels=/home/jetson/Downloads/jetson-inference/myTrain/labels.txt images/cat_0.jpg
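
Both suspects in a "file not found" error like this one, stray whitespace in an argument and a wrong directory name somewhere along the path, can be checked before the network is constructed. The sketch below is only an illustration: `first_missing_component` and `check_path_args` are hypothetical helpers, not part of jetson-inference, and the demo builds a throwaway directory tree so that it can run anywhere.

```python
import os
import tempfile


def first_missing_component(path):
    """Return the shortest prefix of `path` that does not exist, or None."""
    current = os.sep
    for part in os.path.abspath(path).split(os.sep):
        if not part:
            continue
        current = os.path.join(current, part)
        if not os.path.exists(current):
            return current
    return None


def check_path_args(argv, keys=("--model", "--labels")):
    """Report whitespace and missing-path problems in key=value arguments."""
    problems = []
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or key not in keys:
            continue
        if value != value.strip():
            problems.append("%s: value has leading/trailing whitespace" % key)
        missing = first_missing_component(value.strip())
        if missing:
            problems.append("%s: %s does not exist" % (key, missing))
    return problems


# Demo: the real directory is "classification", the argument says "classfication".
with tempfile.TemporaryDirectory() as root:
    model_dir = os.path.join(root, "classification", "myModel")
    os.makedirs(model_dir)
    open(os.path.join(model_dir, "resnet18.onnx"), "w").close()
    bad = "--model= " + os.path.join(root, "classfication", "myModel", "resnet18.onnx")
    for problem in check_path_args([bad, "--input_blob=input_0"]):
        print(problem)
```

Reporting the first path component that does not exist narrows the search to a single directory name, which is easier to spot than a typo buried in a long path.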

You know Dusty, sometimes these problems are like a needle in a haystack. I misspelled classification. Thank you for all of your time and help! I am trying to program and build a pick and place robot. Is there any direction you would suggest I check out?

Thanks again,
Merry Christmas,
Brent

dusty_nv Moderator
December 20

Ah sorry, I had missed that too! Glad that you got it working :)

I recall there being some robotic arms on the Jetson Projects page - you might want to check these links out:
