Having an issue with my Jetson Nano devkit; I was instructed to post here after submitting a support ticket (case# 211016-000454). The problem I am experiencing is that inferencing produces the wrong result unless Dynamic Voltage and Frequency Scaling (DVFS) is disabled by running jetson_clocks. I am using the official SD image from NVIDIA, have wiped and reimaged the SD card multiple times, and verified the checksum after copying the image to the card to confirm the transfer was accurate and successful. When running the unmodified examples from the dusty-nv jetson-inference getting-started repo on GitHub, the ImageNet examples fail to classify properly unless DVFS is disabled. I have tried with and without updates installed, and with both a barrel-jack and a micro-USB power supply. With the barrel-jack supply, I monitored /sys/bus/i2c/drivers/ina3221x/6-0040/iio:device0/in_voltage0_input for voltage while executing the example ImageNet inferencing runs; voltage stayed within range (min 4984 mV, max 5112 mV), so a power supply issue seems unlikely.
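For reference, the voltage sampling was done along these lines (a rough sketch, not my exact procedure; the sysfs node path is the one on my devkit, and the sample count and interval are arbitrary):

```python
import os
import time

def sample_voltage(path, samples=50, interval=0.1):
    """Repeatedly read a sysfs voltage node and return (min, max) in mV."""
    readings = []
    for _ in range(samples):
        with open(path) as f:
            readings.append(int(f.read().strip()))
        time.sleep(interval)
    return min(readings), max(readings)

if __name__ == "__main__":
    # INA3221 channel 0 input-voltage node on my Nano devkit
    node = "/sys/bus/i2c/drivers/ina3221x/6-0040/iio:device0/in_voltage0_input"
    if os.path.exists(node):  # only present on the Jetson itself
        lo, hi = sample_voltage(node)
        print(f"min {lo} mV, max {hi} mV")
```

Running something like this in a second terminal while the ImageNet example executes is how the min/max figures above were obtained.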
If I manually disable DVFS with jetson_clocks, the ImageNet example ALWAYS produces an accurate result with the expected image match probability. When DVFS is not disabled (the default setting), the result is nearly always wrong: the Python example normally fails with an exception about no inference results (see the GitHub issue below), and the C++ example simply writes no overlay at all to the sample image instead of drawing the match probability onto it. The only time inferencing seems to work with DVFS enabled is when other system activity has already pushed the clocks/voltage up enough to get lucky, though I don't have any data to back this up and didn't investigate further once I discovered that disabling DVFS was a workaround.
Can this issue be reproduced with the default jetson_inference example?
If yes, would you mind sharing the detailed steps of the working and non-working cases with us?
We would like to check further to see whether this is a real issue.
Sure. I have attached the full detailed logs, covering everything from the initial headless setup through running the default imagenet example with the default clock settings (where it fails), and then using jetson_clocks to disable DVFS and run at max speed. Once jetson_clocks is run, inferencing works until the default clock settings are restored, at which point it fails completely again. The logs are split into numbered parts by filename to hopefully make it easier to read and to locate anything you may be looking for.
I will also include a shortened version in a code box below containing only what I thought was the relevant output/commands, to save you from reading through everything. If you want more detail about something in the shortened version, you can refer to the full logs.
The main thing to notice is that classification fails until jetson_clocks is run; once it is run, inferencing succeeds until the default clocks are restored. In many places I used an echo command in the shell to add a note to the log, either about the output or about a command I was running.
Also of note: the full detailed logs contain X11-related errors such as [OpenGL] failed to open X11 server connection. These appear because I ran everything headless, but I can assure you that with a monitor, keyboard, and mouse connected, the exact same inferencing problem occurs without the X11 error, which only comes from the headless program being unable to show the output image normally displayed in a popup window after a run. Connecting a monitor was actually one of the first things I tried while troubleshooting, before I figured out the real issue. I just wanted to mention the X11 errors so that no time is wasted chasing them down or wondering why they're there.
All commands were run immediately after the initial headless setup from the NVIDIA SD card image jetson-nano-jp46-sd-card-image.zip
SHA1 hash of zip file 522EC5C8064E9AC8A2E151D2BA806638596FD282
After formatting with SD Memory Card Formatter 5.0.1, the image was written and verified with balenaEtcher
========================================================
jetson@jetson:~$ git clone --recursive https://github.com/dusty-nv/jetson-inference
....
jetson@jetson:~$ cd jetson-inference/
jetson@jetson:~/jetson-inference$ docker/run.sh
....
[jetson-inference] Models selected for download: 3 5 14 24 28 29 32 33 35 37 39 41
[jetson-inference] Downloading GoogleNet...
[jetson-inference] Downloading ResNet-18...
[jetson-inference] Downloading SSD-Mobilenet-v2...
[jetson-inference] Downloading MonoDepth-FCN-Mobilenet...
[jetson-inference] Downloading Pose-ResNet18-Body...
[jetson-inference] Downloading Pose-ResNet18-Hand...
[jetson-inference] Downloading FCN-ResNet18-Cityscapes-512x256...
[jetson-inference] Downloading FCN-ResNet18-Cityscapes-1024x512...
[jetson-inference] Downloading FCN-ResNet18-DeepScene-576x320...
[jetson-inference] Downloading FCN-ResNet18-MHP-512x320...
[jetson-inference] Downloading FCN-ResNet18-Pascal-VOC-320x320...
[jetson-inference] Downloading FCN-ResNet18-SUN-RGBD-512x400...
Downloading pytorch-ssd base model...
2b498531d852: Pull complete
Digest: sha256:119db75c0ad42a7380f3dfef8c05c8cd74afa86777e01efb24f0167dec133377
Status: Downloaded newer image for dustynv/jetson-inference:r32.6.1
root@jetson:/jetson-inference# cd build/aarch64/bin
root@jetson:/jetson-inference/build/aarch64/bin# ./imagenet.py images/orange_0.jpg images/test/output_0.jpg
....
[image] loaded 'images/orange_0.jpg' (1024x683, 3 channels)
Traceback (most recent call last):
File "./imagenet.py", line 68, in <module>
class_id, confidence = net.Classify(img)
Exception: jetson.inference -- imageNet.Classify() encountered an error classifying the image
root@jetson:/jetson-inference/build/aarch64/bin# exit
exit
jetson@jetson:~$ echo storing default clocks settings for restoration later
storing default clocks settings for restoration later
jetson@jetson:~$ sudo jetson_clocks --store ~/default.clocks
jetson@jetson:~$ cd jetson-inference/
jetson@jetson:~/jetson-inference$ docker/run.sh
....
root@jetson:/jetson-inference# echo with default settings and DVFS enabled classification will fail
with default settings and DVFS enabled classification will fail
root@jetson:/jetson-inference# cd build/aarch64/bin/
root@jetson:/jetson-inference/build/aarch64/bin# ./imagenet.py images/orange_0.jpg images/test/output_0.jpg
....
[image] loaded 'images/orange_0.jpg' (1024x683, 3 channels)
Traceback (most recent call last):
File "./imagenet.py", line 68, in <module>
class_id, confidence = net.Classify(img)
Exception: jetson.inference -- imageNet.Classify() encountered an error classifying the image
root@jetson:/jetson-inference/build/aarch64/bin# echo no class shown and classification error
no class shown and classification error
root@jetson:/jetson-inference/build/aarch64/bin# echo c program will not show any error but also will be missing classification on output text - output image will have no probability overlayed on top of it
c program will not show any error but also will be missing classification on output text - output image will have no probability overlayed on top of it
root@jetson:/jetson-inference/build/aarch64/bin# ./imagenet images/orange_0.jpg images/test/output_0.jpg
....
[TRT] device GPU, networks/bvlc_googlenet.caffemodel initialized.
[TRT] imageNet -- loaded 1000 class info entries
[TRT] imageNet -- networks/bvlc_googlenet.caffemodel initialized.
[image] loaded 'images/orange_0.jpg' (1024x683, 3 channels)
[image] saved 'images/test/output_0.jpg' (1024x683, 3 channels)
....
root@jetson:/jetson-inference/build/aarch64/bin# echo notice no classification shown in output
notice no classification shown in output
root@jetson:/jetson-inference/build/aarch64/bin# echo now to exit docker and disable DVFS with jetson_clocks
now to exit docker and disable DVFS with jetson_clocks
root@jetson:/jetson-inference/build/aarch64/bin# exit
exit
jetson@jetson:~/jetson-inference$ sudo jetson_clocks
jetson@jetson:~/jetson-inference$ docker/run.sh
....
root@jetson:/jetson-inference# cd build/aarch64/bin/
root@jetson:/jetson-inference/build/aarch64/bin# ./imagenet.py images/orange_0.jpg images/test/output_0.jpg
....
[image] loaded 'images/orange_0.jpg' (1024x683, 3 channels)
class 0950 - 0.966797 (orange)
[image] saved 'images/test/output_0.jpg' (1024x683, 3 channels)
....
root@jetson:/jetson-inference/build/aarch64/bin# echo now that DVFS disabled item is properly classified as class 0950 - 0.966797 orange
now that DVFS disabled item is properly classified as class 0950 - 0.966797 orange
root@jetson:/jetson-inference/build/aarch64/bin# echo c program will also properly function
c program will also properly function
root@jetson:/jetson-inference/build/aarch64/bin# ./imagenet images/orange_0.jpg images/test/output_0.jpg
....
[image] loaded 'images/orange_0.jpg' (1024x683, 3 channels)
class 0950 - 0.966797 (orange)
imagenet: 96.67969% class #950 (orange)
[image] saved 'images/test/output_0.jpg' (1024x683, 3 channels)
....
root@jetson:/jetson-inference/build/aarch64/bin# echo as shown output confirms classification also working in c program class 0950 - 0.966797 orange
as shown output confirms classification also working in c program class 0950 - 0.966797 orange
root@jetson:/jetson-inference/build/aarch64/bin# echo now to exit and restore default clocks settings to show it will fail once again
now to exit and restore default clocks settings to show it will fail once again
root@jetson:/jetson-inference/build/aarch64/bin# exit
exit
jetson@jetson:~/jetson-inference$ sudo jetson_clocks --restore ~/default.clocks
jetson@jetson:~/jetson-inference$ docker/run.sh
....
root@jetson:/jetson-inference/build/aarch64/bin# ./imagenet.py images/orange_0.jpg images/test/output_0.jpg
....
[image] loaded 'images/orange_0.jpg' (1024x683, 3 channels)
Traceback (most recent call last):
File "./imagenet.py", line 68, in <module>
class_id, confidence = net.Classify(img)
Exception: jetson.inference -- imageNet.Classify() encountered an error classifying the image
root@jetson:/jetson-inference/build/aarch64/bin# echo as soon as default clocks restored with DVFS enabled classification fails once again on this jetson module
as soon as default clocks restored with DVFS enabled classification fails once again on this jetson module
No, you didn't miss anything; I'm not surprised you weren't able to reproduce it. That's why I suspected it must be a hardware issue, though I was hoping it wasn't.
Just tried it with a different SD card: a different 64GB SanDisk SDXC card formatted with the SD Association SD card formatter, then burned and verified with balenaEtcher on Linux. I used a different card reader, a different computer, and a different OS to eliminate those as possible factors, but unfortunately the result is still exactly the same. Even with the different SD card, classification always fails until DVFS is disabled using jetson_clocks; it works while DVFS is disabled, and as soon as the default settings are restored (by a reboot or by the save/restore feature of jetson_clocks) it fails to classify once again, using the default Docker image and examples from GitHub.
I don't know whether anything else is broken, but I can't even run the first example from the Getting Started with ImageNet guide on the dusty-nv/jetson-inference GitHub unless I run jetson_clocks first. Any other ideas on what it could be, besides the Jetson devkit I received being faulty? That's what I've been guessing since I originally posted the issue on GitHub a little over a month ago, but I'd love for it to be something else so I don't have to deal with a warranty return and exchange, which I imagine could take a long time and be a pain.
My RMA replacement arrived today. For anyone who finds this thread later with the same or a similar problem, I wanted to post an update confirming the issue and its resolution. It was indeed a hardware defect on my Jetson Nano devkit board. While I can't say for sure what the defect was, I can confirm that after receiving the RMA replacement today, using the exact same setup as before, everything works properly without needing to disable DVFS. Thanks for the assistance, and hopefully this is helpful to someone else experiencing the same problem.