Xavier Shuts Down When Running Darknet Detector Demo

Hi everyone,

I am trying to measure the Darknet YOLOv2 detection performance of the Xavier. I’ve downloaded and built Darknet, but whenever I start the demo, it loads the network, shows one frame or so, and the board shuts down immediately. I can’t find any logs about this shutdown. It looks as if the power is cut. What could cause this?

The latest Darknet and JetPack 4.1.1 are installed. No changes have been made to the kernel or OS.
The jetson_clocks.sh script and the “nvpmodel -m 0” command have been executed.
I am using “darknet detector demo” with the YOLOv2 weights.

The weights are available from the Darknet website:
https://pjreddie.com/darknet/yolov2/

Exact command:

./darknet detector demo cfg/coco.data cfg/yolov2.cfg yolov2.weights <video file>

I am able to run the CUDA samples and CPU stress tests with no problem. The power supply is sufficient for the Xavier (12V 10A). I have also connected the Xavier to an adjustable power supply. The current peaked at 1.2A, up from 0.7A, at the moment of execution, and then the board shut down.

Any help is appreciated.
Thanks.

Try running “htop” (or anything which monitors memory use…“sudo apt-get install htop”) and see if it runs out of RAM.
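If watching htop by eye is too slow for a failure that happens within a couple of seconds, a small logger can sample memory instead. This is only a rough sketch: it reads the standard Linux /proc/meminfo file, and the function names and sampling interval are my own choices, not anything from the thread.

```python
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info

def available_mib(text):
    """Return MemAvailable in MiB, or None if the field is missing."""
    kb = parse_meminfo(text).get("MemAvailable")
    return None if kb is None else round(kb / 1024, 1)

def log_memory(path="/proc/meminfo", interval_s=0.5, samples=20):
    """Print MemAvailable at a fixed interval (interval and count are arbitrary)."""
    for _ in range(samples):
        with open(path) as f:
            mib = available_mib(f.read())
        stamp = time.strftime("%H:%M:%S")
        # flush so each sample reaches the terminal immediately
        print(f"{stamp} MemAvailable: {mib} MiB", flush=True)
        time.sleep(interval_s)
```

Calling log_memory() in a second terminal (ideally over ssh or the serial console, so the last samples remain visible on the host) while the demo starts would show whether available memory collapses just before the failure.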

Try monitoring the temperature, e.g. with xsensors.
My subjective opinion is that the Xavier overheats to the point where you could make a cup of coffee on it when the fan is not spinning.
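Without a GUI tool, the same temperatures can usually be read from the kernel's thermal zones. A minimal sketch, assuming the standard Linux sysfs layout (/sys/class/thermal/thermal_zone*/temp, reported in millidegrees Celsius); which zones exist on a given board is not something I have checked, and the function names are mine.

```python
from pathlib import Path

def millideg_to_c(raw):
    """Convert a sysfs temperature string (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0

def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone type: temperature in C} for every readable thermal zone."""
    readings = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            readings[name] = millideg_to_c((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # skip zones that are unreadable or report no number
    return readings
```

Printing read_thermal_zones() in a loop while the demo starts gives a rough temperature trace up to the moment the board goes down.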

@Andrey1984, @linuxdev,

Thank you for your responses.

RAM usage and all temperature values are normal, and the fan spins. RAM usage reaches 4-5GB, and temperature values are around 34C (it is a cold start).

You will probably need to use the serial console, monitor “dmesg --follow”, and log what goes on just before and during the shutdown. A program like “gtkterm” works well and can log. The micro-B USB port provides serial console access…your host will see this as an FTDI serial UART. If you monitor “dmesg --follow” from your host, and then connect the cable, you will see some serial devices mentioned. This has multiple devices, but the last one mentioned is the one to use. As an example:

[20744.152891] usb 3-3.4.2: new high-speed USB device number 10 using ehci-pci
[20744.232773] usb 3-3.4.2: New USB device found, idVendor=0403, idProduct=6011, bcdDevice= 8.00
[20744.232787] usb 3-3.4.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[20744.232795] usb 3-3.4.2: Product: Quad RS232-HS
[20744.232801] usb 3-3.4.2: Manufacturer: FTDI
[20744.236459] ftdi_sio 3-3.4.2:1.0: FTDI USB Serial Device converter detected
...
[20744.241180] usb 3-3.4.2: Detected FT4232H
[20744.241428] usb 3-3.4.2: FTDI USB Serial Device converter now attached to ttyUSB8

Assuming the first one in your case is also “ttyUSB8” (but in reality it probably isn’t), then you could use gtkterm like this:

sudo gtkterm -p /dev/ttyUSB8 -b 8 -t 1 -s 115200

You might not actually need to use “sudo”…if your user is a member of group “dialout” then sudo won’t be required.
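If the host log is long, the attached device names can be pulled out of saved dmesg text rather than read by eye. This is just a sketch: the regular expression is based on the "now attached to ttyUSBn" message format shown in the example above, and ftdi_ports is a made-up helper name.

```python
import re

# Matches the kernel message printed when an FTDI converter gets a tty name.
ATTACH_RE = re.compile(
    r"FTDI USB Serial Device converter now attached to (ttyUSB\d+)"
)

def ftdi_ports(dmesg_text):
    """Return FTDI serial device names in the order the kernel attached them."""
    return ATTACH_RE.findall(dmesg_text)
```

For example, ftdi_ports(open("host_dmesg.txt").read()) on a saved host log (the file name is hypothetical) returns the candidates in attach order, so the first or last entry can be chosen as needed.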

Once your system is up and running clear the serial console display. Then start logging, followed by “dmesg --follow”. Finally, start running your program till it fails.

Once your log file has a “.txt” file name extension you can attach it to an existing post. Hover your mouse over the quote mark in the upper right of the post and a paper clip icon will show up. Use the paper clip icon to attach the log file.

NOTE: You could perform a similar test with “sudo ~ubuntu/tegrastats” instead of “dmesg --follow”.

I wonder whether this is a reproducible issue…? Does it shut down every time?

@linuxdev,

Thanks for detailed instructions.

I followed your message step by step: I started logging and executed the demo, then the Xavier shut down and the logging stopped.
I created log files with dmesg and tegrastats (separate runs) and will attach them to this post. I couldn’t see any logs written after execution. (The program runs for only 2-3 seconds.)

@WayneWWW

Thank you for your reply. The shutdown occurs every time I start the demo.
xavier_tegrastats.txt (6.6 KB)
xavier_dmesg.txt (67.3 KB)

Tegrastats didn’t show anything unusual.

For dmesg, I’m not sure if there was a typographic error in “dmesg --follow”, or if the log just shows you hitting the backspace to correct it. Notice this at the start of the log:

dmesg --floe[Ke[Kollow

I am guessing you typed “flo”, then hit the backspace twice, and then finished with “ollow”. If this is correct, then it seems logging stopped when I wouldn’t expect it to stop. On the other hand, the messages at the end of dmesg are related to GPU, e.g., railgate. I do not know if those messages are relevant or not.
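Serial logs with this kind of terminal-control debris can be cleaned up before reading them. The sketch below assumes the raw bytes were a backspace followed by the ANSI "erase to end of line" sequence (ESC [ K), which is my guess from how the log renders; clean_console_line is a hypothetical helper.

```python
import re

# ANSI CSI sequences: ESC [ parameters, intermediates, one final byte.
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

def clean_console_line(raw):
    """Strip CSI escape sequences, then apply backspaces as deletions."""
    text = CSI_RE.sub("", raw)
    out = []
    for ch in text:
        if ch == "\b":
            if out:
                out.pop()  # backspace removes the previously typed character
        else:
            out.append(ch)
    return "".join(out)
```

Under that assumption, a raw line of "dmesg --flo" followed by two backspace/erase pairs and "ollow" comes out as the intended command, "dmesg --follow".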

Once the crash occurs, is serial console completely unresponsive? If it just goes from showing those messages to shutting off without shutdown in between, then I’d have to wonder if it is a hardware error. If not, then it is something unusual.

Btw, does this show all ok?

sha1sum -c /etc/nv_tegra_release
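The output of that command is long, so it can help to filter it down to anything that is not OK. A rough sketch, assuming the usual "path: STATUS" lines that sha1sum -c prints; failed_checks is a made-up name.

```python
def failed_checks(sha1sum_output):
    """Return (path, status) pairs for every checked file that is not OK."""
    failures = []
    for line in sha1sum_output.splitlines():
        if line.startswith("sha1sum:"):
            continue  # summary or warning lines from the tool itself
        path, sep, status = line.rpartition(": ")
        if sep and status.strip() != "OK":
            failures.append((path, status.strip()))
    return failures
```

An empty list means every listed file matched its recorded checksum.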

r1ch13r1ch,

Do you have only one AGX to test with? If so, could you share the steps to reproduce this issue?
It would be best if there is a simple way to reproduce the issue.

It turns out that this does not happen if I use pjreddie’s darknet. It happens with this version of darknet (which I generally use):
https://github.com/AlexeyAB/darknet
Given this, I think it may not be a hardware problem. But how could a program do this without sudo privileges?

@linuxdev,

Yes, I probably did that. I will attach a cleanly recorded log file to this message.
The serial console becomes completely unresponsive. The power LED on the board turns off.

This is the output of “sha1sum -c /etc/nv_tegra_release”, it shows all OK:

nvidia@jetson-0423418004169:~$ sha1sum -c /etc/nv_tegra_release
/usr/lib/aarch64-linux-gnu/tegra/libnvosd.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_camera.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_image.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvomx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_force.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnveventlib.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmedia.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_utils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libglx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvexif.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtx_helper.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketserver.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvscf.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvfnetstorehdfx.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm_parser.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvrm.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm_contentpipe.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvdla_utils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_gpucompute.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_tbc.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvos.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtnr.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_graphics.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvimp.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvfnet.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvphs.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvavp.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcapture.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite_video.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvfnetstoredefog.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_il.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvodm_imager.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvjpeg.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvargus_socketclient.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtvmr.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvdla_compiler.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtracebuf.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvidia-egl-wayland.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvdc.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_boot.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_generic.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvapputil.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcameratools.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcam_imageencoder.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnveglstream_camconsumer.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmm_utils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvomxilclient.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvwinsys.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvargus.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_spincircle.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvphsd.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnveglstreamproducer.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvdla_runtime.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvll.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcolorutil.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvddk_2d_v2.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvtestresults.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcamv4l2.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcamerautils.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvcamlog.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvparser.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvddk_vic.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvdla_core.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvdla_os.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvgov_ui.so: OK
/usr/lib/aarch64-linux-gnu/tegra/libnvmmlite.so: OK
/usr/lib/aarch64-linux-gnu/libv4l/plugins/libv4l2_nvvidconv.so: OK
/usr/lib/aarch64-linux-gnu/libv4l/plugins/libv4l2_nvvideocodec.so: OK
/usr/lib/xorg/modules/drivers/nvidia_drv.so: OK
/usr/lib/xorg/modules/extensions/libglx.so: OK

@WayneWWW,

Yes, I only have one AGX Xavier.

Please use the following commands to reproduce the issue:

  1. Download darknet and yolov2.weights:
git clone https://github.com/AlexeyAB/darknet
cd darknet
wget https://pjreddie.com/media/files/yolov2.weights
  2. Enable GPU, CUDNN and OpenCV parameters in Makefile:
GPU=1
CUDNN=1
CUDNN_HALF=0
OPENCV=1
...

Uncomment the line starting with “ARCH” below Jetson XAVIER:

# Jetson XAVIER
ARCH= -gencode arch=compute_72,code=[sm_72,compute_72]
  3. Build darknet:
make -j8
  4. Run demo:
./darknet detector demo cfg/coco.data cfg/yolov2.cfg yolov2.weights <video_file>

You need to provide a test video.
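For anyone repeating these steps several times (for example after a reflash), the Makefile flag edits can be scripted. This is a sketch only: set_make_flags is a hypothetical helper, it assumes each flag sits on its own NAME=value line at the start of a line, and it leaves the ARCH line to be uncommented by hand.

```python
import re

def set_make_flags(makefile_text, flags):
    """Return makefile_text with each NAME=value line rewritten to the new value."""
    for name, value in flags.items():
        # Anchor at line start so CUDNN does not also match CUDNN_HALF.
        pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", re.MULTILINE)
        new_line = f"{name}={value}"
        makefile_text, count = pattern.subn(lambda m: new_line, makefile_text)
        if count == 0:
            raise KeyError(f"{name} not found in Makefile text")
    return makefile_text
```

Usage would be along the lines of reading the Makefile, calling set_make_flags(text, {"GPU": "1", "CUDNN": "1", "CUDNN_HALF": "0", "OPENCV": "1"}), and writing the result back before running make.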

(Edited to add more information.)

xavier_dmesg_2.txt (66.1 KB)

Let me try it on my device.

Hi r1ch13r1ch,

I followed your steps (#10) and ran the demo on Xavier with JetPack 4.1.1.
The result looks good, with no shutdown issue.
Which video are you testing with? Could you please provide the test video to us?

Thanks!

Could you describe the shutdown in more detail?

Does the error occur as soon as you run the app, or does it shut down after a while?

Hi carolyuu,

Thanks for your answers.

@carolyuu, @WayneWWW

I uploaded two video files, which you can download from the link below. “out.mp4” is the video that I’m using for the demo. The other file is footage of the shutdown. You can see the Xavier below the monitor. I don’t touch the power supply, and no interruption occurs. You can see the blue LED of the adapter on the left.

The network allocates RAM successfully. The shutdown occurs as soon as detection starts. Sometimes it shows 1-2 frames with bounding boxes drawn. It does not behave exactly the same on every try.

https://we.tl/t-Jg9qQh6wDX

Files will be deleted in 7 days.

Hi r1ch13r1ch,

That’s strange!
Testing with your “out.mp4” file, I still can’t reproduce the shutdown issue.

nvidia@jetson-0422418042113:~/darknet$ ./darknet detector demo cfg/coco.data cfg/yolov2.cfg yolov2.weights /home/nvidia/out.mp4

Could you please reflash your Xavier board and try again?
Using a clean environment would help double-check the issue.
Thanks!

I suspect it is a problem that needs an RMA.

If you still want to dig in, you could also check which part of your app is causing the error.
Though I think RMA is the fastest way.

I reflashed the Xavier with JetPack 4.1.1, and the result is the same.
Apparently there is something different about my device. I will send it in for RMA.

Thank you all.

Hi All,
We have noticed a similar problem with our Jetson Xavier.
It worked fine running some machine learning stuff on the desktop, powered from the supplied 19V DC power supply. But when we moved it to our mobile robot platform to replace a Jetson TX1, we got the same kind of behaviour: it boots up fine, then when the serious CUDA stuff starts it dies abruptly. It is not so much a shutdown; the white LED goes out and the power button needs to be pressed to turn it back on.

On our mobile robot platform we are powering it from 12V DC from a Mini-ITX type M2 power supply, the same one the TX1 was powered from. There are no voltage drops or problems with the power supply - it can easily deliver 6 amps from the 28V cell Lithium Iron Phosphate battery packs.

The Jetson Xavier Developer Kit Carrier Board is supposed to be rated for a supply of 9V to 20V DC, so 12V should be fine - but it seems that it is not.

When we use a switching regulator (https://www.ebay.co.uk/itm/DC-DC-CC-CV-Buck-Converter-Step-down-Power-Module-7-32V-0-8-28V-12A-300W-N6W4-/282449457216) to convert the 28V battery voltage to 19V, the Xavier works perfectly under all conditions.

So my guess is that the issue is the same as above: r1ch13r1ch was powering it from 12V, and the other user, I assume, from something higher…
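As a rough illustration of why a lower input voltage is more demanding on the supply path: for the same load power, the input current scales inversely with the voltage (I = P / V). The 30 W load below is a made-up figure for illustration, not a measurement from either board, and converter losses are ignored.

```python
def supply_current_a(load_w, supply_v):
    """Input current in amps for a given load power, ignoring converter losses."""
    return load_w / supply_v

ASSUMED_LOAD_W = 30.0  # hypothetical load, not a measured value

for volts in (12.0, 19.0):
    amps = supply_current_a(ASSUMED_LOAD_W, volts)
    print(f"{volts:.0f} V supply: {amps:.2f} A for {ASSUMED_LOAD_W:.0f} W")
```

Whatever the real load is, the 12V input has to carry 19/12 (about 1.6 times) the current of a 19V input, so any resistance in cables or connectors causes a proportionally larger drop at the board.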

Hi, this looks more like a voltage-drop case. Did you measure the VIN_PWR_BAD signal? Please check the Power Loss Detection section in the OEM DG.

Hi,

As far as I can see after a quick look, the VDDIN_PWR_BAD_N signal is not available on any of the carrier board’s external connectors, only on the Xavier module L55 pin and on the TPS3808G01 chip RST pin.

We have monitored the 12V DC input to the carrier board using a high-end Agilent scope, and there don’t appear to be any voltage drops or spikes on the supply.

Maybe raising the input voltage to 13V would fix it, but we have chosen 19V and it is working fine using the switching regulator I mentioned.