Building CUDA engine never completes

Greetings…

I’ve been attempting to run through this tutorial with my newly acquired Jetson Nano dev board.
https://github.com/dusty-nv/jetson-inference/blob/master/docs/imagenet-console-2.md

I’m powering the board via the barrel jack connector and made sure nvpmodel was set to mode 0.

When running ./imagenet-console orange_0.jpg output_0.jpg, it gets to the point where it says “building CUDA engine, this could take a few minutes…” and then nothing happens after. It never completes.

I’m not getting a crash or anything like I’ve heard other people complain about, but I’ve let this run for hours and it never actually succeeds. (Hitting Ctrl+C does eventually break out to the console.)

Logs:

imagenet-console
  args (3):  0 [./imagenet-console]  1 [orange_0.jpg]  2 [output_0.jpg]


imageNet -- loading classification network model from:
         -- prototxt     networks/googlenet.prototxt
         -- model        networks/bvlc_googlenet.caffemodel
         -- class_labels networks/ilsvrc12_synset_words.txt
         -- input_blob   'data'
         -- output_blob  'prob'
         -- batch_size   2

[TRT]  TensorRT version 5.0.6
[TRT]  detected model format - caffe  (extension '.caffemodel')
[TRT]  desired precision specified for GPU: FASTEST
[TRT]  requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]  native precisions detected for GPU:  FP32, FP16
[TRT]  selecting fastest native precision for GPU:  FP16
[TRT]  attempting to open engine cache file networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  cache file not found, profiling network model on device GPU
[TRT]  device GPU, loading networks/googlenet.prototxt networks/bvlc_googlenet.caffemodel
[TRT]  retrieved Output tensor "prob":  1000x1x1
[TRT]  retrieved Input tensor "data":  3x224x224
[TRT]  device GPU, configuring CUDA engine
[TRT]  device GPU, building FP16:  ON
[TRT]  device GPU, building INT8:  OFF
[TRT]  device GPU, building CUDA engine (this may take a few minutes the first time a network is loaded)

Hmm, it should only take up to 5 minutes or so on Nano to build the CUDA engine for Googlenet. Can you try running “sudo tegrastats” in the background and monitoring the system for activity?
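Since the engine build spends most of its time running timing kernels on the GPU, the GR3D (GPU) load reported by tegrastats should sit well above 0% while it’s genuinely working; CPU activity alone doesn’t tell you much. Here’s a rough sketch of pulling the relevant fields out of a tegrastats line — note the sample line below is an assumption for illustration, and the exact field format varies by JetPack release:

```shell
# Sample tegrastats output line (format is an assumption; it differs across JetPack releases)
line='RAM 1924/3964MB (lfb 4x2MB) CPU [41%@1428,12%@1428,9%@1428,7%@1428] GR3D_FREQ 99%@921'

# Extract memory usage -- running out of RAM during the build can stall the Nano
echo "$line" | grep -o 'RAM [0-9]*/[0-9]*MB'

# Extract GPU load -- near 0% here while the CPU is busy suggests the build is stuck
echo "$line" | grep -o 'GR3D_FREQ [0-9]*%'
```

On the board itself you would run `sudo tegrastats` in a second terminal and watch these fields live while imagenet-console is building the engine.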

If you continue to have the issue, you may want to try re-cloning the repo, or trying a fresh SD card image.

I deleted the cloned folder and re-initialized using the same instructions, and while tegrastats is showing ~40% CPU utilization, it’s still seemingly hanging indefinitely, or at least for extended periods of time.

I’ll try a new SD card image now I guess.

Just ran through this for the first time myself. It took 12m45s to complete “building CUDA engine” — more than a few minutes! The other thing I ran into was that power consumption spiked when the app finally ran, and the Nano crashed. After upgrading to a 5V 3A supply, the app runs OK (after another 12m45s delay). Edit: just noticed that the OP was building GoogleNet. I was building for face detection, but it was still much slower than expected.

OK, yes, the object detection networks can take longer to optimize. Try running this beforehand to make sure you are in 10W mode and with your clocks maximized:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Note that the delay to build the CUDA engine only occurs the first time you run a particular model. On subsequent runs, it should only take a couple seconds to load the already-optimized model.
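You can check for the cached engine yourself — the filename below is taken from the log earlier in this thread, so adjust it for whichever network you’re loading. Deleting the file forces a full rebuild on the next run, which is handy if you suspect a corrupted cache; a minimal sketch:

```shell
# Engine cache path as reported in the log above (adjust for your network/precision)
ENGINE=networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine

if [ -f "$ENGINE" ]; then
    echo "cached engine found: $ENGINE (loads in seconds)"
else
    echo "no cached engine; next run will rebuild it (slow first time)"
fi
```

Removing the file with `rm "$ENGINE"` makes the next imagenet-console run re-profile the network from scratch.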