Volta AMI with DIGITS 6.0 Container - Can't Import Custom Models

Hello All,

First, a disclaimer: this is technically a Jetson question (and I will post it as an issue on jetson-inference as well), but I’m asking it here because the problem arises specifically from using the AWS Volta AMI + DIGITS 18.01 container as our training platform.

After downloading a model snapshot from a successfully trained model, I’m trying to follow the instructions here (https://github.com/dusty-nv/jetson-inference) to deploy this DetectNet model onto the Jetson TX2. The Jetson TX2 was upgraded out of the box to the JetPack 3.2 developer preview, which includes TensorRT 3.0.0-RC2, per the instructions here (http://docs.nvidia.com/jetpack-l4t/index.html). When I try to run the detectnet-console sample on a randomly selected image from the original dataset, I get many errors at the “building CUDA engine” stage that look like:

[GIE] inception_3a/3x3: kernel weights has count 1 but 110592 was expected
[GIE] inception_3a/5x5_reduce: kernel weights has count 1 but 3072 was expected

etc …
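
For reference, this is roughly the invocation I’m using, following the custom-network example in the jetson-inference README (the image names, the $NET directory, and the snapshot iteration number below are placeholders for my actual files):

./detectnet-console test_image.jpg output.jpg \
    --prototxt=$NET/deploy.prototxt \
    --model=$NET/snapshot_iter_XXXX.caffemodel \
    --input_blob=data \
    --output_cvg=coverage \
    --output_bbox=bboxes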

Based on several of the issues already filed about this on the jetson-inference repo (#84 https://github.com/dusty-nv/jetson-inference/issues/84 , #99 https://github.com/dusty-nv/jetson-inference/issues/99 , #123 https://github.com/dusty-nv/jetson-inference/issues/123 ), this is typically caused by trying to import a model trained with NVCaffe 0.16+ (0.16.4 in my case) instead of 0.15.
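
In case it helps with diagnosis, here is a rough sketch of how I plan to dump the per-layer weight blobs from the snapshot, to check whether NVCaffe 0.16 serialized the weights into its newer raw_data field rather than the float data field that older parsers read (assumptions: the NVCaffe protobuf bindings are importable as caffe.proto.caffe_pb2, and the snapshot filename is a placeholder):

# Sketch: inspect per-layer weight blobs in a Caffe snapshot.
from caffe.proto import caffe_pb2

net = caffe_pb2.NetParameter()
with open('snapshot_iter_XXXX.caffemodel', 'rb') as f:  # placeholder path
    net.ParseFromString(f.read())

# Assumes the newer 'layer' field; very old snapshots use 'layers' instead.
for layer in net.layer:
    for i, blob in enumerate(layer.blobs):
        # If 'data floats' is ~0 but raw_data is populated, the weights were
        # written in NVCaffe 0.16's newer serialization format.
        print(layer.name, i,
              'data floats:', len(blob.data),
              'raw_data bytes:', len(getattr(blob, 'raw_data', b'')))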

One thing I don’t understand about that: I have TensorRT 3.0.0-RC2 installed, which according to the release notes here (http://developer2.download.nvidia.com/compute/machine-learning/tensorrt/secure/3.0/ga/TensorRT-Release-Notes-3.0.2.pdf?XZzFNq3ErTn_lkseUUzSgeGMrxRY4GHuPmGOug6OmtsRlEGVEkzXV9gbleHhSikBh6EWoaNA-a5VqnXQmUBNKaXulb6OsXVwX0VBfFumXZT5aWwlrBdaUrPIRvdVSwHPGZzLRribhgbANRzvhB4rSOYPK9PEBqGof3MuFoCMP619OjumRZPAJBJdvnwinQQ) should support NVCaffe 0.16 model parsing as of TensorRT 3.0 RC1. Is there a workaround for this incompatibility right now, or could I somehow have the wrong version of TensorRT installed? Obviously there’s the option of standing up a machine with DIGITS 5 and NVCaffe 0.15 to retrain our model, but that completely defeats the purpose of using the Volta AMI as a scalable workflow.
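
For completeness, this is how I’m checking which TensorRT version JetPack actually installed on the TX2 (just standard dpkg queries, in case I’m misreading what’s on the device):

dpkg -l | grep -i tensorrt
dpkg -l | grep nvinfer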

This has me stuck dead in the water. Any recommendations or workarounds would be very much appreciated!

Thanks

R