Not able to run custom DNN on Jetson nano

Hi,

I am trying to build and run a custom DNN on the Jetson Nano but am facing compatibility issues.
The DNN was trained with PyTorch and converted to ONNX format before loading. I customized the “Hello AI World” imageNet code to run a different network (out of the box it only supports a predefined set of networks).

Details:
PyTorch version: 1.2.0
ONNX: IR version – 0.0.4, opset version – 9
Number of output classes: 2 (softmax layer present)
Labels.txt: Updated with 2 class descriptions

Error:
“imageNet – didn’t load expected number of class descriptions (2 of 1)”
“imageNet – failed to load synset class descriptions (2/2 of 1)”

Network definition:
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self, in_channels=3, num_classes=2):
        super(Network, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)
        self.conv4 = nn.Conv2d(128, 128, 3)
        self.fc1 = nn.Linear(128 * 7 * 7, 512)
        self.fc2 = nn.Linear(512, num_classes)
        self.sm = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool1(F.relu(self.conv2(x)))
        x = self.pool1(F.relu(self.conv3(x)))
        x = self.pool1(F.relu(self.conv4(x)))
        x = x.view(-1, 128 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = self.sm(x)
        return x

ONNX conversion:
import torch
import torch.onnx

net = Network(in_channels=3, num_classes=2).to('cuda').eval()   # trained Network instance (weights loaded beforehand)

batch_size = 1
x = torch.randn(batch_size, 3, 150, 150, device='cuda')

torch.onnx.export(net,
                  x,
                  "my_classifier.onnx",
                  verbose=True)
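
As a side note, torch.onnx.export also accepts input_names and output_names, which makes the tensor names in the exported graph predictable. Below is a sketch of the same export with explicit names; the names input_0 and output_0 are arbitrary choices for this sketch, not something from the original code:

# Same export as above, but with explicit tensor names so that whatever
# name is later passed to the loader is known in advance.
# "input_0" / "output_0" are placeholder names chosen for this sketch.
torch.onnx.export(net,
                  x,
                  "my_classifier.onnx",
                  verbose=True,
                  input_names=["input_0"],
                  output_names=["output_0"])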

Thanks

Hi,

“imageNet – didn’t load expected number of class descriptions (2 of 1)”

Have you updated the class number first?
It looks like you only have one class output.

Thanks.

Hi,

My assumption was that, with the labels file updated to two classes and the ONNX model showing 2 output classes, the network would be treated as having 2 classes.

Can you please let me know where else the number of classes is to be updated?

Thanks

The imageNet error is saying that it loaded 2 class names from your labels.txt file, but the network model itself only supports 1 class.

When you ran train.py to train the network, it should have printed the classes it found near the beginning of the log. It seems to be finding only one class during training. You may want to check that your dataset directory structure looks like the one shown here:
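
As a quick sanity check on the dataset side (a sketch, not from the original posts; the paths are assumptions), torchvision's ImageFolder lists the classes it discovers from the folder names, so it should report both classes before training starts:

import torchvision.datasets as datasets

# Assumed layout:
#   dataset/train/class_a/*.jpg
#   dataset/train/class_b/*.jpg
train_set = datasets.ImageFolder("dataset/train")
print(train_set.classes)        # expect two entries
print(len(train_set.classes))   # expect 2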

Hi,

Yes, the network was trained exactly that way, with 2 directories for the 2 classes in train, validation and test.
However, the network was not trained using train.py from “Hello AI World”; it was trained with PyTorch in Google Colab.

I have also attached the ONNX graph snapshot (output dimension shows 2 classes).
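
For reference, the output shape can also be checked programmatically with the onnx package (a sketch, assuming the file name used in the export above):

# Print the name and shape of every graph output in the exported model.
import onnx

model = onnx.load("my_classifier.onnx")
for out in model.graph.output:
    dims = [d.dim_value for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)   # expecting a trailing dimension of 2 for 2 classes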

Thanks

Further details…

The following customisation was done to the imageNet module:

bool imageNet::init( imageNet::NetworkType networkType, uint32_t maxBatchSize,
                     precisionType precision, deviceType device, bool allowGPUFallback )
{
    // ...
    // Updated with:
    else if( networkType == imageNet::N_CLASSIFIER ) {
        return init( NULL, "networks/n_classifier/n_classifier.onnx", NULL, "networks/n_classifier/labels.txt",
                     IMAGENET_DEFAULT_INPUT, "softmax", maxBatchSize, precision, device, allowGPUFallback );
    }
}

enum NetworkType
{
	CUSTOM,        /**< Custom model provided by the user */
	ALEXNET,		/**< AlexNet trained on 1000-class ILSVRC12 */
	GOOGLENET,	/**< GoogleNet trained on 1000-class ILSVRC12 */
	GOOGLENET_12,	/**< GoogleNet trained on 12-class subset of ImageNet ILSVRC12 from the tutorial */
	RESNET_18,	/**< ResNet-18 trained on 1000-class ILSVRC15 */
	RESNET_50,	/**< ResNet-50 trained on 1000-class ILSVRC15 */
	RESNET_101,	/**< ResNet-101 trained on 1000-class ILSVRC15 */
	RESNET_152,	/**< ResNet-152 trained on 1000-class ILSVRC15 */
	VGG_16,		/**< VGG-16 trained on 1000-class ILSVRC14 */
	VGG_19,		/**< VGG-19 trained on 1000-class ILSVRC14 */
	INCEPTION_V4,	/**< Inception-v4 trained on 1000-class ILSVRC12 */
	N_CLASSIFIER, //Added
};

New labels (2 classes) were added in “networks/n_classifier/labels.txt”.

Even though the model has 2 classes, the ONNX parser does not seem to pick that up; the following message is captured while parsing the model:

“[TRT] binding to output 0 softmax dims (b=1 c=1 h=1 w=1) size=4”

Thanks

Hi,

Sorry for the late update.

If the output tensor of TensorRT doesn’t match your model, you may be loading an incorrect model.
Please note that we serialize the TensorRT engine for acceleration by default,
so please make sure your engine file was created from the new model first.
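
A minimal sketch of clearing any cached engine next to the model (the directory is taken from the earlier posts; the exact cache file naming can vary between jetson-inference versions, so this matches broadly):

import glob
import os

model_dir = "networks/n_classifier"   # assumed location from the posts above
for cached in glob.glob(os.path.join(model_dir, "*.engine")) + \
              glob.glob(os.path.join(model_dir, "*.tensorcache")):
    print("removing cached engine:", cached)
    os.remove(cached)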

Thanks.

Hi,

Thanks for your response. Unfortunately, I still could not run my ONNX model on the Nano.

I observed that the engine file is getting created.

Looking at imageNet.cpp, I can see that the number of output classes is determined this way:

/*
* load synset classnames
*/
mOutputClasses = DIMS_C(mOutputs[0].dims);

This is reported as 1 even though the original ONNX file has 2 output classes and 2 labels were added to the labels file.

I have also attached the engine creation log.
run_log.txt (138.4 KB)

Thanks

Hi,

I was able to identify the problem: the input and output layer names were not specified correctly while initialising the network. The default input and output names specified in imageNet.h are “data” and “prob”; these were replaced with the actual tensor names from the ONNX model.
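
For anyone hitting the same issue, here is a sketch of loading a custom ONNX classifier through the jetson-inference Python bindings with explicit tensor names; the paths and the names input_0 / output_0 are placeholders and must match the names shown in the exported ONNX graph:

# Load the custom ONNX model with explicit input/output tensor names and
# verify how many classes the parsed network reports.
import jetson.inference

net = jetson.inference.imageNet(argv=[
    "--model=networks/n_classifier/n_classifier.onnx",
    "--labels=networks/n_classifier/labels.txt",
    "--input_blob=input_0",
    "--output_blob=output_0",
])

print("network reports", net.GetNumClasses(), "classes")   # expect 2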

Thanks for your time.

Can someone tell me how to get this snapshot? What package was used?
It’s kinda neat :)