Hello NVIDIA team,
My hardware/software info is:
• Hardware Platform (Jetson Nano or NX)
• DeepStream Version 5.0
• JetPack Version (4.3)
• TensorRT Version (7.1.3)
• NVIDIA GPU Driver Version (valid for GPU only)
I'm trying to implement the video-caffe network (https://github.com/chuckcho/video-caffe), which is based on Facebook's C3D and is used for action classification in video.
The network structure is simple; see https://github.com/chuckcho/video-caffe/blob/master/examples/c3d_ucf101/c3d_ucf101_deploy.prototxt
The input is 16 color images (each CHW is 3x112x112). I built the network with the code below. Some details are omitted because the code is proprietary, but the key statements are listed:
loadWeights(mParams.weightsFileName.c_str(),mWeightsMap);
nvinfer1::ITensor* data = network->addInput(mParams.inputTensorNames[0].c_str(), nvinfer1::DataType::kFLOAT, nvinfer1::Dims{4,{3, 16, 112, 112},{}});
…
data = createLayerBlock(network, data, "conv1a", "relu1a", "pool1", 64, dims_kernel, dims_sp, dims_pooling, mWeightsMap["conv1a"][0], mWeightsMap["conv1a"][1]);
…
data = createLayerBlock(network, data, "conv5a", "relu5a", "pool5", 256, dims_kernel, dims_sp, dims_pooling, mWeightsMap["conv5a"][0], mWeightsMap["conv5a"][1]);
nvinfer1::IFullyConnectedLayer* fcLayer1 = network->addFullyConnected(*data, 2048, mWeightsMap["fc6"][0], mWeightsMap["fc6"][1]);
fcLayer1->setName("fc6");
…
As defined in c3d_ucf101_deploy.prototxt, the "conv5a" convolution-ReLU-pooling block has 256 output channels, and the "fc6" FullyConnected layer has num_output = 2048. Since the input is 16 images, I expect the kernel weight count for "fc6" to be 256 x 16 x 2048 = 8388608. TensorRT, however, expects 18432 = 2048 x 9. When I pass all 8388608 weights in mWeightsMap["fc6"][0] to the "fc6" layer, network construction fails with:
fc6: kernel weights has count 8388608 but 18432 was expected
Could not compute dimensions for (Unnamed Layer* 15) [Fully Connected]_output, because the network is invalid.
Network validation failed.
mWeightsMap["fc6"][0] holds 8388608 parameters, parsed from the weights file c3d_ucf101_iter_20000.caffemodel, which was saved by video-caffe after training on the UCF101 dataset.
Note that no other layer has this issue: for every layer except "fc6", the parameter count required by TensorRT is exactly the same as the count parsed from the .caffemodel. I don't know why TensorRT calculates a different count for "fc6".
If I force the copy of only 18432 parameters for the "fc6" layer, the network is created successfully, and the final log message is:
Detected 1 input and 1 output network tensors.
I also printed the shape of the network's output tensor. It is (256,101,1,1), so it looks like the FullyConnected layer does not merge the data of the 256 channels when it computes the inner product.
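To make these numbers easier to check, here is a small sketch of the arithmetic. It rests on assumptions I have not confirmed: that TensorRT's FullyConnected layer flattens only the last three dimensions of its input and treats any leading dimension as batch-like, and that the tensor reaching "fc6" has shape (256, 1, 3, 3) in TensorRT versus (256, 1, 4, 4) in Caffe (for example, if the two frameworks round pooling output sizes differently). The helper names are hypothetical and are not part of the TensorRT API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Product of the last `n` entries of `dims` (all entries if dims has fewer than n).
std::int64_t trailingVolume(const std::vector<int>& dims, std::size_t n)
{
    std::int64_t volume = 1;
    const std::size_t start = dims.size() > n ? dims.size() - n : 0;
    for (std::size_t i = start; i < dims.size(); ++i)
    {
        volume *= dims[i];
    }
    return volume;
}

// ASSUMPTION: the FC layer consumes only the last three dims of its input.
// Kernel weight count = (product of last three dims) * numOutputs.
std::int64_t fcKernelCountLast3(const std::vector<int>& inputDims, int numOutputs)
{
    assert(numOutputs > 0);
    return trailingVolume(inputDims, 3) * numOutputs;
}

// Kernel weight count when every input dim is flattened into one vector,
// which is how the weights in the .caffemodel appear to be sized.
std::int64_t fcKernelCountFlattened(const std::vector<int>& inputDims, int numOutputs)
{
    assert(numOutputs > 0);
    return trailingVolume(inputDims, inputDims.size()) * numOutputs;
}

// ASSUMPTION: leading dims are kept as-is and the last three become {K, 1, 1}.
std::vector<int> fcOutputDimsLast3(const std::vector<int>& inputDims, int numOutputs)
{
    assert(numOutputs > 0);
    std::vector<int> out;
    if (inputDims.size() > 3)
    {
        out.assign(inputDims.begin(), inputDims.end() - 3);
    }
    out.push_back(numOutputs);
    out.push_back(1);
    out.push_back(1);
    return out;
}
```

Under these assumptions, fcKernelCountLast3({256, 1, 3, 3}, 2048) gives 18432 and fcKernelCountFlattened({256, 1, 4, 4}, 2048) gives 8388608, which are the two counts in the error message. Likewise, fcOutputDimsLast3({256, 2048, 1, 1}, 101) gives (256, 101, 1, 1), the output shape I observed. If this reading is right, the 256 channels would have to be folded into the last three dimensions before "fc6", but I have not verified that.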
So although the network can be built by copying only 18432 kernel weights into "fc6", the classification accuracy is very poor. Whatever input I feed in, the output probabilities and the index of the top class are almost identical. For example, running inference on 16 walking images and then on 16 swimming images gives "walking" both times. I suspect this is caused by the wrong kernel weight count for "fc6" and by the FullyConnected layer not merging all 256 channels in its inner product.
We want to use the TensorRT API to implement the C3D network on the Nano and ship it in large-scale product deployments, because building a network with the TensorRT API saves a lot of memory, which is a big benefit for us. But we are currently blocked by this issue.
Could you please check why TensorRT's FC layer does not handle this 3D network correctly? Thanks in advance.