Deploying TensorRT model on Jetson TX2

I am working on deploying a model to a Jetson TX2. I have to use the TensorRT C++ API to manually create a network that TensorRT can execute. According to the API documentation, functions such as the one below take Weights preallocated on the CPU. In my use case, the weights are only allocated on the GPU. Can the API accept weights that reside on the GPU? Thanks.

IConvolutionLayer* nvinfer1::INetworkDefinition::addConvolution (
ITensor & input,
int nbOutputMaps,
DimsHW kernelSize,
Weights kernelWeights,
Weights biasWeights
)

Hi,

May I know why you want to create a layer with GPU weight?

I assume you have the weights on the CPU and copy them into GPU memory?
It’s recommended to feed CPU weights into TensorRT and let it choose a memory location based on the GPU architecture.
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/c_api/classnvinfer1_1_1_i_network_definition.html#a29fb055009bb117be0e957cd1bce44a9

Thanks.

Hi,

In my use case, I have a graph with operators that are not supported by TensorRT. So I need to embed the TensorRT engine in the framework’s graph executor to execute just a subgraph consisting of TensorRT-compatible operators. The TensorRT engine is created in the first forward call, when the framework’s executor invokes the subgraph, and at that point I only have weights allocated on the GPU. To create a TensorRT engine, I have to copy the weights from GPU to CPU, which is cumbersome and inefficient. Can TensorRT provide APIs that accept weights allocated on the GPU, to better support subgraph use cases?
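For reference, the GPU-to-CPU copy I currently do looks roughly like the sketch below. This is only a minimal illustration under stated assumptions: `HostWeightsView` is a hypothetical stand-in for `nvinfer1::Weights` (without the `type` field), `WeightStaging` and `plainMemcpy` are names made up for this post, and the copy routine is passed in so the snippet compiles without CUDA. In real code the callback would wrap `cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToHost)`. The host copies have to stay alive until the engine has been built, which is why they are owned by a separate object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <vector>

// Hypothetical stand-in for nvinfer1::Weights (type tag omitted):
// a non-owning view of weight values in host memory.
struct HostWeightsView {
    const void* values;
    std::int64_t count;
};

// Owns the host-side copies so that the pointers handed to the network
// definition remain valid until the engine build has finished.
class WeightStaging {
public:
    // copyToHost must be callable as
    //   void(void* dst, const void* src, std::size_t bytes).
    // With CUDA it would wrap cudaMemcpy(..., cudaMemcpyDeviceToHost).
    template <typename CopyFn>
    HostWeightsView stage(const float* deviceSrc, std::size_t count,
                          CopyFn copyToHost) {
        assert(deviceSrc != nullptr || count == 0);
        // std::deque keeps references to existing elements valid on growth.
        buffers_.emplace_back(count);
        std::vector<float>& host = buffers_.back();
        if (count > 0) {
            copyToHost(host.data(), deviceSrc, count * sizeof(float));
        }
        return HostWeightsView{host.data(), static_cast<std::int64_t>(count)};
    }

    // Call only after the engine has been built.
    void release() { buffers_.clear(); }

    std::size_t stagedBuffers() const { return buffers_.size(); }

private:
    std::deque<std::vector<float>> buffers_;
};

// Assumption: plain memcpy stands in for the device-to-host copy so the
// sketch can be tried on a machine without a GPU.
inline void plainMemcpy(void* dst, const void* src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
}
```

Every layer needs one such staging step for its kernel and bias weights, and `release()` can only be called once the build is done, so the extra host memory lives for the whole engine construction.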

Thanks.

Hi,

We don’t have an API to create a layer from GPU weights directly.
TensorRT wants to allocate the memory itself, based on the architecture and layer type.

I assume you are not using the TensorRT API directly but the UFF parser with a plugin, right?
If so, it’s recommended to wait for our next release, since the plugin factory approach is deprecated.

Thanks

I’m using the C++ API, not the parser. An API that accepts weights pre-allocated on the GPU would not prevent TensorRT from allocating memory based on architecture and layer type. TensorRT makes copies of the weights anyway, so whether they are pre-allocated on the CPU or the GPU should make no difference. Not accepting GPU-allocated weights is simply a poor user experience. Anyone integrating TensorRT with a DL framework would face this inconvenience.

Hi,

We think your request makes sense and have passed it to the TensorRT team.
This request will be tracked and prioritized internally.

Thanks for your feedback.

Thanks for your action and answers.