Problem converting caffemodel to TensorRT using an Ampere-based host GPU

Please provide the following info (check/uncheck the boxes after clicking “+ Create Topic”):
Software Version
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.6.0.8170
1.5.1.7815
1.5.0.7774
other

Host Machine Version
native Ubuntu 18.04
other

Hi there,

I’m trying to convert a caffemodel to a TensorRT-optimized model using the provided tools:

tensorRT_optimization
https://docs.nvidia.com/drive/driveworks-3.0/dwx_tensorRT_tool.html

And trtexec
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec

However, when I try to run it on my machine with an RTX 3080 (an Ampere GPU), the process takes a long time and finally errors out with the following:

…/builder/caskConvolutionTraits.cpp (381) - Cask Error in createConstants: 0 (initDeviceReservedSpace)
…/builder/caskConvolutionTraits.cpp (381) - Cask Error in createConstants: 0 (initDeviceReservedSpace)

This problem also happens with a relatively simple model like the supplied MNIST model.

Is there any way to fix this issue? I have another machine with an RTX 2080 which performs the conversion very quickly (< 20 seconds).
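In case it helps, here is how I compared the two cards' architectures. This is a sketch: the `compute_cap` query field was only added to `nvidia-smi` in newer drivers, so whether it works depends on the installed driver (on older drivers, `deviceQuery` from the CUDA samples reports the same information); `classify` and `gpu_archs` are my own helper names.

```python
import subprocess

def classify(cc: str) -> str:
    """Map a compute-capability string like '8.6' to an architecture family."""
    major = int(cc.split(".")[0])
    names = {7: "Volta/Turing", 8: "Ampere"}
    return names.get(major, f"sm_{cc.replace('.', '')}")

def gpu_archs():
    """Query each visible GPU's name and compute capability via nvidia-smi.

    Assumes a driver recent enough to support the compute_cap query field.
    """
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        text=True,
    )
    return [
        (name.strip(), classify(cap.strip()))
        for name, cap in (line.split(",") for line in out.strip().splitlines())
    ]

# Example usage (requires an NVIDIA GPU and a recent driver):
#   for name, arch in gpu_archs():
#       print(f"{name}: {arch}")
# The RTX 3080 reports compute capability 8.6 (Ampere);
# the RTX 2080 reports 7.5 (Turing).
```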

Thanks!

Hi @victorgpu,

Please provide the complete log, the steps, and the commands so we can check and reproduce the issue. Thanks.

Using the model files in /usr/local/driveworks/data/samples/dnn

Using trtexec

/usr/src/tensorrt/bin/trtexec --deploy=sample_mnist.prototxt --model=sample_mnist.caffemodel --output=prob --verbose
&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --deploy=sample_mnist.prototxt --model=sample_mnist.caffemodel --output=prob --verbose
[I] deploy: sample_mnist.prototxt
[I] model: sample_mnist.caffemodel
[I] output: prob
[I] verbose
[V] [TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[V] [TRT] Plugin Creator registration succeeded - NMS_TRT
[V] [TRT] Plugin Creator registration succeeded - Reorg_TRT
[V] [TRT] Plugin Creator registration succeeded - Region_TRT
[V] [TRT] Plugin Creator registration succeeded - Clip_TRT
[V] [TRT] Plugin Creator registration succeeded - LReLU_TRT
[V] [TRT] Plugin Creator registration succeeded - PriorBox_TRT
[V] [TRT] Plugin Creator registration succeeded - Normalize_TRT
[V] [TRT] Plugin Creator registration succeeded - RPROI_TRT
[V] [TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[I] Input "data": 1x28x28
[I] Output "prob": 10x1x1
[I] [TRT] Applying generic optimizations to the graph for inference.
[I] [TRT] Original: 9 layers
[I] [TRT] After dead-layer removal: 9 layers
[I] [TRT] After scale fusion: 9 layers
[I] [TRT] Fusing ip1 with relu1
[I] [TRT] After vertical fusions: 8 layers
[I] [TRT] After swap: 8 layers
[I] [TRT] After final dead-layer removal: 8 layers
[I] [TRT] After tensor merging: 8 layers
[I] [TRT] After concat removal: 8 layers
[I] [TRT] Graph construction and optimization completed in 0.0007404 seconds.
[I] [TRT] 
[I] [TRT] --------------- Timing scale(10)
[I] [TRT] Tactic 0 is the only option, timing skipped
[I] [TRT] 
[I] [TRT] --------------- Timing conv1(3)
[I] [TRT] 
[I] [TRT] --------------- Timing conv1(2)
[I] [TRT] Tactic 5 time 0.007168
[I] [TRT] Tactic 18 time 0.008192
[I] [TRT] Tactic 23 time 0.01024
[I] [TRT] Tactic 72 time 0.009216
[I] [TRT] Tactic 73 time 0.007168
[I] [TRT] Tactic 77 time 0.006144
[I] [TRT] Tactic 99 time 0.007168
[I] [TRT] Tactic 100 time 0.007168
[I] [TRT] Tactic 141 time 0.007072
[I] [TRT] Tactic 142 time 0.007168
[I] [TRT] Tactic 147 time 0.007168
[I] [TRT] 
[I] [TRT] --------------- Timing conv1(14)
[E] [TRT] ../builder/caskConvolutionTraits.cpp (381) - Cask Error in createConstants: 0 (initDeviceReservedSpace)
[E] [TRT] ../builder/caskConvolutionTraits.cpp (381) - Cask Error in createConstants: 0 (initDeviceReservedSpace)
[E] could not build engine
[E] Engine could not be created
[E] Engine could not be created
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --deploy=sample_mnist.prototxt --model=sample_mnist.caffemodel --output=prob --verbose

Using tensorRT_optimization

/usr/local/driveworks/tools/dnn/tensorRT_optimization --modelType=caffe --outputBlobs=prob --prototxt=sample_mnist.prototxt --caffemodel=sample_mnist.caffemodel
Initializing network optimizer on model sample_mnist.prototxt with weights from sample_mnist.caffemodel
Input "data": 1x28x28
Output "prob": 10x1x1

../builder/caskConvolutionTraits.cpp (381) - Cask Error in createConstants: 0 (initDeviceReservedSpace)
../builder/caskConvolutionTraits.cpp (381) - Cask Error in createConstants: 0 (initDeviceReservedSpace)

Dear @victorgpu,
Support for Ampere GPUs requires CUDA 11.0 or later. Note that the latest DRIVE release still ships with CUDA 10.2, so unfortunately there is no workaround for this request.
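To make the constraint concrete, here is a minimal sketch of the compatibility rule behind the failure: a CUDA toolkit can only generate code for GPU architectures it knows about, and Ampere (compute capability 8.x) was introduced with CUDA 11.0. The function below is purely illustrative (the name and the simplified rule are mine, not part of any NVIDIA tool), modeling only the Ampere cutoff described above.

```python
def cuda_supports_gpu(cuda_version: tuple, compute_cap: tuple) -> bool:
    """Return True if the given CUDA toolkit version can target the given
    compute capability (simplified: only the Ampere cutoff is modeled)."""
    if compute_cap >= (8, 0):          # Ampere or newer
        return cuda_version >= (11, 0)
    return True                        # pre-Ampere GPUs: assume supported

# RTX 3080 (cc 8.6) with DRIVE OS 5.2.0's CUDA 10.2 -> engine build fails
print(cuda_supports_gpu((10, 2), (8, 6)))   # False
# RTX 2080 (cc 7.5) with CUDA 10.2 -> supported, conversion succeeds
print(cuda_supports_gpu((10, 2), (7, 5)))   # True
```

This is why the same MNIST model converts in seconds on the RTX 2080 host but fails with the Cask error on the RTX 3080 host.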