Hi. As mentioned, I ran the following command with the -d flag specified:
./tao-converter -k tlt_encode -o output_cov/Sigmoid,output_bbox/BiasAdd -p image_input,1x3x480x640,4x3x480x640,16x3x480x640 …/repos/lpr/res/dnn/lpd/models/resnet18_detector.etlt -t fp16 -e …/repos/lpr/res/dnn/lpd/models/tensorrt/resnet18_detector_lpd.engine -d 3,480,640
I got this output:
[INFO] [MemUsageChange] Init CUDA: CPU +203, GPU +0, now: CPU 221, GPU 2264 (MiB)
[INFO] [MemUsageSnapshot] Builder begin: CPU 237 MiB, GPU 2290 MiB
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] conv1/convolution + activation_1/Relu
[INFO] [GpuLayer] block_1a_conv_1/convolution + block_1a_relu_1/Relu
[INFO] [GpuLayer] block_1a_conv_shortcut/convolution
[INFO] [GpuLayer] block_1a_conv_2/convolution + add_1/add + block_1a_relu/Relu
[INFO] [GpuLayer] block_1b_conv_1/convolution + block_1b_relu_1/Relu
[INFO] [GpuLayer] block_1b_conv_2/convolution + add_2/add + block_1b_relu/Relu
[INFO] [GpuLayer] block_2a_conv_1/convolution + block_2a_relu_1/Relu
[INFO] [GpuLayer] block_2a_conv_shortcut/convolution
[INFO] [GpuLayer] block_2a_conv_2/convolution + add_3/add + block_2a_relu/Relu
[INFO] [GpuLayer] block_2b_conv_1/convolution + block_2b_relu_1/Relu
[INFO] [GpuLayer] block_2b_conv_2/convolution + add_4/add + block_2b_relu/Relu
[INFO] [GpuLayer] block_3a_conv_1/convolution + block_3a_relu_1/Relu
[INFO] [GpuLayer] block_3a_conv_shortcut/convolution
[INFO] [GpuLayer] block_3a_conv_2/convolution + add_5/add + block_3a_relu/Relu
[INFO] [GpuLayer] block_3b_conv_1/convolution + block_3b_relu_1/Relu
[INFO] [GpuLayer] block_3b_conv_2/convolution + add_6/add + block_3b_relu/Relu
[INFO] [GpuLayer] block_4a_conv_1/convolution + block_4a_relu_1/Relu
[INFO] [GpuLayer] block_4a_conv_shortcut/convolution
[INFO] [GpuLayer] block_4a_conv_2/convolution + add_7/add + block_4a_relu/Relu
[INFO] [GpuLayer] block_4b_conv_1/convolution + block_4b_relu_1/Relu
[INFO] [GpuLayer] block_4b_conv_2/convolution + add_8/add + block_4b_relu/Relu
[INFO] [GpuLayer] output_bbox/convolution
[INFO] [GpuLayer] output_cov/convolution
[INFO] [GpuLayer] PWN(output_cov/Sigmoid)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +159, now: CPU 403, GPU 2449 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +241, GPU +239, now: CPU 644, GPU 2688 (MiB)
[WARNING] Detected invalid timing cache, setup a local cache instead
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 34048
[INFO] Total Device Persistent Memory: 10447360
[INFO] Total Scratch Memory: 0
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 11 MiB, GPU 1032 MiB
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 896, GPU 2705 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 896, GPU 2705 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 896, GPU 2705 (MiB)
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 895, GPU 2705 (MiB)
[INFO] [MemUsageSnapshot] Builder end: CPU 887 MiB, GPU 2705 MiB
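For context, my understanding is that the -p argument describes the dynamic batch dimension of image_input as min/opt/max shapes, which I take to correspond to something like the following TensorRT Python API sketch (illustrative only, not the code tao-converter actually runs):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# One profile covering the min/opt/max batch sizes passed via -p
profile = builder.create_optimization_profile()
profile.set_shape("image_input",
                  min=(1, 3, 480, 640),
                  opt=(4, 3, 480, 640),
                  max=(16, 3, 480, 640))
config.add_optimization_profile(profile)

Note that config.add_optimization_profile returns the index of the newly added profile, and the first profile added gets index 0.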
However, when I try to select the optimization profile I want during inference with
self.context.active_optimization_profile = 1
I get:
[TensorRT] ERROR: 3: [executionContext.cpp::setOptimizationProfileInternal::753] Error Code 3: Internal Error (Parameter check failed at: runtime/api/executionContext.cpp::setOptimizationProfileInternal::753, condition: profileIndex >= 0 && profileIndex < mEngine.getNbOptimizationProfiles()
Judging by the failed parameter check (profileIndex < mEngine.getNbOptimizationProfiles()), index 1 is out of range, so the engine seems to expose at most one optimization profile. What can I do?
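For reference, here is a minimal sketch of how I would verify how many profiles the serialized engine actually contains, assuming the standard TensorRT Python API (the engine path is shortened here to match the command above):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize the engine built by tao-converter
with open("resnet18_detector_lpd.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# active_optimization_profile must satisfy 0 <= index < num_optimization_profiles
print("profiles in engine:", engine.num_optimization_profiles)

If this prints 1, then only index 0 is valid and my call with index 1 would be rejected, which would match the error above.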
Thank you for your attention.