I tried that as well:
$ sudo ./trtexec --deploy=FCN-Alexnet-Pascal-VOC/fcn_alexnet.deploy.prototxt --output="score_fr_21classes" --fp16 --useDLA=1 --allowGPUFallback
[sudo] password for nvidia:
deploy: FCN-Alexnet-Pascal-VOC/fcn_alexnet.deploy.prototxt
output: score_fr_21classes
fp16
useDLA: 1
allowGPUFallback
Input "data": 3x500x356
Output "score_fr_21classes": 21x16x12
Default DLA is enabled but layer conv1 is not running on DLA, falling back to GPU.
../builder/cudnnBuilder2.cpp (689) - Misc Error in buildSingleLayer: 1 (Unable to process layer.)
../builder/cudnnBuilder2.cpp (689) - Misc Error in buildSingleLayer: 1 (Unable to process layer.)
could not build engine
Engine could not be created
Engine could not be created
It works with GoogLeNet:
$ sudo ./trtexec --deploy=deploy_gn.prototxt --output="prob" --fp16 --useDLA=1 --allowGPUFallback
deploy: deploy_gn.prototxt
output: prob
fp16
useDLA: 1
allowGPUFallback
Input "data": 3x224x224
Output "prob": 1000x1x1
Default DLA is enabled but layer prob is not running on DLA, falling back to GPU.
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 10 runs is 7.61525 ms (host walltime is 7.71438 ms, 99% percentile time is 8.42531).
Average over 10 runs is 7.45112 ms (host walltime is 7.5426 ms, 99% percentile time is 7.54054).
Average over 10 runs is 7.47984 ms (host walltime is 7.56824 ms, 99% percentile time is 7.56355).
Average over 10 runs is 7.46052 ms (host walltime is 7.54673 ms, 99% percentile time is 7.6247).
Average over 10 runs is 7.43091 ms (host walltime is 7.5198 ms, 99% percentile time is 7.4839).
Average over 10 runs is 7.42442 ms (host walltime is 7.51952 ms, 99% percentile time is 7.56941).
Average over 10 runs is 7.41408 ms (host walltime is 7.50007 ms, 99% percentile time is 7.4983).
Average over 10 runs is 7.42892 ms (host walltime is 7.52415 ms, 99% percentile time is 7.60461).
Average over 10 runs is 7.41212 ms (host walltime is 7.49461 ms, 99% percentile time is 7.44051).
Average over 10 runs is 7.46012 ms (host walltime is 7.54797 ms, 99% percentile time is 7.55254).
But it is about 3 times slower with the DLA than on the GPU alone (roughly 7.4 ms vs. 2.3 ms per run):
$ sudo ./trtexec --deploy=deploy_gn.prototxt --output="prob" --fp16 --useDLA=2 --allowGPUFallback
deploy: deploy_gn.prototxt
output: prob
fp16
useDLA: 2
allowGPUFallback
Input "data": 3x224x224
Output "prob": 1000x1x1
Default DLA is enabled but layer prob is not running on DLA, falling back to GPU.
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 10 runs is 8.35648 ms (host walltime is 8.51915 ms, 99% percentile time is 13.578).
Average over 10 runs is 7.60057 ms (host walltime is 7.71397 ms, 99% percentile time is 7.88573).
Average over 10 runs is 7.45977 ms (host walltime is 7.54657 ms, 99% percentile time is 7.49731).
Average over 10 runs is 7.46231 ms (host walltime is 7.54995 ms, 99% percentile time is 7.6144).
Average over 10 runs is 7.46377 ms (host walltime is 7.55141 ms, 99% percentile time is 7.57261).
Average over 10 runs is 7.45527 ms (host walltime is 7.5493 ms, 99% percentile time is 7.51965).
Average over 10 runs is 7.43578 ms (host walltime is 7.52406 ms, 99% percentile time is 7.4935).
Average over 10 runs is 7.43039 ms (host walltime is 7.51354 ms, 99% percentile time is 7.47037).
Average over 10 runs is 7.42859 ms (host walltime is 7.51587 ms, 99% percentile time is 7.57533).
Average over 10 runs is 7.43494 ms (host walltime is 7.52745 ms, 99% percentile time is 7.5119).
$ sudo ./trtexec --deploy=deploy_gn.prototxt --output="prob" --fp16 --allowGPUFallback
deploy: deploy_gn.prototxt
output: prob
fp16
allowGPUFallback
Input "data": 3x224x224
Output "prob": 1000x1x1
name=data, bindingIndex=0, buffers.size()=2
name=prob, bindingIndex=1, buffers.size()=2
Average over 10 runs is 2.33557 ms (host walltime is 2.42281 ms, 99% percentile time is 2.53942).
Average over 10 runs is 2.30036 ms (host walltime is 2.38397 ms, 99% percentile time is 2.30755).
Average over 10 runs is 2.30158 ms (host walltime is 2.38179 ms, 99% percentile time is 2.30685).
Average over 10 runs is 2.30062 ms (host walltime is 2.37806 ms, 99% percentile time is 2.30912).
Average over 10 runs is 2.30475 ms (host walltime is 2.38379 ms, 99% percentile time is 2.3351).
Average over 10 runs is 2.30268 ms (host walltime is 2.38351 ms, 99% percentile time is 2.30605).
Average over 10 runs is 2.29729 ms (host walltime is 2.37373 ms, 99% percentile time is 2.30205).
Average over 10 runs is 2.30181 ms (host walltime is 2.38668 ms, 99% percentile time is 2.31046).
Average over 10 runs is 2.30885 ms (host walltime is 2.4033 ms, 99% percentile time is 2.36349).
Average over 10 runs is 2.30052 ms (host walltime is 2.39568 ms, 99% percentile time is 2.30486).
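To put a number on the slowdown, here is a minimal sketch that averages the "Average over 10 runs" lines from two trtexec logs and prints the ratio. The helper name `mean_latency_ms` and the variables are my own, not part of trtexec, and the regex assumes the log format shown above. The sample text is just two lines copied from each of the runs in this post.

```python
import re
from statistics import mean

# Matches the per-batch summary lines printed by trtexec in the logs above.
# Assumption: the line format is exactly as pasted in this post.
AVG_RE = re.compile(r"Average over \d+ runs is ([0-9.]+) ms")


def mean_latency_ms(log_text):
    """Return the mean of all 'Average over N runs' values found in log_text."""
    values = [float(m.group(1)) for m in AVG_RE.finditer(log_text)]
    if not values:
        raise ValueError("no 'Average over N runs' lines found")
    return mean(values)


# Two lines copied from the --useDLA=1 run, for illustration.
dla_log = """
Average over 10 runs is 7.45112 ms (host walltime is 7.5426 ms, 99% percentile time is 7.54054).
Average over 10 runs is 7.47984 ms (host walltime is 7.56824 ms, 99% percentile time is 7.56355).
"""

# Two lines copied from the GPU-only run (no --useDLA), for illustration.
gpu_log = """
Average over 10 runs is 2.30036 ms (host walltime is 2.38397 ms, 99% percentile time is 2.30755).
Average over 10 runs is 2.30158 ms (host walltime is 2.38179 ms, 99% percentile time is 2.30685).
"""

dla_ms = mean_latency_ms(dla_log)
gpu_ms = mean_latency_ms(gpu_log)
slowdown = dla_ms / gpu_ms
print(f"DLA: {dla_ms:.3f} ms, GPU: {gpu_ms:.3f} ms, slowdown: {slowdown:.2f}x")
```

Pasting the full logs into `dla_log` and `gpu_log` instead of the two-line samples should give a more representative ratio, since the first batch of each run tends to be a bit slower.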