Hi, I want to measure inference time on the Tensor Cores/GPU and on the DLA on Xavier, so I am using trtexec.

1. I test on the Tensor Cores/GPU and save the engine:

```
./trtexec --deploy=vgg16.prototxt --output=pool5 --batch=10 --int8 --saveEngine=vgg16_int8_gpu
[I] deploy: vgg16.prototxt
[I] output: pool5
[I] batch: 10
[I] int8
[I] saveEngine: vgg16_int8_gpu
[I] Input "data": 3x224x224
[I] Output "pool5": 512x7x7
[I] Engine has been successfully saved to vgg16_int8_gpu
[I] Average over 10 runs is 20.0581 ms (host walltime is 20.1438 ms, 99% percentile time is 21.1941).
[I] Average over 10 runs is 19.8605 ms (host walltime is 19.9275 ms, 99% percentile time is 19.8838).
[I] Average over 10 runs is 19.993 ms (host walltime is 20.0624 ms, 99% percentile time is 20.9207).
[I] Average over 10 runs is 19.8484 ms (host walltime is 19.9225 ms, 99% percentile time is 19.9121).
[I] Average over 10 runs is 19.8426 ms (host walltime is 19.9174 ms, 99% percentile time is 19.9076).
[I] Average over 10 runs is 19.8475 ms (host walltime is 19.9118 ms, 99% percentile time is 19.9358).
[I] Average over 10 runs is 20.0039 ms (host walltime is 20.0671 ms, 99% percentile time is 20.9622).
[I] Average over 10 runs is 19.854 ms (host walltime is 19.9188 ms, 99% percentile time is 19.8874).
[I] Average over 10 runs is 20.0111 ms (host walltime is 20.0859 ms, 99% percentile time is 20.9599).
[I] Average over 10 runs is 19.8355 ms (host walltime is 19.9088 ms, 99% percentile time is 19.8575).
```

2. I test on the DLA and load the saved engine:

```
./trtexec --output=pool5 --batch=10 --int8 --useDLACore=1 --loadEngine=vgg16_int8_gpu
[I] output: pool5
[I] batch: 10
[I] int8
[I] useDLACore: 1
[I] loadEngine:vgg16_int8_gpu
[I] vgg16_int8_gpu has been successfully loaded.
[I] Average over 10 runs is 19.8767 ms (host walltime is 19.9826 ms, 99% percentile time is 19.9714).
[I] Average over 10 runs is 19.8654 ms (host walltime is 19.9363 ms, 99% percentile time is 19.8927).
[I] Average over 10 runs is 20.0027 ms (host walltime is 20.0833 ms, 99% percentile time is 20.9712).
[I] Average over 10 runs is 19.8444 ms (host walltime is 19.9144 ms, 99% percentile time is 19.8847).
[I] Average over 10 runs is 20.0286 ms (host walltime is 20.0979 ms, 99% percentile time is 20.9602).
[I] Average over 10 runs is 19.8443 ms (host walltime is 19.9111 ms, 99% percentile time is 19.8697).
[I] Average over 10 runs is 19.8551 ms (host walltime is 19.9237 ms, 99% percentile time is 19.8959).
[I] Average over 10 runs is 19.8492 ms (host walltime is 19.9131 ms, 99% percentile time is 19.8932).
[I] Average over 10 runs is 20.0132 ms (host walltime is 20.0969 ms, 99% percentile time is 21.0791).
[I] Average over 10 runs is 19.8568 ms (host walltime is 19.9351 ms, 99% percentile time is 19.9032).
```
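The two sets of timings can be compared with a small script. This is only a sketch, assuming the log format shown above; `mean_latency_ms` is a hypothetical helper and not part of trtexec, and only the first two timing lines of each run are pasted in to keep it short.

```python
import re

# Matches the summary line that trtexec prints in the logs above.
AVG_RE = re.compile(r"Average over \d+ runs is ([0-9.]+) ms")


def mean_latency_ms(log_text):
    """Return the mean of all 'Average over N runs' values in a trtexec log."""
    values = [float(m.group(1)) for m in AVG_RE.finditer(log_text)]
    if not values:
        raise ValueError("no timing lines found")
    return sum(values) / len(values)


# First two timing lines of each run, copied from the logs above.
gpu_log = """
[I] Average over 10 runs is 20.0581 ms (host walltime is 20.1438 ms, 99% percentile time is 21.1941).
[I] Average over 10 runs is 19.8605 ms (host walltime is 19.9275 ms, 99% percentile time is 19.8838).
"""
dla_log = """
[I] Average over 10 runs is 19.8767 ms (host walltime is 19.9826 ms, 99% percentile time is 19.9714).
[I] Average over 10 runs is 19.8654 ms (host walltime is 19.9363 ms, 99% percentile time is 19.8927).
"""

gpu_ms = mean_latency_ms(gpu_log)
dla_ms = mean_latency_ms(dla_log)
print(f"GPU run: {gpu_ms:.4f} ms, DLA-flag run: {dla_ms:.4f} ms, "
      f"diff: {dla_ms - gpu_ms:+.4f} ms")
```

Pasting the full logs into `gpu_log` and `dla_log` gives the averages over all reported batches.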

Q: The timings of the two runs are nearly identical. Can the serialized engine (vgg16_int8_gpu) only run on the GPU, even if I specify the DLA with --useDLACore when loading it?

Thanks!