I am trying to use my own test image (for example checkerboard.png) as inference input to trtexec. I followed the instructions here: About --loadInputs in trtexec - #2 by NVES
I converted my input file (checkerboard.png) to binary data (checkerboard.dat); the size is 1x3x1500x1500 = 67500000 bytes.
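For reference, this is roughly how I generate the .dat file (a minimal sketch with Pillow/NumPy; the float32 dtype and the lack of normalization here are assumptions and have to match whatever the engine's input binding actually expects):

```python
# Minimal sketch of the PNG -> raw .dat conversion (Pillow + NumPy).
# The dtype (float32 here) and any normalization are assumptions; they must
# match the input binding of the engine built by trtexec.
import numpy as np
from PIL import Image

img = Image.open("checkerboard.png").convert("RGB")         # HWC, uint8
chw = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)  # -> CHW
nchw = chw[np.newaxis, ...]                                  # -> 1xCxHxW
nchw.tofile("checkerboard.dat")                              # raw bytes, no header
print(nchw.shape, nchw.nbytes, "bytes")
```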
Then, if I run trtexec on the GPU only: trtexec … --loadInputs=input:0:checkerboard.dat …, it runs without a problem.
But if I run trtexec on GPU+DLA: trtexec --useDLACore=0 --loadInputs=input:0:checkerboard.dat …, it fails to read my checkerboard.dat with the message "Note: Expected: 72000000 bytes but only read: 67500000 bytes".
How can the same input data (checkerboard.dat) work when running on the GPU, but fail when running on GPU+DLA?
We need to reproduce this issue to understand more about where the error comes from.
Could you share a reproducible sample with us so we can check it with our internal team?
Follow the instructions in GitHub - NVIDIA-AI-IOT/jetson_benchmarks: Jetson Benchmark to profile this model on AGX Orin, using the options --precision int8 --loadInputs=input_18:0:checkboard.dat (you will have to make a minor change to jetson_benchmarks/utils/load_store_engine.py, basically appending a string that carries the "--loadInputs" option to the engine_CMD; see the sketch below).
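The change I mean is along these lines (a hypothetical sketch; the actual engine_CMD construction in load_store_engine.py looks different, this only illustrates appending the option to the trtexec command string):

```python
# Hypothetical sketch of the change in jetson_benchmarks/utils/load_store_engine.py.
# Names here are illustrative; the real code builds engine_CMD differently.
def append_load_inputs(engine_cmd: str,
                       tensor_name: str = "input_18:0",
                       data_file: str = "checkboard.dat") -> str:
    """Append a --loadInputs argument to an existing trtexec command string."""
    return f"{engine_cmd} --loadInputs={tensor_name}:{data_file}"

# Usage inside the script:
# engine_CMD = append_load_inputs(engine_CMD)
```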
After that, you should be able to run without a problem on the GPU using this row in benchmark_csv/orin-benchmarks.csv:
demo, onnx,1,1,0,2048,0,NA,NA
but you will hit the problem if GPU+2DLA is used:
demo, onnx,3,1,0,2048,0,NA,NA
I checked more on this today and here is what I found:
I tested with the trtexec binary directly. My simple model is ONNX with a float32 input data type. For example, if I want to pass a test image (3 x 200 x 200) to trtexec via the --loadInputs option, I have to convert the test image into binary data first, which ends up as either 120000 bytes (int8) or 480000 bytes (float32).
If I choose to run the model on the GPU only (with precision int8), it expects the input to be 120000 bytes (1 x 3 x 200 x 200). No issue here.
If I choose to run the model on GPU+DLA, the error log complains that it expects the input data to be 1280000 bytes, so it seems 32 x 200 x 200 is being used.
I tested other images at various resolutions and the issue is the same. So the question is: where does this "32" come from, and why is the expected test image size different for GPU vs GPU+DLA?
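For what it's worth, the expected sizes are at least consistent with the channel dimension being padded up to a multiple of 32 (this is just my own arithmetic, not a confirmed explanation):

```python
# Quick sanity check of the sizes trtexec reports (my own arithmetic; the
# padding-to-32 interpretation is an assumption based on the numbers).
def linear_size(c, h, w, bytes_per_elem=1):
    # Unpadded CHW size in bytes.
    return c * h * w * bytes_per_elem

def padded_size(c, h, w, pad_to=32, bytes_per_elem=1):
    # CHW size with the channel count rounded up to a multiple of pad_to.
    c_padded = ((c + pad_to - 1) // pad_to) * pad_to
    return c_padded * h * w * bytes_per_elem

print(linear_size(3, 200, 200))    # 120000   -> what GPU-only expects (int8)
print(padded_size(3, 200, 200))    # 1280000  -> what GPU+DLA expects
print(padded_size(3, 1500, 1500))  # 72000000 -> matches the earlier error message
```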