My problem with --loadInputs in trtexec

I am trying to use a specific test image (for example: checkerboard.png) as inference input to trtexec. I followed the instructions here: About --loadInputs in trtexec - #2 by NVES

  1. convert my input file (checkerboard.png) to binary data (checkerboard.dat); the size is 1 x 3 x 1500 x 1500 = 67500000 bytes
  2. if I run trtexec on the GPU: trtexec …--loadInputs=input:0:checkerboard.dat…, it runs without problem
  3. but if I run trtexec on GPU+DLA: trtexec --useDLACore=0 --loadInputs=input:0:checkerboard.dat…, it fails to read checkerboard.dat with the message “Note: Expected: 72000000 bytes but only read: 67500000 bytes”

How can the same input data (checkerboard.dat) work when running on the GPU but fail when running on GPU+DLA?

Hi,

We need to reproduce this issue to understand where the error comes from.
Could you share a reproducible sample with us so we can check it with our internal team?

Thanks.

Sure, here is how you can reproduce it:

  1. take the simple model (demo-bs1.onnx)
    demo-bs1.onnx (1.5 KB)

  2. take the test image (checkerboard.png) and its binary file (checkerboard.dat)
    checkerboard.dat (64.4 MB)

  3. follow the instructions in GitHub - NVIDIA-AI-IOT/jetson_benchmarks: Jetson Benchmark to profile this model on AGX Orin, using the options: --precision int8 --loadInputs=input_18:0:checkerboard.dat (you will have to make a minor change to jetson_benchmarks/utils/load_store_engine.py, basically appending a string with the --loadInputs option to engine_CMD; see the sketch right after these steps)
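
For reference, this is roughly the change I mean. It is only a sketch: it assumes engine_CMD is assembled as a plain string of trtexec arguments (the actual structure of load_store_engine.py may differ), and append_load_inputs is just an illustrative name:

```python
# Sketch only: append --loadInputs to the trtexec command that
# jetson_benchmarks builds in utils/load_store_engine.py.
# Assumes engine_CMD is a plain string of trtexec arguments.

LOAD_INPUTS = "--loadInputs=input_18:0:checkerboard.dat"  # tensor name comes from the ONNX model

def append_load_inputs(engine_cmd: str, load_inputs: str = LOAD_INPUTS) -> str:
    """Return the trtexec command string with --loadInputs tacked on the end."""
    return engine_cmd.rstrip() + " " + load_inputs

if __name__ == "__main__":
    # Illustrative command; the real one is generated by the benchmark script.
    cmd = "trtexec --onnx=demo-bs1.onnx --int8 --useDLACore=0 --allowGPUFallback"
    print(append_load_inputs(cmd))
```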

After that, you should be able to run without problem with the GPU-only configuration in benchmark_csv/orin-benchmarks.csv:
demo, onnx,1,1,0,2048,0,NA,NA

but you will hit the problem if GPU + 2 DLA is used:
demo, onnx,3,1,0,2048,0,NA,NA

Hi,

Have you tried to use the trtexec binary directly?
Could you try to feed the same dat file as input to trtexec to see if it works?

Thanks.

Yes, I have tried the trtexec binary directly, and I hit exactly the same problem.

I checked more on this today and here is what I found:

  1. I tested with the trtexec binary directly. My simple model is ONNX with a float32 input data type. For example, if I want to pass a test image (3 x 200 x 200) to trtexec via --loadInputs, I have to convert the image into binary data first, which ends up as either 120000 bytes (int8) or 480000 bytes (float32); see the conversion sketch after this list
  2. if I run the model on GPU only (with precision int8), it expects the input to be 120000 bytes (1 x 3 x 200 x 200). No issue here.
  3. if I run the model on GPU+DLA, the error log complains that it expects the input data to be 1280000 bytes, so it seems 1 x 32 x 200 x 200 is used
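
Here is roughly how I produce the binary file. This is a sketch only: the file names are mine, and the int8 cast just fixes the element size, it does not apply any quantization or scaling:

```python
# Convert a test image into raw NCHW bytes for trtexec --loadInputs.
# Sketch only: assumes a 3-channel image; no quantization/scaling is applied.
import numpy as np
from PIL import Image

img = Image.open("checkerboard.png").convert("RGB").resize((200, 200))
chw = np.asarray(img).transpose(2, 0, 1)                 # HWC -> CHW, shape (3, 200, 200)

chw.astype(np.int8).tofile("checkerboard_int8.dat")      # 3*200*200   = 120000 bytes
chw.astype(np.float32).tofile("checkerboard_fp32.dat")   # 3*200*200*4 = 480000 bytes
```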

I tested other images with various resolutions and the issue is the same. So the question is: where does this “32” come from, and why is the expected test image size different for GPU vs GPU+DLA?

Hi,

There are some constraints based on the data type you configured.
For example, C must be padded to the next multiple of 16 for the kCHW16 format and to the next multiple of 32 for the kCHW32 format.
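
For example, assuming the DLA path selects the kCHW32 format for int8 input, the expected sizes in this thread follow from padding C up to the next multiple of 32 (the helper below is only illustrative):

```python
import math

def dla_chw32_bytes(c: int, h: int, w: int, elem_size: int = 1) -> int:
    """Buffer size when C is padded to the next multiple of 32 (kCHW32, int8 elements by default)."""
    padded_c = math.ceil(c / 32) * 32
    return padded_c * h * w * elem_size

print(dla_chw32_bytes(3, 200, 200))    # 1280000  -> the GPU+DLA expectation for a 3 x 200 x 200 input
print(dla_chw32_bytes(3, 1500, 1500))  # 72000000 -> the expectation in the original error message
```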

For more details, please check the doc below:

Thanks.

