Description
Hello!
By default, trtexec uses random values for engine inputs during inference and profiling. In my case, certain layers of my model exhibit diverse memory access patterns that are highly dependent on the range of the model input values. As a result, the range and distribution of the inputs significantly impact the performance measurements.
When converting my ONNX model to a TensorRT engine, I observed that the input values during inference seem to be constrained to a range between 0 and 1. I confirmed this by printing values from inside a custom plugin kernel within my model. However, when I subsequently load the converted engine directly with the --loadInputs flag's counterpart, --loadEngine, the random input values no longer stay in the 0 to 1 range. The inputs are still confined to a limited (and small!) range, but they do not appear to be uniformly random across the full FP32/FP16 range.
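For reference, this is roughly how I checked the values, a minimal debug-print sketch inside the plugin's kernel (the kernel and variable names here are illustrative, not my actual plugin code):

```cuda
#include <cstdio>

// Minimal sketch of the debug print added inside the plugin's kernel
// (illustrative only; names do not match my actual plugin).
__global__ void debugPrintInput(const float* input, int n)
{
    const int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Print only the first few elements to avoid flooding stdout.
    if (idx < 8 && idx < n)
    {
        printf("input[%d] = %f\n", idx, input[idx]);
    }
}

// Launched from the plugin's enqueue() with the input device pointer, e.g.:
//   debugPrintInput<<<1, 32, 0, stream>>>(
//       static_cast<const float*>(inputs[0]), static_cast<int>(volume));
```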
I have executed both the model conversion and the --loadEngine commands multiple times, and I’ve noticed a consistent pattern. During the inference run immediately after the ONNX to TensorRT engine conversion, the input values always fall within the 0 to 1 range. Similarly, whenever I run inference via --loadEngine, the input values always fall within the same fixed range (which differs from 0 to 1).
I have manually assigned realistic input values for profiling my model. However, out of curiosity, I’m wondering if you could help me better understand the behavior of trtexec with regard to the range and distribution of its random inputs. Do the random input values used by trtexec have a defined range and/or distribution?
I reviewed the available documentation but was unable to find detailed information on this topic. It would be helpful for future profiling if users could define input ranges and distributions to better reflect real-world usage patterns.
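For reference, one way to supply such inputs today is trtexec's --loadInputs option with a raw binary file. Here is a minimal sketch of generating one; the element count, the value range, and the tensor name "input" are placeholders for my model's actual input:

```cpp
// Minimal sketch: write uniformly distributed values in a chosen range to a
// raw binary file that trtexec can consume via --loadInputs.
// The element count, range, and tensor name are placeholders.
#include <cstddef>
#include <fstream>
#include <random>
#include <vector>

int main()
{
    const std::size_t numElements = 1 * 3 * 224 * 224; // match your input shape
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> dist(0.0f, 255.0f); // realistic range

    std::vector<float> data(numElements);
    for (auto& v : data)
    {
        v = dist(rng);
    }

    std::ofstream out("input.bin", std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              data.size() * sizeof(float));
    return 0;
}
```

Profiling with `trtexec --loadEngine=model.engine --loadInputs=input:input.bin` then uses these values instead of the randomly generated ones, but a built-in way to specify a range or distribution would still be more convenient.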
Thank you for your assistance in shedding light on this!
Environment
TensorRT Version: 8.5
GPU Type: 3090
Nvidia Driver Version: 525
CUDA Version: 11.6
CUDNN Version: 8.6
Operating System + Version: Linux