Let's say, for example, that my tensor is a 4x3 matrix.
Before copying the tensor to the GPU, I need to flatten it. What's the right way to do that?
I don’t get why it needs to be flattened. If the 4x3 elements are contiguous in memory, the memory copy should be as simple as copying 12*sizeof(element) bytes.
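To illustrate the point about contiguous memory, here is a minimal sketch using Python's standard `ctypes` module. It assumes float32 elements, uses hypothetical matrix values (element (r, c) holds `10*r + c`), and uses `ctypes.memmove` only as a host-side stand-in for the real host-to-device copy (such as `cudaMemcpy`). A C-style 4x3 array has no padding, so one 12 * sizeof(float) = 48-byte copy produces the flat sequence directly, without an explicit flatten step:

```python
import ctypes

ROWS, COLS = 4, 3

# A 4x3 float32 matrix stored as a C array: the rows sit back to back in memory.
Matrix = (ctypes.c_float * COLS) * ROWS
m = Matrix()
for r in range(ROWS):
    for c in range(COLS):
        m[r][c] = float(10 * r + c)  # hypothetical values for illustration

# Total size is exactly 12 * sizeof(float) bytes, with no padding between rows.
nbytes = ROWS * COLS * ctypes.sizeof(ctypes.c_float)
assert ctypes.sizeof(m) == nbytes

# Stand-in for the host-to-device copy: one plain byte copy into a flat buffer.
flat = (ctypes.c_float * (ROWS * COLS))()
ctypes.memmove(flat, m, nbytes)

print(list(flat))
# [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0, 30.0, 31.0, 32.0]
```

The "flattened" tensor is just the same bytes read as a 1-D sequence; only the interpretation changes.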
The order of the elements in memory depends on how you organized your data (for example, row-major versus column-major).
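As a small illustration of how the organization changes the sequence, the sketch below writes the same 4x3 matrix out in row-major and in column-major order. The matrix values and the helper names `row_major` and `col_major` are hypothetical, chosen only for this example:

```python
ROWS, COLS = 4, 3

# Hypothetical matrix values: element (r, c) holds 10*r + c.
matrix = [[10 * r + c for c in range(COLS)] for r in range(ROWS)]

def row_major(m):
    # Walk each row left to right: the column index changes fastest.
    return [x for row in m for x in row]

def col_major(m):
    # Walk each column top to bottom: the row index changes fastest.
    return [m[r][c] for c in range(len(m[0])) for r in range(len(m))]

print(row_major(matrix))  # [0, 1, 2, 10, 11, 12, 20, 21, 22, 30, 31, 32]
print(col_major(matrix))  # [0, 10, 20, 30, 1, 11, 21, 31, 2, 12, 22, 32]
```

Both sequences contain the same 12 values, so the byte count of the copy is identical; what the consumer of the buffer must agree on is which of the two orders it expects.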
In TRT (TensorRT), if an input tensor is marked as NCHW FP32, it will be option 2.
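For a plain linear NCHW layout, the offset of element (n, c, h, w) is ((n*C + c)*H + h)*W + w, so W varies fastest, then H, then C, then N. The sketch below assumes the 4x3 matrix is fed as a single-image, single-channel tensor (N=1, C=1, H=4, W=3); that shape mapping, the values, and the helper name `nchw_offset` are assumptions for illustration. Under that assumption the buffer comes out row by row, and with FP32 elements it occupies 12 * 4 = 48 bytes:

```python
def nchw_offset(n, c, h, w, C, H, W):
    # Linear NCHW: W changes fastest, then H, then C, then N.
    return ((n * C + c) * H + h) * W + w

# Assumed shape: the 4x3 matrix treated as N=1, C=1, H=4, W=3.
N, C, H, W = 1, 1, 4, 3
matrix = [[10 * h + w for w in range(W)] for h in range(H)]  # hypothetical values

# Place every element at its NCHW offset in a flat buffer.
buf = [None] * (N * C * H * W)
for h in range(H):
    for w in range(W):
        buf[nchw_offset(0, 0, h, w, C, H, W)] = matrix[h][w]

print(buf)  # [0, 1, 2, 10, 11, 12, 20, 21, 22, 30, 31, 32]
```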