The Developer Guide mentions that there may be different types of memory space (CPU/GPU) for bindings:
> When invoking inference, you must set up the input and output buffers in the appropriate locations. Depending on the nature of the data, this may be in either CPU or GPU memory. If not obvious based on your model, you can query the engine to determine in which memory space to provide the buffer.
But it is unclear how to actually do this. I couldn't find any API call for it on `ICudaEngine` or `IExecutionContext`. Can anybody point me to the right place in the docs?
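For context, here is the buffer-setup pattern I'm trying to implement, sketched in plain Python. The `TensorLocation` enum and the per-binding query are my guesses at what such an API would look like; the CUDA/NumPy allocation calls are only indicated in comments:

```python
from enum import Enum

# Hypothetical stand-in for whatever the engine reports per binding:
# does this input/output buffer belong in host (CPU) or device (GPU) memory?
class TensorLocation(Enum):
    HOST = 0    # ordinary CPU memory (e.g. a NumPy array)
    DEVICE = 1  # GPU memory (e.g. a cudaMalloc'd buffer)

def allocate_binding(location: TensorLocation, nbytes: int):
    """Allocate an inference buffer in the memory space the engine expects."""
    if location is TensorLocation.DEVICE:
        # In real code: cudaMalloc / pycuda.driver.mem_alloc(nbytes)
        return ("device", nbytes)
    # In real code: numpy.empty(...) or a pinned host allocation
    return ("host", nbytes)

# Usage: for each binding, query its location, then allocate accordingly.
input_buf = allocate_binding(TensorLocation.DEVICE, 1024)
output_buf = allocate_binding(TensorLocation.HOST, 4096)
```

What I'm missing is the query step itself: the method on the engine (or context) that returns the equivalent of `TensorLocation` for a given binding.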
TensorRT Version: 18.104.22.168