Unified Memory support in TensorRT

Description

I have a basic question about the interaction between TensorRT and unified memory. I would like to oversubscribe GPU memory while running inference. Could someone please explain or point me to docs (that I’m not finding) for the following?

  • Does TensorRT support use of unified memory?
  • How do I “tell” TensorRT that I would like to use unified memory? Do I just pass in buffers allocated with cudaMallocManaged, or is there some other setting?

Hi @anand.raja,
Your query has been noted.
Please allow me some time to check on this.
Thank you!

Hi @anand.raja,

You can use unified memory as input/output buffers.
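For example, passing managed pointers where device pointers would normally go should look roughly like this (a sketch using the binding-array API; the binding names “input”/“output” and the buffer sizes are placeholders, and error checking is omitted):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Run inference with I/O buffers allocated from unified (managed) memory.
// Assumes `engine` and `context` were created elsewhere and that the engine
// has one input binding and one output binding with the placeholder names below.
void inferWithManagedBuffers(nvinfer1::ICudaEngine* engine,
                             nvinfer1::IExecutionContext* context,
                             size_t inputBytes, size_t outputBytes,
                             cudaStream_t stream)
{
    void* input = nullptr;
    void* output = nullptr;

    // Allocate from unified memory instead of cudaMalloc.
    cudaMallocManaged(&input, inputBytes);
    cudaMallocManaged(&output, outputBytes);

    // The host can write the input directly; pages migrate on demand.

    // Place the managed pointers at their binding indices, exactly as with
    // ordinary device pointers.
    void* bindings[2] = {nullptr, nullptr};
    bindings[engine->getBindingIndex("input")] = input;
    bindings[engine->getBindingIndex("output")] = output;

    context->enqueueV2(bindings, stream, nullptr);
    cudaStreamSynchronize(stream);

    // The output is readable from the host here without an explicit cudaMemcpy.

    cudaFree(input);
    cudaFree(output);
}
```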
However, I am afraid we do not have any documentation published on this.
Thanks!

@AakankshaS, thanks for checking up and responding.

If I’m not mistaken, TensorRT makes additional allocations while the engine is running, beyond the buffers you provide. Is there a way to use unified memory for these buffers?

Thanks!

Hi @anand.raja,

Sorry, but the end user can’t manage this memory, as it is allocated and managed by TRT itself.
Thanks!

@AakankshaS I believe your response is incorrect. The setDeviceMemory API provides a way to specify an application-managed buffer. In my tests, this allows use of unified memory.
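Roughly, the flow looks like this (a sketch only; error checking is omitted, and the usual alignment requirement on the buffer passed to setDeviceMemory still applies):

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

// Back TensorRT's per-context scratch (activation/workspace) memory with a
// managed allocation instead of letting TRT allocate it internally.
// Assumes `engine` is an already-deserialized ICudaEngine.
nvinfer1::IExecutionContext* createContextWithManagedScratch(
    nvinfer1::ICudaEngine* engine, void** scratchOut)
{
    // Create the context without TensorRT allocating its own device memory.
    nvinfer1::IExecutionContext* context =
        engine->createExecutionContextWithoutDeviceMemory();

    // Query how much scratch memory the context needs and satisfy it from
    // unified memory, so it can be oversubscribed like any managed allocation.
    size_t scratchBytes = engine->getDeviceMemorySize();
    void* scratch = nullptr;
    cudaMallocManaged(&scratch, scratchBytes);

    // Hand the managed buffer to the context; the application retains
    // ownership and must keep it alive for the lifetime of the context.
    context->setDeviceMemory(scratch);

    *scratchOut = scratch;  // free with cudaFree after destroying the context
    return context;
}
```

Combined with managed input/output buffers, this covers both the buffers you pass in and TRT’s own scratch memory.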
