Unified Memory support in TensorRT

Description

I have a basic question about the interaction between TensorRT and unified memory. I would like to oversubscribe GPU memory while running inference. Could someone please explain or point me to docs (that I’m not finding) for the following?

  • Does TensorRT support use of unified memory?
  • How do I “tell” TensorRT that I would like to use unified memory - do I just pass in buffers that come from cudaMallocManaged or is there some other setting?
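
For concreteness, something like the snippet below is roughly what I have in mind. This is only a sketch, assuming a TensorRT 8.5+ style API (setTensorAddress / enqueueV3); the engine path, tensor names, and buffer sizes are placeholders.

```cpp
// Rough sketch only: bind cudaMallocManaged buffers as TensorRT I/O tensors.
// Assumes a TensorRT 8.5+ style API (setTensorAddress / enqueueV3).
// "model.engine", the tensor names, and the byte sizes are placeholders.
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
};

int main() {
    Logger logger;

    // Load a serialized engine from disk (path is a placeholder).
    std::ifstream file("model.engine", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    auto* runtime = nvinfer1::createInferRuntime(logger);
    auto* engine  = runtime->deserializeCudaEngine(blob.data(), blob.size());
    auto* context = engine->createExecutionContext();

    // Allocate the I/O buffers from unified (managed) memory instead of cudaMalloc.
    const size_t inputBytes  = 1 << 20;  // placeholder sizes
    const size_t outputBytes = 1 << 20;
    void* input  = nullptr;
    void* output = nullptr;
    cudaMallocManaged(&input,  inputBytes);
    cudaMallocManaged(&output, outputBytes);

    // Hand the managed pointers to TensorRT exactly like ordinary device pointers.
    context->setTensorAddress("input",  input);   // tensor names are placeholders
    context->setTensorAddress("output", output);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    context->enqueueV3(stream);
    cudaStreamSynchronize(stream);

    // Cleanup (TensorRT 8+ objects are destroyed with delete).
    cudaStreamDestroy(stream);
    cudaFree(input);
    cudaFree(output);
    delete context;
    delete engine;
    delete runtime;
    return 0;
}
```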

Hi @anand.raja
Your query has been noted.
Please allow me some time to check on this.
Thank you!

Hi @anand.raja,

You can use unified memory as an input/output buffer.
However, I'm afraid we do not have any documents published around that.
Thanks!

@AakankshaS, thanks for checking up and responding.

If I’m not mistaken, TensorRT makes additional allocations while the engine is running, beyond the buffers you provide. Is there a way to use unified memory for these buffers?

Thanks!

Hi @anand.raja,

Sorry, but the end user can't manage this memory, as it is managed by TensorRT itself.
Thanks!


@AakankshaS I believe your response is incorrect. The setDeviceMemory API provides a way to specify an application-managed buffer. In my tests, this allows the use of unified memory.
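
What I did looks roughly like the sketch below (not my exact code; error handling is omitted and the helper name is made up for illustration). It uses createExecutionContextWithoutDeviceMemory, getDeviceMemorySize, and setDeviceMemory, and assumes the engine has already been deserialized as in the earlier snippet.

```cpp
// Sketch: back TensorRT's internal activation/scratch memory with unified memory
// by letting the application own it via IExecutionContext::setDeviceMemory.
// Error handling omitted; the helper name is hypothetical.
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>

nvinfer1::IExecutionContext* createManagedMemoryContext(nvinfer1::ICudaEngine& engine) {
    // Create a context that does not allocate its own persistent device memory.
    nvinfer1::IExecutionContext* ctx = engine.createExecutionContextWithoutDeviceMemory();

    // Ask the engine how much scratch memory the context needs, then supply
    // a cudaMallocManaged buffer of (at least) that size.
    void* scratch = nullptr;
    cudaMallocManaged(&scratch, engine.getDeviceMemorySize());
    ctx->setDeviceMemory(scratch);

    // The scratch buffer must stay valid for as long as the context uses it;
    // freeing it (cudaFree) is the application's responsibility.
    return ctx;
}
```

As far as I understand, this only covers the context's scratch/activation memory; other allocations (for example, the engine's weights) are still made by TensorRT unless you also install a custom IGpuAllocator via IRuntime::setGpuAllocator.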


I am trying to use unified memory. Can you share your code?

I am also facing this problem. Have you been able to solve it?

I am also facing this problem. Have you solved it? If so, could you share your code?