I have a basic question about the interaction between TensorRT and unified memory. I would like to oversubscribe GPU memory while running inference. Could someone please explain, or point me to docs (which I haven’t been able to find), covering the following?
- Does TensorRT support use of unified memory?
- How do I “tell” TensorRT that I want to use unified memory? Do I just pass in buffers allocated with cudaMallocManaged, or is there some other setting? (See the sketch below for what I had in mind.)
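For context, here is a rough sketch of what I assumed the workflow would look like (using the TensorRT 8.5+ I/O tensor API; static shapes and FP32 tensors are assumed for simplicity, and engine/context creation is omitted). Is passing cudaMallocManaged pointers via setTensorAddress really all there is to it, or am I missing a builder/runtime setting?

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

#include <cuda_runtime.h>
#include <NvInfer.h>

// Allocate every I/O tensor with cudaMallocManaged and hand the pointers to
// TensorRT. Assumes static shapes and FP32 tensors for simplicity.
void runWithManagedBuffers(nvinfer1::ICudaEngine& engine,
                           nvinfer1::IExecutionContext& context,
                           cudaStream_t stream)
{
    std::vector<void*> buffers;
    for (int32_t i = 0; i < engine.getNbIOTensors(); ++i)
    {
        const char* name = engine.getIOTensorName(i);
        nvinfer1::Dims dims = engine.getTensorShape(name);
        size_t count = 1;
        for (int32_t d = 0; d < dims.nbDims; ++d)
            count *= static_cast<size_t>(std::max<int64_t>(dims.d[d], 1));

        void* ptr = nullptr;
        cudaMallocManaged(&ptr, count * sizeof(float)); // unified (managed) memory
        context.setTensorAddress(name, ptr);            // hand the managed pointer to TensorRT
        buffers.push_back(ptr);
    }

    context.enqueueV3(stream);       // run inference as usual
    cudaStreamSynchronize(stream);   // wait so the host can safely touch the outputs

    for (void* p : buffers)
        cudaFree(p);
}
```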