I have two engine models and I want to compare their inference times using profiling. However, I've noticed that the profiling results fluctuate significantly between runs. So I want to allocate specific resources for each run. Is there any way to do this? If not, it seems difficult to compare the two engine models through profiling.
Hi,
Sorry for the delayed response.
You can use the “CUDA_VISIBLE_DEVICES” environment variable to control which GPUs are visible.
Example: You can make a single GPU visible to the TensorRT NGC container and run the profiling multiple times.
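For illustration, a minimal sketch of launching a profiling run pinned to one GPU (assumes a Python launcher with trtexec on the PATH; the engine filename is just a placeholder):

```python
import os
import subprocess

# Expose only GPU 0 to the child process; all other GPUs become
# invisible to CUDA, so every profiling run uses the same device.
env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = "0"

# Placeholder engine file; trtexec ships with TensorRT.
subprocess.run(
    ["trtexec", "--loadEngine=model_a.engine", "--iterations=100"],
    env=env,
    check=True,
)
```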
Thank you.
@spolisetty
Thanks.
> You can use the “CUDA_VISIBLE_DEVICES” environment variable to control which GPUs are visible.
Yes, I know about this option. What I mean is: is there any way to allocate RAM and the number of CPU cores when profiling?
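For example, something like this rough sketch is what I have in mind (Linux-only; the core IDs and the memory cap are placeholders):

```python
import os
import resource

# Pin the profiling process to two fixed CPU cores (IDs are placeholders)
# so each run competes for the same compute resources. Linux-only.
os.sched_setaffinity(0, {0, 1})

# Cap the process's address space at 8 GiB (placeholder value) so memory
# availability is identical across runs. Note: this limits virtual
# address space, which can be restrictive for CUDA applications.
limit_bytes = 8 * 1024 ** 3
resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
```

If there is no direct support for this, I guess Docker's --cpuset-cpus and --memory flags could fix the same resources at the container level?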