Is it possible to share the workload between the GPU and CPU?


Is it possible to divide the workload between the GPU and the host CPU using TensorRT?

I have my GPU installed in a relatively powerful system whose CPU sits mostly idle during GPU execution. Can I make use of that CPU power by running some steps on the CPU and some on the GPU? For example, some sort of pre-conversion of the input data that might speed things up?

My current understanding is that, no matter what precision or input format I build my TensorRT engine for, the input handed to TensorRT is passed directly to the GPU without using any CPU power. Is this understanding correct?
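For context, the kind of split I have in mind is pipelining: preprocess the next batch on the CPU while the current batch runs on the GPU. A minimal sketch of that idea, where `preprocess` and `infer_on_gpu` are hypothetical placeholders (in a real application `infer_on_gpu` would enqueue work on a TensorRT execution context and `preprocess` would do the CPU-side input conversion):

```python
# Hypothetical sketch: overlap CPU preprocessing with GPU inference by
# pipelining batches through a thread pool. `preprocess` and `infer_on_gpu`
# are placeholders, not TensorRT APIs.
from concurrent.futures import ThreadPoolExecutor

def preprocess(batch):
    # Stand-in for CPU-side work, e.g. normalization / layout conversion
    return [x * 0.5 for x in batch]

def infer_on_gpu(batch):
    # Stand-in for a TensorRT enqueue + synchronize on the GPU
    return sum(batch)

def pipelined_inference(batches):
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Preprocess batch i+1 on the CPU while batch i "runs on the GPU"
        pending = pool.submit(preprocess, batches[0])
        for nxt in batches[1:]:
            ready = pending.result()
            pending = pool.submit(preprocess, nxt)
            results.append(infer_on_gpu(ready))
        results.append(infer_on_gpu(pending.result()))
    return results

print(pipelined_inference([[1, 2], [3, 4]]))  # -> [1.5, 3.5]
```

Note that with pure-Python preprocessing the two stages only truly overlap if the CPU work releases the GIL (e.g. NumPy operations or host-to-device copies); otherwise a separate process would be needed.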


TensorRT Version: 8003
Nvidia Driver Version: 450.51.05
CUDA Version: 11.0
CUDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):


Currently, TensorRT doesn’t support CPU execution — engines run only on NVIDIA GPUs. Any CPU-side pre- or post-processing has to be done by your application before copying the data to the device.

Thank you.