Hello. I understand that any CUDA-capable device can run the optimised graph, but do I need a specific GPU to perform the optimisation in the first place?

I have tried optimising a graph with precision mode 'FP16' on a machine with an NVIDIA GeForce GTX 1050M. After the optimisation step, the graph's size on disk is unchanged, and so is its performance.

A generated TensorRT PLAN is valid for a specific GPU — more precisely, a specific CUDA Compute Capability. For example, if you generate a PLAN for an NVIDIA P4 (compute capability 6.1) you can’t use that PLAN on an NVIDIA Tesla V100 (compute capability 7.0).

In this case, the P100 has compute capability 6.0 and the GeForce GTX 1050 has compute capability 6.1, so a PLAN built on one of them cannot be used on the other.
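The sketch below illustrates the compute-capability rule quoted above. It is not a TensorRT API. The lookup table is hardcoded from the figures mentioned in this thread, and the GPU names are just illustrative keys. The helper `compute_capability_matches` is hypothetical. Under the quoted rule, a matching compute capability is a necessary condition for reusing a PLAN, not a guarantee: a PLAN is also tied to the TensorRT version that built it.

```python
# Hypothetical lookup table built from the compute capabilities cited in this thread.
# The keys are illustrative names only; a real check would query the driver instead.
COMPUTE_CAPABILITY = {
    "Tesla P4": (6, 1),
    "Tesla P100": (6, 0),
    "Tesla V100": (7, 0),
    "GeForce GTX 1050": (6, 1),
}


def compute_capability_matches(build_gpu: str, target_gpu: str) -> bool:
    """Return True if both GPUs share the same compute capability.

    Assumption: per the rule quoted above, a PLAN built on build_gpu can only be
    considered for target_gpu when their compute capabilities are identical.
    This checks that single condition and ignores other factors such as the
    TensorRT version.
    """
    return COMPUTE_CAPABILITY[build_gpu] == COMPUTE_CAPABILITY[target_gpu]


if __name__ == "__main__":
    pairs = [
        ("Tesla P4", "Tesla V100"),          # 6.1 vs 7.0: mismatch
        ("Tesla P100", "GeForce GTX 1050"),  # 6.0 vs 6.1: mismatch
        ("Tesla P4", "GeForce GTX 1050"),    # 6.1 vs 6.1: same compute capability
    ]
    for build, target in pairs:
        print(f"{build} -> {target}: {compute_capability_matches(build, target)}")
```

Comparing `(major, minor)` tuples keeps the check exact, so 6.0 and 6.1 are treated as different targets, as in the P100 and GTX 1050 case.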