TensorFlow model load time is extremely long when GPU computation is enabled

Loading a TensorFlow protobuf model (C/C++ code base) on a Jetson Nano takes tens of minutes. With the CPU only, it takes less than a second.

TensorFlow version: 2.4.0
OS: Linux arm-jetson 4.9.10


Could you monitor the device with tegrastats to see whether any swap memory is being used?
Please note that swap is not real physical memory, and access through the I/O interface will be slow.
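
If it is easier to log swap usage from a script while the model loads, a rough sketch like the one below may help. It does not use tegrastats; it assumes the standard Linux /proc/meminfo file with its SwapTotal and SwapFree fields (reported in kB). The function names swap_used_kb and read_swap_used_kb are made up for this example.

```python
def swap_used_kb(meminfo_text):
    """Return swap currently in use (kB), given /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        # Lines look like "SwapTotal:       2025520 kB"
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    # Missing fields are treated as 0, so a system without swap reports 0.
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the file at path (assumed to be /proc/meminfo) and return swap used in kB."""
    with open(path) as f:
        return swap_used_kb(f.read())
```

Calling read_swap_used_kb() once before the model load and again while it is running should show whether the number grows. If it does, the load is probably spilling into swap.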