Hi,
Sorry for the late update.
Are you using any TensorRT or cuDNN APIs? A common cause is that loading these libraries takes some memory on its own, and that memory is not shown in the profiling tool.
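If it helps, below is a rough sketch for estimating that overhead on Linux by comparing `MemAvailable` in `/proc/meminfo` before and after loading a library. The helper names (`mem_available_kib`, `library_load_cost_kib`) are hypothetical, and the library names `libcudnn.so` and `libnvinfer.so` are assumptions that may need a version suffix on your system. The reading is system-wide, so other processes add noise, and loading the library alone may understate the cost, since more memory is typically taken once a handle or context is created.

```python
import ctypes


def mem_available_kib(path="/proc/meminfo"):
    """Return the MemAvailable value (in KiB) from a Linux meminfo file."""
    with open(path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                # Line format: "MemAvailable:    1234567 kB"
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in " + path)


def library_load_cost_kib(lib_name):
    """Estimate system memory (KiB) consumed by loading a shared library.

    Returns None if the library cannot be loaded.
    """
    before = mem_available_kib()
    try:
        ctypes.CDLL(lib_name)  # dlopen the library into this process
    except OSError:
        return None
    after = mem_available_kib()
    return before - after


if __name__ == "__main__":
    # Assumed library names; adjust to what is installed on your device.
    for name in ("libcudnn.so", "libnvinfer.so"):
        cost = library_load_cost_kib(name)
        if cost is None:
            print(name + ": could not be loaded")
        else:
            print(name + ": about " + str(cost) + " KiB")
```

Running it a few times on an otherwise idle system should give a more stable estimate.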
Here is a related topic for checking the memory usage from the libraries:
Thanks.