C++ calling a Python neural network model: the model is loaded on the GPU, but inference seems to run on the CPU

I built a VGG16 model in Python using TensorFlow and Keras. When I run the Python code directly, it loads the model and runs it on the GPU without problems.
When I call the same .py function from C++, the model is loaded onto the GPU, but during the prediction phase the GPU stays idle and the CPU appears to do the work.
Do I need to explicitly place the Keras model on the GPU? The Python code runs on the GPU, but from C++ the model only gets loaded there. What is the problem?


I have solved this problem. Watching the resource monitor, I found that GPU utilization shows a short pulse, which is the model running one forward prediction. The rest of the time is spent waiting on image preprocessing: color space conversion, face alignment, the denseflow method, optical flow processing, and so on. So although the model is loaded on the GPU and does run there, the prediction step spends most of its time waiting for preprocessed image data. That is what produced the misleading picture of GPU memory being full while utilization stayed near zero (TensorFlow by default reserves most of the GPU's memory when it starts, so full memory alone does not mean the GPU is busy).
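One way to confirm this kind of bottleneck without watching a resource monitor is to time each stage of the processing loop. Below is a minimal Python sketch, assuming a simple per-frame loop. `StageTimer` is a hypothetical helper, and `preprocess` and `predict` are hypothetical stand-ins (using `time.sleep`) for the real preprocessing steps and the Keras `model.predict` call.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        # Time the body of the `with` block and add it to this stage's total.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def fractions(self):
        # Share of the total measured time spent in each stage.
        total = sum(self.totals.values())
        return {k: v / total for k, v in self.totals.items()} if total else {}


# Hypothetical stand-ins: replace with your own color conversion,
# face alignment, optical flow code and the real model.predict() call.
def preprocess(frame):
    time.sleep(0.02)   # simulate slow CPU-side preprocessing
    return frame


def predict(batch):
    time.sleep(0.002)  # simulate a short GPU forward pass
    return batch


if __name__ == "__main__":
    timer = StageTimer()
    for frame in range(10):
        with timer.stage("preprocess"):
            x = preprocess(frame)
        with timer.stage("predict"):
            predict(x)
    for name, frac in timer.fractions().items():
        print(f"{name}: {frac:.0%} of total time")
```

If `preprocess` takes most of the time, the GPU sits idle between short `predict` pulses, which is the same pattern seen in the resource monitor.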