TensorFlow C back-end, multi-GPU selection


I am looking for a wizard of the TF C backend, and specifically of GPU management. I have already posted on Stack Overflow, but the question is maybe too niche. I am copy-pasting it from Stack Overflow below:

I have written a full TF inference pipeline using the C backend. I am currently working on hardware with multiple GPUs (x8). It works well on CPU, but not on GPU, because I am not able to select the devices correctly.

The workflow is the following: a single thread sets up the session from a saved model.


Then, a thread from a pool executes the usual workflow for the C backend (set up input/output and run):

 TF_NewTensor(...)      // allocate input
 TF_AllocateTensor(...) // allocate output

Currently, I know which device I want my code to execute on, so I am using the CUDA Runtime API call cudaSetDevice. However, it has no influence (by default everything lands on device 0, checked with nvidia-smi). If I force the device using CUDA_VISIBLE_DEVICES, I can effectively select another device ID; however, CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 combined with cudaSetDevice does not work.

I suspect TF forces the device internally; maybe some flexibility could be gained through TF_SetConfig, or through the run_options of TF_SessionRun. However, there is no documentation for the C backend. So if a TF wizard is around, I would appreciate advice on how to correctly set the device for TF_SessionRun.