I am looking for a wizard of the TF C backend, especially for the management of the GPUs. I have already posted on Stack Overflow, but the question is maybe a bit too specialized. I copy-paste it below from Stack Overflow:
I have written a full TF inference pipeline using the C backend. Currently, I am working on hardware with multiple GPUs (x8). It works well on CPU, but not really on GPU, because I am not able to select the devices correctly.
The workflow is the following: a single thread sets up the session from a saved model.
Then, a thread from a pool executes the usual C-backend workflow (set up input/output, then run):
    TF_NewTensor(...)       // allocate input
    TF_AllocateTensor(...)  // allocate output
    TF_SessionRun(...)
Currently, I know on which device I want to execute my code, so I am calling cudaSetDevice from the CUDA Runtime API. However, it does not have any influence (by default everything always runs on device 0, checked with nvidia-smi). If I force the device using CUDA_VISIBLE_DEVICES, I can effectively select another device ID; however, CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 combined with cudaSetDevice does not work.
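To make the two launch configurations concrete (./inference is just a placeholder name for my binary):

```shell
# This works: only physical GPU 3 is visible, and inside the process
# it is renumbered as device 0, so TF naturally lands on it.
CUDA_VISIBLE_DEVICES=3 ./inference

# This does NOT work for me: all GPUs visible, then trying to pick
# one per worker thread with cudaSetDevice() -- TF still uses device 0.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./inference
```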
I suspect TF forces the device internally; maybe some flexibility could be obtained through TF_SetConfig or TF_SessionRun. However, the documentation for the C backend barely exists. So if a TF wizard is here, I would appreciate advice on how to correctly set the device on which to execute the graph.