TensorFlow C back-end, multi-GPU selection


I am looking for a wizard of the TF C backend, and specifically of GPU management. I have already posted on Stack Overflow, but the question is maybe too niche. I am copy-pasting it from Stack Overflow below:

I have written a full TF inference pipeline using the C backend. I am currently working on hardware with multiple GPUs (x8). It works well on CPU, but not on GPU, because I am not able to select the devices correctly.

The workflow is the following: a single thread sets up the session from a saved model.


Then, a thread from a pool executes the usual workflow for the C backend (set up input/output and run):

 TF_NewTensor(...)      // allocate input
 TF_AllocateTensor(...) // allocate output

Currently, I know which device I want my code to execute on, so I am using the CUDA Runtime API call cudaSetDevice. However, it has no influence (by default everything lands on device 0, checked with nvidia-smi). If I force the device using CUDA_VISIBLE_DEVICES, I can effectively select another device ID; however, CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 combined with cudaSetDevice does not work.

I suspect TF forces the device internally; maybe some flexibility could be gained through TF_SetConfig, or through the run_options of TF_SessionRun. However, there is no documentation for the C backend. So if a TF wizard is around, I would appreciate advice on how to correctly set the device for TF_SessionRun.