Detailed CUDA Implementation of TensorFlow

Ziqi · November 19, 2021, 7:19pm

We are investigating how to deploy TensorFlow 2 with custom OPs and deep learning in our product. Our current understanding is that a session is where a TensorFlow graph is executed, and that the graph may include both deep learning OPs and self-defined (custom) OPs. To choose the best software architecture for our product, we would like to know how a TensorFlow session is implemented in CUDA. Specifically, we would like to understand stream management, memory copies, and compute execution within a session. Going deeper, we would also like to understand the implementation at the OS level, e.g., running different sessions in a single process versus in multiple processes. Is there a document, from either TF2 or NVIDIA, that covers these questions?

Robert_Crovella · November 19, 2021, 7:33pm

NVIDIA doesn’t directly maintain or support TensorFlow. You might want to ask your questions about it on the TF forums.