Zero copy for TensorFlow

Hi, I am currently working on machine learning with TensorFlow.

It seems that TensorFlow allocates separate memory for the CPU and the GPU, and copies data from the CPU side to the GPU side. Since the TX1 has a single unified memory space shared by the CPU and GPU, TensorFlow ends up requesting twice the memory, and the copy from CPU memory to GPU memory is redundant on the TX1 (it could be eliminated with zero copy).

TensorFlow does not support zero copy. Is there any way to force all memory copies between the CPU and GPU to be zero copy? Or do any other frameworks support zero copy (e.g., Caffe, Theano, …)?


Thanks for your question.

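Zero copy has to be requested when the memory is allocated (for example with cudaHostAlloc and the cudaHostAllocMapped flag instead of a plain cudaMalloc), so the framework's allocation code needs to be modified.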
If you want TensorFlow to support zero copy, you can follow this page:

Also, I think zero copy is possible with any framework that accepts GPU input.
For example:

  1. Prepare a shared (zero-copy) pointer and create a model that uses this pointer as input
  2. Load the image data into the shared buffer
  3. Run inference directly from the GPU input layer
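
As a rough illustration of these three steps, here is a minimal sketch using only the CUDA runtime API, not TensorFlow itself. The `input_layer` kernel is a hypothetical stand-in for a framework's GPU input layer, and the image size and synthetic pixel data are assumptions made for the example. It needs a CUDA-capable device and `nvcc` to build.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical stand-in for a GPU input layer: scales 8-bit pixels to [0, 1].
__global__ void input_layer(const unsigned char* pixels, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = pixels[i] / 255.0f;
}

int main() {
    const int n = 224 * 224 * 3;  // assumed image size (H x W x C)

    // Allow mapped pinned allocations; call this before any other CUDA work.
    CUDA_CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Step 1: allocate pinned host buffers that are mapped into the GPU
    // address space, then ask for the device-side pointers to the same memory.
    unsigned char* host_img = nullptr;
    float* host_out = nullptr;
    CUDA_CHECK(cudaHostAlloc(reinterpret_cast<void**>(&host_img),
                             n * sizeof(unsigned char), cudaHostAllocMapped));
    CUDA_CHECK(cudaHostAlloc(reinterpret_cast<void**>(&host_out),
                             n * sizeof(float), cudaHostAllocMapped));

    unsigned char* dev_img = nullptr;
    float* dev_out = nullptr;
    CUDA_CHECK(cudaHostGetDevicePointer(reinterpret_cast<void**>(&dev_img),
                                        host_img, 0));
    CUDA_CHECK(cudaHostGetDevicePointer(reinterpret_cast<void**>(&dev_out),
                                        host_out, 0));

    // Step 2: the CPU writes the image straight into the shared buffer
    // (synthetic data here; a real program would decode an image into it).
    for (int i = 0; i < n; ++i) host_img[i] = static_cast<unsigned char>(i % 256);

    // Step 3: the GPU reads the same memory directly; there is no cudaMemcpy.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    input_layer<<<blocks, threads>>>(dev_img, dev_out, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());  // wait before the CPU reads results

    std::printf("first output value: %f\n", host_out[0]);

    CUDA_CHECK(cudaFreeHost(host_img));
    CUDA_CHECK(cudaFreeHost(host_out));
    return 0;
}
```

In a real framework the pointer returned by `cudaHostGetDevicePointer` would be handed to the model's input layer in place of a buffer filled by `cudaMemcpy`; how to do that depends on the framework and is not shown here.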