Hi,
Thanks for your question.
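Zero copy has to be requested when the memory is allocated (for example with cudaHostAlloc and the cudaHostAllocMapped flag rather than a plain cudaMalloc), so the framework's allocation code needs to be modified.
If you want TensorFlow to support zero copy, you can follow this page: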
http://arrayfire.com/zero-copy-on-tegra-k1/
Also, I think zero copy is possible as long as your framework supports GPU input.
For example:
- Prepare a shared pointer and create a model that uses this pointer as its input
- Load the image data into the shared pointer
- Run inference directly from the GPU input layer
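
For illustration, here is a rough sketch of those three steps using the CUDA runtime API. It is not TensorFlow code: input_layer is a made-up kernel standing in for the model's GPU input layer, and the 224x224x3 image size and the fake pixel data are just assumptions for the example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                            \
    do {                                                       \
        cudaError_t err_ = (call);                             \
        if (err_ != cudaSuccess) {                             \
            fprintf(stderr, "%s failed: %s\n", #call,          \
                    cudaGetErrorString(err_));                 \
            exit(EXIT_FAILURE);                                \
        }                                                      \
    } while (0)

// Hypothetical stand-in for the model's GPU input layer:
// converts 8-bit pixels to floats in [0, 1].
__global__ void input_layer(const unsigned char* pixels, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = pixels[i] / 255.0f;
}

int main() {
    const int n = 224 * 224 * 3;  // assumed image size (H x W x C)

    // Ask for mapped pinned allocations. Older toolkits need this
    // before any other CUDA call; it is harmless on newer ones.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Step 1: prepare the shared buffer. The memory is pinned host
    // memory that is also mapped into the GPU address space.
    unsigned char* host_pixels = nullptr;
    unsigned char* dev_pixels = nullptr;
    CHECK(cudaHostAlloc((void**)&host_pixels, n, cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void**)&dev_pixels, host_pixels, 0));

    // The output buffer is shared the same way so the CPU can read it.
    float* host_out = nullptr;
    float* dev_out = nullptr;
    CHECK(cudaHostAlloc((void**)&host_out, n * sizeof(float),
                        cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void**)&dev_out, host_out, 0));

    // Step 2: load image data into the shared buffer from the CPU.
    // Fake data here; a real application would decode an image.
    for (int i = 0; i < n; ++i) host_pixels[i] = (unsigned char)(i % 256);

    // Step 3: the GPU reads the buffer directly, with no cudaMemcpy.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    input_layer<<<blocks, threads>>>(dev_pixels, dev_out, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // wait before the CPU reads results

    printf("out[255] = %f\n", host_out[255]);

    CHECK(cudaFreeHost(host_out));
    CHECK(cudaFreeHost(host_pixels));
    return 0;
}
```

On Tegra the CPU and GPU share the same physical DRAM, so the mapped buffer avoids a real copy. On a discrete GPU the same code runs, but every access goes over PCIe, so it is usually slower than an explicit cudaMemcpy.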