I want to improve memory-access efficiency when training a network. Is there an API that supports operator fusion to reduce data transfer?
Hi,
There is limited Fused Ops support in cuDNN 7.6.x.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/sdk/cudnn-archived/cudnn_765/cudnn-api/index.html#cudnnFusedOps_t
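As a conceptual illustration only (this is not the cuDNN fused-ops API), here is a small C++ sketch of why fusing two elementwise operators saves memory traffic. The function names `bias_relu_unfused` and `bias_relu_fused` are hypothetical. Ignoring caches, the unfused version writes an intermediate buffer and reads it back, touching about 4N elements; the fused version reads the input once and writes the output once, about 2N.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unfused version: two separate passes ("operators").
// Memory traffic: read x (N) + write tmp (N) + read tmp (N) + write y (N) = 4N.
std::vector<float> bias_relu_unfused(const std::vector<float>& x, float bias) {
    std::vector<float> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = x[i] + bias;          // op 1: bias add
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = std::max(tmp[i], 0.0f); // op 2: ReLU
    return y;
}

// Hypothetical fused version: both operators in one pass, no intermediate buffer.
// Memory traffic: read x (N) + write y (N) = 2N.
std::vector<float> bias_relu_fused(const std::vector<float>& x, float bias) {
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = std::max(x[i] + bias, 0.0f);
    return y;
}
```

Both functions return the same values for the same input. On a GPU the fused form corresponds to one kernel launch instead of two, which is the same idea behind fusing scale, bias, and activation with a convolution in the cuDNN fused-ops plans.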
Thanks
Hi, thanks for your answer. I have figured it out.
One more question: do you have a code example of a nonlinear network, e.g. ResNet or GoogLeNet, written from scratch in C++ or CUDA? I am looking into how to implement nonlinear blocks. Thanks
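For reference, a minimal from-scratch sketch of a ResNet-style skip connection in plain C++ (no cuDNN). It assumes a single dense layer `W` as the residual branch, whereas real ResNet blocks use convolution and batch-norm layers there; `ResidualBlock`, `forward`, and `backward` are hypothetical names. The key point for a non-sequential block is in `backward`: the gradient reaching the input is the sum of the skip-path gradient and the branch-path gradient.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal residual block: y = relu(W * x + x).
// W is n x n, row-major. A dense layer stands in for ResNet's conv + BN branch.
struct ResidualBlock {
    std::size_t n;
    std::vector<float> W;     // n*n weights, row-major
    std::vector<float> x_in;  // input cached by forward() for backward()
    std::vector<float> z;     // pre-activation cached by forward() for backward()

    ResidualBlock(std::size_t n_, std::vector<float> w) : n(n_), W(std::move(w)) {}

    std::vector<float> forward(const std::vector<float>& x) {
        x_in = x;
        z.assign(n, 0.0f);
        for (std::size_t i = 0; i < n; ++i) {
            float acc = 0.0f;
            for (std::size_t j = 0; j < n; ++j) acc += W[i * n + j] * x[j];  // branch: F(x) = W x
            z[i] = acc + x[i];                                                // skip: F(x) + x
        }
        std::vector<float> y(n);
        for (std::size_t i = 0; i < n; ++i) y[i] = z[i] > 0.0f ? z[i] : 0.0f;  // ReLU
        return y;
    }

    // Must be called after forward(). Returns dL/dx and writes dL/dW into dW.
    std::vector<float> backward(const std::vector<float>& dy, std::vector<float>& dW) const {
        std::vector<float> dz(n);
        for (std::size_t i = 0; i < n; ++i) dz[i] = z[i] > 0.0f ? dy[i] : 0.0f;  // ReLU gradient
        dW.assign(n * n, 0.0f);
        std::vector<float> dx(dz);  // skip path: gradient passes through unchanged
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                dW[i * n + j] = dz[i] * x_in[j];  // dL/dW = dz * x^T
                dx[j] += W[i * n + j] * dz[i];    // branch path W^T dz, summed with skip path
            }
        }
        return dx;
    }
};
```

For a GoogLeNet-style inception block the same bookkeeping applies with parallel branches: the forward pass concatenates the branch outputs, and the backward pass slices the incoming gradient back to each branch, then sums the branch gradients at the shared input.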