Hi, apparently NCHW is the preferred layout for data buffers in cuDNN. However, the framework I am using (which includes CPU-optimized routines) stores all of its data buffers in NHWC order. I do not want to lose the CPU-optimized code paths for scenarios where my users have no suitable GPU available.
Are there any significant performance penalties for NHWC that would make it worthwhile to convert to NCHW?
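For concreteness, the conversion I am weighing would be a full layout transpose of every input batch before handing it to cuDNN. A sketch of the index mapping (NumPy here purely to illustrate; the sizes are made up and the real buffers are plain C float arrays):

```python
import numpy as np

# Hypothetical sizes, just for illustration.
n, h, w, c = 2, 4, 4, 3

nhwc = np.arange(n * h * w * c, dtype=np.float32).reshape(n, h, w, c)

# NHWC -> NCHW: move the channel axis in front of the spatial axes.
# .copy() forces an actual memory reshuffle, which is the cost in question.
nchw = nhwc.transpose(0, 3, 1, 2).copy()

# The same logical element (batch, channel, row, col) in both layouts:
assert nchw[1, 2, 3, 0] == nhwc[1, 3, 0, 2]
```

So the question is whether cuDNN's penalty for operating on NHWC directly outweighs paying for this copy (plus the transpose back for the output) on every call.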
Are there performance differences between using 4D and Nd tensor descriptors (cudnnSetTensor4dDescriptor vs. cudnnSetTensorNdDescriptor)?
Same question for the 2D and Nd convolution descriptors (cudnnSetConvolution2dDescriptor vs. cudnnSetConvolutionNdDescriptor).
Has anybody benchmarked this already?