Is there a way to run the convolution API on pitched memory to improve performance? The API documentation does not mention anything like this, but in theory it should help performance a lot. Is it possible without writing a custom convolution?
I also asked a similar question on Stack Overflow: cuda - cuDNN Convolution on pitched memory - Stack Overflow
Pitched memory may not make cuDNN’s convolutions run faster. Pitched memory is really just memory allocated with extra padding at the end of each row so that every row starts at an aligned address. While this can improve performance in particular use cases, convolutions are unlikely to see significant gains from it.
That said, you can use pitched memory with the cuDNN API. cuDNN lets you provide tensor strides that indicate the distance between two consecutive elements along a given dimension. If you have pitched memory, you just need to supply that pitch (expressed in elements, not bytes) as the stride of the padded dimension.