cudnnGetConvolutionForwardAlgorithm observation and suggested change.

I have an observation about cudnnGetConvolutionForwardAlgorithm and would like to suggest a change to it.

cudnnStatus_t cudnnGetConvolutionForwardAlgorithm(
    cudnnHandle_t                      handle,               //input
    const cudnnTensorDescriptor_t      xDesc,                //input
    const cudnnFilterDescriptor_t      wDesc,                //input
    const cudnnConvolutionDescriptor_t convDesc,             //input
    const cudnnTensorDescriptor_t      yDesc,                //input
    cudnnConvolutionFwdPreference_t    preference,           //input  (why do we need this?)
    size_t                             memoryLimitInBytes,   //input
    cudnnConvolutionFwdAlgo_t         *algo);                //output

I propose something more like this:

cudnnStatus_t cudnnGetConvolutionForwardAlgorithm(
    cudnnHandle_t                      handle,               //input
    const cudnnTensorDescriptor_t      xDesc,                //input
    const cudnnFilterDescriptor_t      wDesc,                //input
    const cudnnConvolutionDescriptor_t convDesc,             //input
    const cudnnTensorDescriptor_t      yDesc,                //input 
    size_t                            *memoryinbytes,        //input/output
    cudnnConvolutionFwdAlgo_t         *algo);                //output

It is pretty much the same as before, except that the preference argument is gone and memoryinbytes is both an input and an output. If you initialize memoryinbytes to zero, the function knows that you prefer no workspace. If it is greater than zero, the function finds the best algorithm that fits within the amount you gave it, and it could then update the value to how much is actually needed. If you want the fastest algorithm with no memory constraint, just pass NULL, and it will return the amount of memory needed.
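To make the intended behavior concrete, here is a rough mock of the selection rule in plain C. It does not call cuDNN: candidate_t, pick_forward_algo, the algorithm ids, and the timings are all made up for illustration, and a real implementation would have its own way of ranking algorithms. One thing the mock makes visible is that a NULL pointer cannot carry a size back, so in that case the sketch only picks the fastest algorithm and assumes the caller queries the workspace separately (for example with cudnnGetConvolutionForwardWorkspaceSize).

```c
#include <stddef.h>

/* Hypothetical stand-in for one forward algorithm; not a cuDNN type. */
typedef struct {
    int    id;              /* made-up algorithm identifier */
    double time_ms;         /* assumed (estimated or measured) run time */
    size_t workspace_bytes; /* workspace this algorithm would need */
} candidate_t;

/*
 * Mock of the proposed rule (an illustration, not cuDNN's implementation):
 *   memoryinbytes == NULL  -> no limit, pick the fastest candidate
 *   *memoryinbytes == 0    -> only candidates that need no workspace
 *   *memoryinbytes  > 0    -> fastest candidate that fits in that budget
 * On success, *memoryinbytes (when non-NULL) is overwritten with the
 * workspace the chosen algorithm needs.
 * Returns 0 on success, -1 on bad arguments or if nothing fits.
 */
int pick_forward_algo(const candidate_t *cands, size_t n,
                      size_t *memoryinbytes, int *algo)
{
    const candidate_t *best = NULL;

    if (cands == NULL || algo == NULL) {
        return -1;
    }

    for (size_t i = 0; i < n; ++i) {
        /* Skip anything over the caller's budget (a budget of 0 keeps
         * only zero-workspace candidates). */
        if (memoryinbytes != NULL &&
            cands[i].workspace_bytes > *memoryinbytes) {
            continue;
        }
        if (best == NULL || cands[i].time_ms < best->time_ms) {
            best = &cands[i];
        }
    }

    if (best == NULL) {
        return -1;
    }

    *algo = best->id;
    if (memoryinbytes != NULL) {
        *memoryinbytes = best->workspace_bytes; /* report what is needed */
    }
    return 0;
}
```

With three made-up candidates needing 0, 1024, and 4096 bytes (slowest to fastest), a budget of 0 selects the first, a budget of 2048 selects the second and rewrites the budget to 1024, and NULL selects the third.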