I want to replace the ‘cudaMallocPitch’ function with a custom routine (tailored for images) that internally uses a caching device allocator (like the one from the CUB library, cub::CachingDeviceAllocator). The reason is that the CUDA memory management functions are not the fastest …
To do that, I need to know how ‘cudaMallocPitch’ internally calculates the base address and the ‘pitch’, i.e. the distance in bytes between two consecutive image rows. Unfortunately, there is no official information on this, and no property of the ‘cudaDeviceProp’ object that I could query. So how can I find out?
(1) Regarding base address alignment:
The base address must be a multiple of ‘cudaDeviceProp::textureAlignment’, otherwise one cannot bind a texture to it. Are there any other requirements?
(2) Regarding pitch alignment:
I know that the pitch must be a multiple of ‘cudaDeviceProp::texturePitchAlignment’, otherwise one cannot bind a texture (or texture object) to it. According to http://stackoverflow.com/questions/12550927/pitch-alignment-for-2d-textures , the alignment currently seems to be 512 bytes. And (of course) it must be a multiple of the size (in bytes) of one pixel. Are there any other requirements?
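To make the two pitch requirements concrete, here is a minimal host-side sketch of how I would compute the pitch myself. The helper names ‘roundUp’ and ‘computePitch’ are my own (hypothetical, not part of the CUDA API), and ‘pitchAlignment’ is assumed to be the value read from ‘cudaDeviceProp::texturePitchAlignment’ at runtime. Whether this matches what ‘cudaMallocPitch’ does internally is exactly what I do not know.

```cpp
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `multiple` (multiple must be > 0).
inline std::size_t roundUp(std::size_t value, std::size_t multiple) {
    assert(multiple > 0);
    return ((value + multiple - 1) / multiple) * multiple;
}

// Greatest common divisor (Euclid), used to build the least common multiple.
inline std::size_t gcdOf(std::size_t a, std::size_t b) {
    while (b != 0) {
        const std::size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Hypothetical helper: smallest pitch (in bytes) that holds one image row and
// is a multiple of BOTH the pitch alignment and the pixel size.
// `pitchAlignment` is assumed to come from cudaDeviceProp::texturePitchAlignment.
inline std::size_t computePitch(std::size_t widthPixels,
                                std::size_t bytesPerPixel,
                                std::size_t pitchAlignment) {
    assert(widthPixels > 0 && bytesPerPixel > 0 && pitchAlignment > 0);
    // Least common multiple of the two constraints.
    const std::size_t step =
        pitchAlignment / gcdOf(pitchAlignment, bytesPerPixel) * bytesPerPixel;
    return roundUp(widthPixels * bytesPerPixel, step);
}
```

For example, with a 512-byte pitch alignment, an image of 129 four-byte pixels needs 516 bytes per row and would get a pitch of 1024 bytes. For pixel sizes that are powers of two the least common multiple is simply the alignment itself; it only matters for sizes such as 12 bytes.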
(3) Regarding implications:
What if I align the base address to exactly ‘cudaDeviceProp::textureAlignment’ and the pitch to exactly ‘cudaDeviceProp::texturePitchAlignment’? Will I run into trouble (crashes and/or major performance degradation in kernels working with such images)?
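For the base address, this is roughly what I have in mind: request slightly more memory than needed from the caching allocator and round the returned pointer up to the texture alignment. It is only a sketch under my own assumptions. ‘alignUp’ and ‘paddedRequestBytes’ are hypothetical helper names, ‘baseAlignment’ is assumed to be ‘cudaDeviceProp::textureAlignment’, and the pitch is assumed to have been chosen already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round an address up to the next multiple of `alignment` (alignment > 0).
// The address is treated as a plain integer here; a real routine would cast
// the device pointer to std::uintptr_t and back.
inline std::uintptr_t alignUp(std::uintptr_t address, std::size_t alignment) {
    assert(alignment > 0);
    const std::uintptr_t remainder = address % alignment;
    return remainder == 0 ? address : address + (alignment - remainder);
}

// Hypothetical helper: number of bytes to request from the underlying
// allocator so that an aligned image of `heightRows` rows with the given
// `pitch` always fits, whatever the alignment of the returned pointer is.
inline std::size_t paddedRequestBytes(std::size_t pitch,
                                      std::size_t heightRows,
                                      std::size_t baseAlignment) {
    assert(pitch > 0 && heightRows > 0 && baseAlignment > 0);
    return pitch * heightRows + (baseAlignment - 1);
}
```

The original (unaligned) pointer would have to be stored alongside the aligned one, because that is the pointer the caching allocator expects back when the image is freed.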