I tried to access a vx_image object created by vxCreateImage using vxMapImagePatch, and found that the device memory allocated for it is not contiguous.
For example, in the main function I create the image:
vx_image image = vxCreateImage( context, width, height, VX_DF_IMAGE_U8 );
To access it in an OpenVX node, I map it:
vxGetValidRegionImage( image, &rect );
vxMapImagePatch( image, &rect, 0, &id, &addr, (void**)&ptr, VX_READ_ONLY, NVX_MEMORY_TYPE_CUDA, VX_NOGAP_X );
The problem is that addr.stride_y (the byte distance between the first elements of two consecutive rows) is greater than sizeof( vx_uint8 ) * width, which means that a row of the image is not stored immediately after the preceding one in memory.
Why does this happen? And how can I create a contiguous vx_image object?
Thanks in advance.
To get better performance, we usually use cudaMallocPitch rather than cudaMalloc.
cudaMallocPitch may pad the allocation so that corresponding addresses in every row keep meeting the alignment requirements for coalescing as the address is updated from row to row.
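To make the padding concrete, here is a minimal host-side sketch in C of how a padded row pitch relates to the row width, and how a pixel is addressed once rows are separated by a pitch. The helpers padded_pitch and pixel_at and the alignment parameter are hypothetical, for illustration only. The real pitch is chosen by cudaMallocPitch and should always be read back (for a mapped vx_image, from addr.stride_y) rather than computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a row size in bytes up to the next multiple of `alignment`.
   Assumption: `alignment` only illustrates the idea; the actual pitch is
   whatever cudaMallocPitch returns.
   Example: 1000 bytes at alignment 256 gives 1024. */
static size_t padded_pitch(size_t row_bytes, size_t alignment)
{
    assert(alignment > 0);
    return (row_bytes + alignment - 1) / alignment * alignment;
}

/* Address of pixel (x, y) in an 8-bit image whose consecutive rows
   start `pitch` bytes apart (pitch >= width). */
static uint8_t *pixel_at(uint8_t *base, size_t pitch, size_t x, size_t y)
{
    return base + y * pitch + x;
}
```

Code that indexes the image as base[y * width + x] reads the wrong bytes as soon as the pitch is larger than the width, so the row offset has to use the pitch instead.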
Sorry, I do not fully understand your answer. Could you explain it more clearly?
From your answer, I understand that the data of a vx_image object is allocated with cudaMallocPitch, right?
If I'm correct, that may be the reason the memory returned by vxMapImagePatch is not contiguous, which causes wrong indexing of array elements in some CUDA functions I use (e.g. cuFFT).
So could you please confirm my guess and give me an example where cudaMallocPitch performs better than cudaMalloc?
Thanks in advance,
The image buffer in VisionWorks is allocated with cudaMallocPitch.
To increase occupancy, the function automatically applies padding so that the buffer is aligned to 256.
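If a library needs tightly packed rows, one option is to copy the padded image into a contiguous buffer first. Below is a minimal host-side sketch in C, with pack_rows as a hypothetical helper name. It assumes the source rows are src_pitch bytes apart and only the first row_bytes of each row hold image data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `row_bytes` bytes each from a padded (pitched)
   buffer into a tightly packed destination buffer.
   `dst` must hold at least row_bytes * height bytes. */
static void pack_rows(uint8_t *dst, const uint8_t *src,
                      size_t src_pitch, size_t row_bytes, size_t height)
{
    assert(src_pitch >= row_bytes);
    for (size_t y = 0; y < height; ++y) {
        /* Source rows advance by the pitch, destination rows by the width. */
        memcpy(dst + y * row_bytes, src + y * src_pitch, row_bytes);
    }
}
```

On the device, the same copy can be done in one call with cudaMemcpy2D, passing the packed row size as the destination pitch and addr.stride_y as the source pitch, for example cudaMemcpy2D(dst, width, ptr, addr.stride_y, width, height, cudaMemcpyDeviceToDevice) for a VX_DF_IMAGE_U8 image. Here dst is assumed to be a device buffer of width * height bytes allocated with cudaMalloc.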
There is no official benchmarking report for these two allocation functions.
But here is a relevant experiment from a user for your reference: