Problem about Memory structure

darot · March 26, 2009, 5:38am

What is the difference between memory allocated with cudaMallocArray and memory allocated with cudaMallocPitch?
Is the data structure different, or is anything else different?

Quoc_Vinh · March 26, 2009, 7:51am

cudaMallocPitch allocates ordinary global memory with a normal, linear layout.

You can both READ and WRITE this memory.

The pitch is the padded size of one row in bytes. It is chosen so that every row starts at an aligned address, which helps coalesced access to global memory.

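For example, suppose you have a matrix BYTE matrix[254][100];

In this case width = 254 elements, so one row takes 254 * sizeof(BYTE) = 254 bytes.

If you use cudaMallocPitch, the returned pitch could be 256 bytes (the exact value depends on the device), so every row starts on an aligned boundary, which favors a coalesced access pattern.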
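The pitch idea above can be sketched as follows. This is only a minimal illustration, not code from the thread: the kernel name incrementBytes and the helper runPitchedExample are made up for this example, and the pitch actually returned is whatever the runtime picks for the device (it may well be larger than 256).

```cuda
#include <cuda_runtime.h>

// Each thread increments one byte. Rows are located with the pitch
// (padded row size in bytes), not with the logical width.
__global__ void incrementBytes(unsigned char* base, size_t pitch,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        unsigned char* row = base + y * pitch;  // start of row y
        row[x] += 1;
    }
}

// Hypothetical helper: allocates a width x height byte matrix with
// cudaMallocPitch, runs the kernel once, and returns the pitch chosen
// by the runtime (0 if the allocation fails).
size_t runPitchedExample(int width, int height)
{
    unsigned char* d_data = nullptr;
    size_t pitch = 0;
    if (cudaMallocPitch((void**)&d_data, &pitch,
                        width * sizeof(unsigned char), height) != cudaSuccess)
        return 0;

    // Zero only the logical width of every row; the padding is left alone.
    cudaMemset2D(d_data, pitch, 0, width * sizeof(unsigned char), height);

    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    incrementBytes<<<grid, block>>>(d_data, pitch, width, height);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return pitch;
}
```

Calling runPitchedExample(254, 100) should return a pitch of at least 254 bytes; the padding bytes at the end of each row are never touched by the kernel.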

cudaMallocArray does not give you a normal linear structure. The layout of a cudaArray is opaque; it is laid out along a Z-order curve:

http://en.wikipedia.org/wiki/Z-order_(curve)

It is built to optimize texture fetches. You cannot directly access the data in a cudaArray from a kernel.

The only way is to bind a texture to the cudaArray and use the tex1D, tex2D, or tex3D functions to READ (read-only) the data through that texture.
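Reading a cudaArray through a texture might look like the sketch below. Note the assumptions: it uses the texture object API (cudaCreateTextureObject) rather than the texture references that were current when this thread was written, since texture references were later removed from CUDA; the names copyFromTexture and roundTripThroughArray are invented for this example.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Reads every texel through the texture and writes it to plain
// global memory. The cudaArray itself is never addressed directly.
__global__ void copyFromTexture(cudaTextureObject_t tex, unsigned char* out,
                                int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D<unsigned char>(tex, x + 0.5f, y + 0.5f);
}

// Hypothetical helper: uploads a host image into a cudaArray, reads it
// back through a texture, and reports whether the data survived.
bool roundTripThroughArray(int width, int height)
{
    std::vector<unsigned char> h_in(width * height), h_out(width * height, 0);
    for (int i = 0; i < width * height; ++i)
        h_in[i] = (unsigned char)(i % 251);

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned char>();
    cudaArray_t arr = nullptr;
    if (cudaMallocArray(&arr, &desc, width, height) != cudaSuccess)
        return false;
    // The host rows are tightly packed, so the source pitch equals the width.
    cudaMemcpy2DToArray(arr, 0, 0, h_in.data(), width, width, height,
                        cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;       // no interpolation
    td.readMode = cudaReadModeElementType;     // return raw bytes
    td.normalizedCoords = 0;                   // coordinates in texels

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    unsigned char* d_out = nullptr;
    cudaMalloc((void**)&d_out, width * height);

    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    copyFromTexture<<<grid, block>>>(tex, d_out, width, height);
    cudaMemcpy(h_out.data(), d_out, width * height, cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(arr);
    cudaFree(d_out);
    return h_in == h_out;
}
```

The kernel only ever sees the texture handle; the host fills the array with cudaMemcpy2DToArray because there is no device pointer to write through.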

Of course, you can also bind a texture to global memory that was allocated with cudaMallocPitch, and

READ (read-only) the data through the texture with tex1Dfetch().
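A possible sketch of that second route is shown here, again with the newer texture object API instead of the texture references of the time. The pitched buffer is treated as one linear block of pitch * height bytes, so the element index is y * pitch + x. The names gatherViaTex1Dfetch and roundTripThroughLinearTexture are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Reads element (x, y) of a pitched byte matrix through a 1D linear
// texture; the index accounts for the row padding.
__global__ void gatherViaTex1Dfetch(cudaTextureObject_t tex, size_t pitch,
                                    unsigned char* out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] =
            tex1Dfetch<unsigned char>(tex, (int)(y * pitch + x));
}

// Hypothetical helper: uploads data into pitched global memory, reads it
// back through a linear texture, and reports whether it matches.
bool roundTripThroughLinearTexture(int width, int height)
{
    std::vector<unsigned char> h_in(width * height), h_out(width * height, 0);
    for (int i = 0; i < width * height; ++i)
        h_in[i] = (unsigned char)(i % 251);

    unsigned char* d_data = nullptr;
    size_t pitch = 0;
    if (cudaMallocPitch((void**)&d_data, &pitch, width, height) != cudaSuccess)
        return false;
    cudaMemcpy2D(d_data, pitch, h_in.data(), width, width, height,
                 cudaMemcpyHostToDevice);

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_data;
    res.res.linear.desc = cudaCreateChannelDesc<unsigned char>();
    res.res.linear.sizeInBytes = pitch * height;

    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;  // return raw bytes

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    unsigned char* d_out = nullptr;
    cudaMalloc((void**)&d_out, width * height);

    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    gatherViaTex1Dfetch<<<grid, block>>>(tex, pitch, d_out, width, height);
    cudaMemcpy(h_out.data(), d_out, width * height, cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(d_data);
    cudaFree(d_out);
    return h_in == h_out;
}
```

Unlike the cudaArray case, the same buffer stays writable through its ordinary pointer d_data; only the path through the texture is read-only.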

darot · March 26, 2009, 8:03am

Thank you for the detailed explanation.

Is there any difference in access speed between a texture bound to cudaMallocArray memory and one bound to cudaMallocPitch memory?

Quoc_Vinh · March 26, 2009, 8:11am

I haven’t compared them yet.

If you can access global memory in a coalesced pattern, I think that will be faster than going through a cudaArray (bound to a texture).

In cases where you cannot get a coalesced access pattern in global memory, a cudaArray is a good idea.

darot · March 26, 2009, 8:49am

I understand, thank you so much

dgp06 · May 5, 2009, 8:17pm

But texture memory is cached, so if read-only memory is all you need, a texture may be the better choice.

byung · September 4, 2009, 6:08pm

Isn’t the pitch 100?