3D Textures -- xyz@CPU becomes zyx@GPU Is this the normal behaviour?

Sarnath · December 29, 2009, 8:58am

All,

I have a 3D array in CPU declared like ARRAY[128][128][128]. Since the accesses can benefit from 3D Spacial Locality, I decided to move it to a 3D texture…

So, I allocated a 3D array using cudaMalloc3DArray() API, did a cudaMemcpy3D() (which is not so straightforward) and acessed it from kernel…

I get correct results if I use z,y,x as the 3D coordinate while acessing the texture from kernel… If I use x,y,z, I dont get right results…

–edit-- I have attached a small CU file that reproduces this problem. As Avidday indicates, this is probably a missing documentation case. but yeah, it gets difficult without that.

CUDA 2.3, Linux Ubuntu, 32-bit

Can some1 shed some light?

btw, I got 5x to 7x the performance gain using 3D textures – so they are tooooo cooooool, Thanks to NV!

Best Regards,
Sarnath
tex3d.cu (2.78 KB)

avidday · December 29, 2009, 9:21am

Probably row major order versus column major order. I have only ever used 3D textures with a code with column major ordered storage, and it just worked, but I have no proof that it is really ordering that is in play here. The 3D texture documentation is rather, err scant…

Sarnath · December 29, 2009, 10:30am

Oh… Is it? Thanks! Documentation always helps…

Avidday, Have a good day! Thanks,

Sarnath · December 29, 2009, 11:00am

I have attached a small CU file in my first post that reproduces the issue. FYI.

@NV, Kindly throw some light on this and probably update the 3D texture documentation a bit - so that it covers important cases like this.

Nico · December 29, 2009, 2:22pm

IMO, this is normal behaviour which also applies to 2D textures. It just depends on how you store the texture data.

e.g. it is specified somewhere in the documentation that the data for a 3D texture should be a concatenation of the various slices, where each slice is a concatenation of rows an each row is a sequential array of data.

if you have 2D linear memory in CPU as

type mem[64][64]

then mem[y] is actually a pointer to the first element (x=0) at row y and mem[y] is the actual data at index x of row y, but if you want to access the same data through a texture, you have to use (x,y) in the tex2D call.

The same applies for 3D textures, where you have to use [z][y] to access the value at coordinates (x,y,z) instead of (z,y,x).

N.

Sarnath · December 30, 2009, 5:13am

@Nico,

Thanks! The Progrmaming guide explains how to access 3D arrays in the section under “device memory - section 3.2.1”. There in, the Z,Y,X relation shows up correctly… I think it is left to the reader to assume that “cudaMalloc3Darray()” works similarly and texture acceses also need to be done that way…

–edit-- but if you look @ it from the CPU side, an array of [Y][Z] is represented as X amount of YZ slices in memory… When these slices are copied to GPU, the GPU sees this as Z amount of YX slices. This is an important point to note. And hence texture accesses need to be Z,Y,X – This really needs some documentation somewhere.

Anyway, it works for me now and performance rocks! Good enough…

Thanks for your hints. It was very useful,

Best REgards,
Sarnath

StickGuy · January 4, 2010, 1:10pm

While the documentation probably could be clearer, the real problem is your lack of understanding of concepts like row-major ordering. The CPU and the GPU are using the same ordering, which is the one you decide on. The texture function parameter names imply that they expect row-major ordering in your data, but they should work fine with column-major ordering as well provided you are aware of it and supply the coordinates correctly.

Sarnath · January 4, 2010, 1:18pm

Very difficult concepts… Isnt it?

Yeah they use the same ordering… CPU calls it as X,Y,Z and GPU prefers to call it as Z,Y,X. This is just how they prefer to call it.

If you read my previous post, I have mentioned the place in the programmers guide where this is quite clear (Section 3.2.1).

So, therez nothing to row-major and column-major ordering here… unless i am missing something drastic… I have never used textures in the graphical sense and probably thats why I am not seeing something that you see.

I don’t understand how the function parameter names imply something about ordering… Can you explain it?

StickGuy · January 4, 2010, 2:21pm

This seems to be the source of your confusion. The CPU and the GPU “prefer to call it” the same thing. Consider this array of data:

0  1

2  3

The element at (x = 0, y = 1) is 2. However you arrange these four numbers in memory, the conceptual layout should still be the same: a 2 x 2 matrix of data where the element in the first column of the second row is 2.

In C, with a (standard, row-major) 2D array, you would access element (0, 1) as follows:

data[1][0]

In C, with a 1D, row-major array, you would access element (0, 1) as follows:

data[1 * 2 + 0]

With a CUDA texture where you filled it using data from a row-major array, you would access element (0, 1) like so:

tex2D(data, 0, 1);

If we reverse things, though, and store data in a column-major fashion, it would look like this:

data[0][1]

data[0 * 2 + 1]

tex2D(data, 1, 0);

A 3D array and texture would behave analogously.

If you take a look at my examples, I’m accessing the element (x = 0, y = 1). If you look at the names of the tex2D parameters, you’ll find that they are: texRef, x, y. In the row-major example, these match up. I’m supplying 0 for the x parameter and 1 for the y parameter. However, in the column-major example, they don’t match. Hence, the expected ordering is row-major, but really you can use it either way you want as long as you are consistent.

I suggest you read this Wikipedia article as it may help you understand row-major vs. column-major ordering and what it means:

http://en.wikipedia.org/wiki/Row-major_order

Sarnath · January 5, 2010, 5:00am

Stickguy,

Thanks for your time and a good example.

Let me take your example.

0  1

2  3

You prefer to call [0,1] as 2 because of “graphics” knowledge… i.e. If you look @ your screen, the top-left most is 0,0 and as you descend down “Y” increases… This is where the confusion starts… This ordering is opposite to the memory ordering in “C” language (i.e. row-major)

Now my point is simple: I dont deal with graphics. i deal with a molecular chemistry application and I use a 3D texture… When I copy to GPU, I need to know that this kind of ordering change happens… It is very important that this is documented in the manual. Thats my point.

Thanks for your time,

Best REgards,

Sarnath

Nico · January 5, 2010, 9:36am

You’re absolutely right. It’s mostly due to my OpenGL experience that I don’t have a problem understanding this coordinate mapping and I can imagine that it can be difficult to interpret without graphics knowledge.
The “upper left corner” you’re referring to is mentioned in the CUDA reference manual though.

N.

Sarnath · January 5, 2010, 10:23am

Nico,

Thanks for bringing up this point. I checked the CUDA reference manual. It just mentions “upper left” under a few APIs… but I think non-graphics guys wont make head or tail of it :-)

Anyway, They just need to have a section on 2D and 3D arrays and explain a bit about all these nuances. That would do. I hope they are working on improving the documentation.

Thanks,
Best Regards,
Sarnath