Here in this code, in the last 2 statements inside the kernel, if you run with both the printf, it shows error. But if you comment any one and leave the other. It works Fine.
I know this kind of aliasing of pointers in not recommended. But this time it is really required. What we did here is to point the half2 at the at the 4th half value manually. Problem is, then it seems to have weird behavior. It can not be written or updated (try doing something with x2, update or assign it to twin). Really need the discussion to make this work, or at least know how Nvidia actually implements the half2.
you can only seem to just read and print, but can not do anything even after extracting the half
assuming that the original x pointer passed to the kernel was allocated via e.g. cudaMalloc, then bytePtr is properly/naturally aligned.
bytePtr +=3*sizeof(half); // adjust pointer
given that the original x pointer is properly/naturally aligned to half, this bytePtr is properly aligned to both its native type (char) and also half.
ptr = reinterpret_cast<half*>(bytePtr);
ptr is properly/naturally aligned to its native type (half)
x2=(half2*)ptr;
x2 is not properly/naturally aligned to half2 type, its native type. Therefore this pointer is incorrectly constructed and subsequent usage (i.e. dereferencing) is illegal in CUDA.
You’re getting confused by your own indexing (thinking about things with zero based indexing or one based indexing). In zero-based indexing (not the nomenclature you are using), the “4th” half value would be naturally aligned for a half2 pointer. But in one-based indexing (what you are actually using from a nomenclature standpoint, and what your code is actually doing) the 5th value would be the properly aligned one.
half values: x y z a b c d e f ...
zero-based: 0 1 2 3 4 5 6 7 8 ...
one-based: 1 2 3 4 5 6 7 8 9 ...
half2 aligned: * * * * * ...
Thank you for your reply
half indices(var x): 0 1 2 3 4 5 6 7 8 half2* y=(half2*)x makes (0,1), (2,3),(3,4), (5,6) as half2 pairs and can be accessed by y[0], y[1], y[2]…
I want to point a half2 pointer from index 3 or any odd index position in x, so (2,3) would be new half2, (4,5) would be new half2.
That is my question. How can I achieve it in Cuda, is it even possible?
Not possible in CUDA. I already linked to the relevant doc section indicating this. Dereferencing a half2 pointer (or any type pointer) in device code requires “natural” alignment of the pointer.
You would have to fall back to half pointer use, or else use the half2 pointer and pack/unpack adjacent quantities.
Thank you again for such insight.
But what do you meant by " usehalf2 pointer and pack/unpack adjacent quantities."
it still will not give me half2 of index pairs (2,3), (4,5)?
Suppose I have a half2 pointer *h2 pointing to element 0.
Suppose I want half elements 1 and 2, packed into a single half2 quantity.
You would retrieve half2 elements 0 and 1, unpack those two quantities into 4 half quantities, 0-3, and repack half quantities 1 and 2 into a new half2 quantity nh2: