Bug in float3* addressing


I have this simple test kernel:

__kernel void vectest(__global float3* data, __global float* ret) {
    float3 v = data[1];
    ret[0] = v.x; ret[1] = v.y; ret[2] = v.z;
}


data = [0, 1, 2, 3, 4, 5, 6]

ret = [4, 5, 6]

According to the data I receive in the ret array, the float3 variable v consists of the floats at index positions 4-6 instead of 3-5. It seems that when the driver converts the float3 array access to a byte offset, it incorrectly assumes that the array element size is 16 bytes (4*4) instead of 12 (3*4).

Using the 64-bit Linux driver version 295.49, running on a GeForce GTX 285.


Hi Csaba!

Indeed, that is frustrating, but I would discourage using any type whose size is 3*sizeof() of its component, because GPUs don't handle such types well. If you query the size of these types on the host side, you will find that sizeof(cl_float3) == sizeof(cl_float4). Since not even the host side treats it as a 12-byte type, things get very messy, even if the device handled it properly.

GPUs are highly optimized for types aligned to either 4 bytes or 16 bytes. Even if you only need 3 components in a vector, use float4 and simply ignore the last element. If you don't want to waste the memory, use three independent arrays of floats instead.


Wow, you’re right. I was simply wrong: I blindly assumed that the size of float3 was 12 bytes, but it is in fact 16 bytes, as clearly stated in section 6.1.5 of the specification.

I apologize for falsely accusing NVIDIA out of sheer ignorance.