Hello all! I’m having a few problems with uncoalesced loads from global memory and think it may be due to how my data is organized.
First of all, what I need is to send to my kernel N buffers of the same size.
Currently, what I have is something similar to:
/* TupleAttr is declared first so that TupleSOA can refer to it. */
typedef struct _tuple_attr {
    disc_type type;
    union {
        int *buffer_int;
        float *buffer_float;
    } attr_data;
    int size;
} TupleAttr;

typedef struct _tuple_soa {
    TupleAttr *attributes;
    int size;
} TupleSOA;
Here, attributes points to an array of N TupleAttr elements.
In my kernel I’ll be reading and writing to those buffer_int and buffer_float inside the TupleAttr struct.
I’m not sure if this is the right way to represent my data in order to get coalesced accesses. Should I try something different?
Okay, so after profiling and trying to understand the results, I’m kind of confused.
The way I’m accessing the memory is:
int i = in->attributes[id_attr].attr_data.buffer_int[id_thread];
out->attributes[0].attr_data.buffer_int[id_thread] = i;
What confuses me is that if I don’t write to the output buffer (out), I get no uncoalesced accesses. So why do I get uncoalesced accesses when I write to the output?
Removing the output line probably let the compiler optimize the kernel body away: the value you load is never used, so the load itself can be dropped. With nothing left to do, there are no uncoalesced accesses :)
Your kernel probably didn’t do any real work at all.
eyal
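To illustrate the point above, here is a minimal sketch with two hypothetical kernels (read_only and read_and_write are names made up for this example, not code from the thread). In the first one the loaded value is never used, so the compiler is allowed to remove the load. In the second one the store keeps the load alive. Whether the load is actually removed depends on the compiler and its optimization settings.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the loaded value is never used, so the load has no
// observable effect and the compiler is free to eliminate it.
__global__ void read_only(const int *in, int n)
{
    int id_thread = blockIdx.x * blockDim.x + threadIdx.x;
    if (id_thread < n) {
        int i = in[id_thread];
        (void)i; // silences the unused-variable warning, does not keep the load
    }
}

// Hypothetical kernel: the store makes the loaded value observable,
// so both the load and the store have to stay.
__global__ void read_and_write(const int *in, int *out, int n)
{
    int id_thread = blockIdx.x * blockDim.x + threadIdx.x;
    if (id_thread < n) {
        out[id_thread] = in[id_thread];
    }
}

int main()
{
    const int n = 1024;
    int *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_out, n * sizeof(int));
    cudaMemset(d_in, 0, n * sizeof(int));

    read_only<<<n / 256, 256>>>(d_in, n);
    read_and_write<<<n / 256, 256>>>(d_in, d_out, n);

    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Profiling the two kernels separately should show memory traffic only for the one whose result is actually written out, if the compiler did remove the unused load.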
That makes sense…Hmm, but anyway, I need to write to the output. Is there a better option than what I’m doing?
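One option worth trying, sketched below under some assumptions: resolve the buffer pointers on the host and pass them to the kernel as plain device pointers, so that each thread performs only one load and one store at consecutive addresses. With the struct-based version, every thread first has to fetch the attributes pointer and then the buffer_int pointer from global memory before it reaches the data. Whether those extra pointer loads are what the profiler reports as uncoalesced is an assumption here, not something the thread confirms. The kernel name copy_attr and the buffer size are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the buffers arrive as plain device pointers, so no
// thread has to read a pointer out of a struct stored in global memory.
__global__ void copy_attr(const int *in_buf, int *out_buf, int n)
{
    int id_thread = blockIdx.x * blockDim.x + threadIdx.x;
    if (id_thread < n) {
        // Consecutive threads touch consecutive 4-byte words.
        out_buf[id_thread] = in_buf[id_thread];
    }
}

int main()
{
    const int n = 1 << 20; // assumed buffer length, for illustration only
    int *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_out, n * sizeof(int));
    cudaMemset(d_in, 0, n * sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    // On the host, the pointers that would live in
    // in->attributes[id_attr].attr_data.buffer_int and
    // out->attributes[0].attr_data.buffer_int are passed directly.
    copy_attr<<<blocks, threads>>>(d_in, d_out, n);

    cudaError_t err = cudaDeviceSynchronize();
    printf("%s\n", cudaGetErrorString(err));

    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

If all N buffers are needed in one kernel, a variation on the same idea is to allocate them as one contiguous block and index it as buffer[id_attr * size + id_thread], which keeps a single base pointer in the kernel arguments.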