I am quite new to OpenCL programming. I have just worked through some examples and read the NVIDIA guides. There are still some questions I haven’t managed to figure out yet. Hopefully someone here is able to help me out.
It’s said to be important to make sure that memory accesses are coalesced, and there are some conditions attached to that requirement. I had a look at the examples in the NVIDIA programming guide (p. 34). I think I figured out a lot, but what I’m still unsure about is how memory is aligned when I create buffers. The guide mentions 32-byte, 64-byte and 128-byte accesses. Coalesced reads seem to depend on thread 0 of the executing half-warp reading from an address that is the first one of a segment in global memory (at least on a device of compute capability 1.1).
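To make clear what I mean, here is a rough sketch in plain C of the condition as I currently understand it. This is only my simplified model, not an OpenCL or CUDA API: the function names are my own, I assume a half-warp of 16 threads reading consecutive elements, and I assume the segment size is 16 times the element size (for example 64 bytes for 4-byte elements). The part I cannot fill in is `buffer_base_offset`, since that is exactly the alignment of the buffer I am asking about.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if byte_offset is the first byte
 * of a segment of segment_bytes, 0 otherwise. */
int starts_on_segment(uint64_t byte_offset, uint64_t segment_bytes)
{
    return segment_bytes != 0 && byte_offset % segment_bytes == 0;
}

/* Simplified model of the alignment condition (my assumption):
 * thread 0 of the half-warp reads element first_index of a buffer
 * that starts at buffer_base_offset bytes, threads 1..15 read the
 * following elements, and the read is coalesced only if thread 0
 * hits the start of a segment of 16 * elem_bytes bytes. */
int half_warp_coalesced(uint64_t buffer_base_offset,
                        uint64_t first_index,
                        uint64_t elem_bytes)
{
    const uint64_t half_warp_threads = 16;
    uint64_t segment_bytes = half_warp_threads * elem_bytes;
    uint64_t first_byte = buffer_base_offset + first_index * elem_bytes;
    return starts_on_segment(first_byte, segment_bytes);
}
```

With 4-byte elements, `half_warp_coalesced(0, 0, 4)` and `half_warp_coalesced(0, 16, 4)` would both pass in this model, while a start index of 1 or a buffer base offset of 32 bytes would not.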
My question is how arrays in global memory are aligned. Are they aligned automatically? Are they aligned to 32-byte, 64-byte or 128-byte segment boundaries?
I hope my question is clear and someone is able to give me some hints. Would be of great help to me. Thanks in advance!