I ran the alignedTypes sample from the CUDA SDK ([url="http://www.nvidia.com/content/cudazone/cuda_sdk/CUDA_Basic_Topics.html#alignedTypes"]http://www.nvidia.com/content/cudazone/cud...ml#alignedTypes[/url]) and got the following output:
Testing misaligned types…
…
RGBA8_misaligned…
Avg. time: 196.819717 ms / Copy throughput: 0.236593 GB/s.
TEST PASSED
…
Testing aligned types…
…
RGBA8…
Avg. time: 5.652469 ms / Copy throughput: 8.238193 GB/s.
TEST PASSED
…
The results show roughly a 35x difference between RGBA8_misaligned and RGBA8 (196.8 ms vs. 5.65 ms), but I don't understand what actually distinguishes the two types. They are defined as:
typedef struct {
    unsigned char r, g, b, a;
} RGBA8_misaligned;

typedef struct __align__(4) {
    unsigned char r, g, b, a;
} RGBA8;
Both structs are 4 bytes in size.
I understand that a 3-byte struct may need to be padded to 4 bytes, and that the starting address of the data in physical memory also affects speed. But neither seems to apply here.
So what causes the difference?