bug (?) with GCC attribute ((aligned (16))); memory alignment corrupts data

dr_Tim · September 9, 2011, 11:10pm

Hello,

I could have find a bug using CUDA with an aligned memory (because I use the SSE instructions on CPU and alignment is good for performance)

In fact if I construct a nested array based on an aligned 16 bytes block, I could corrupt the memory with data exchange between the CPU <-> GPU.

(on mac my and a linux-pc)

I write a small program to put in view the “bug ?”

1 #include <iostream>

  2 #include <cuda.h>

  3 #include <cuda_runtime_api.h>

  4 

  5 #define NUM 3

  6 

  7 template<int N>

  8 struct container

  9 {

 10     int & operator[](int i){return a[i];} //just for left value init

 11 

 12     //int a[N];

 13     int a[N]  __attribute__ ((aligned (16))); //bug if N is not init to a multiple of 8

 14 };

 15 

 16 int main(int argc, char* argv[])

 17 {

 18    container<NUM> a[10],arecv[10]; // set up array

 19 

 20    for(int i=0; i<10;i++){ //init ...

 21        for(int j=0; j<NUM;j++)

 22            a[i][j] = i;

 23    }

 24 

 25    int* pa; //pointer for cuda

 26 

 27    cudaMalloc((void**)&(pa),10* NUM*sizeof(int)); // alloc

 28    cudaMemset((void*)pa,0, 10*NUM*sizeof(int)); // set to 0

 29    cudaMemcpy((void*)pa,(void*)a,10*NUM*sizeof(int),cudaMemcpyHostToDevice);

 30    cudaMemcpy((void*)arecv,(void*)pa,10*NUM*sizeof(int),cudaMemcpyDeviceToHost); 

 31 

 32    for(int i=0; i<NUM*10;i++)

 33        printf(" %i  \n ", *(&arecv[0][0]+i)); // print, should be 000,111,222,333 ....

 34 

 35     return 0;

 36 }

I get with the GCC allocator, for N=3

instead of

If I remove the aligned instruction, everything is fine.

my config :

Cuda compilation tools, release 4.0, V0.2.1221

GPU Driver Version: 7.4.10 270.05.05f01

MacOS 10.7

A few personal comments and questions :

[list=1]

[*]Is it a bug ?

[*]The align command should only affect the cpu men not inside the GPU ?

[*]If I align the pointer pa, it does not change anything to the bug, and anyway does it have any importance ?

[*]Tha align command just assure the first address of the array is aligned on 16 bytes, what is the incompatibility with CUDA ?

[*]CUDA API do make tricky stuff incompatible with alignment ?

Thank you, for your help !
main.cpp (937 Bytes)

tera · September 10, 2011, 1:11am

Why do you think you should get 0,0,0, 1,1,1, 2,2,2,…? You are printing contiguous integers from memory, and have asked for alignment. The gaps caused by aligning of the structure show up as random values.

dr_Tim · September 10, 2011, 7:18am

Oh shame on me ! I forgot this point ! So no bug, and thank you for your constructive remarks !

Topic		Replies	Views
Structure Alignment? CUDA Structure Alignment differs? CUDA Programming and Performance	12	49301	December 11, 2008
Alignment doesn't help for constant memory? CUDA Programming and Performance	3	2426	October 12, 2007
Is cuda::std::aligned_storage aligned at give alignment? CUDA Programming and Performance	4	424	July 28, 2023
Alignment Trouble (mismatch) CUDA Programming and Performance	2	7071	December 28, 2008
cuda passing user defined structure to a kernel failed CUDA Programming and Performance	3	1195	January 26, 2015
global memory alignment issue [pycuda] CUDA Programming and Performance	5	2658	January 21, 2012
Passing a struct to CUDA kernel as parameter - 'align' specifier needed ? CUDA Programming and Performance	5	3192	October 12, 2016
Worse performance of the exact same kernel depending on pointer location in allocated device memory CUDA Programming and Performance	9	368	February 24, 2021
cuda unify and memory alignement for CPU CUDA Programming and Performance	2	1129	November 21, 2016
Memory checker bug? CUDA Programming and Performance	1	617	September 4, 2017

bug (?) with GCC __attribute__ ((aligned (16))); memory alignment corrupts data

Related topics

bug (?) with GCC attribute ((aligned (16))); memory alignment corrupts data