Hello,
I could have find a bug using CUDA with an aligned memory (because I use the SSE instructions on CPU and alignment is good for performance)
In fact if I construct a nested array based on an aligned 16 bytes block, I could corrupt the memory with data exchange between the CPU <-> GPU.
(on mac my and a linux-pc)
I write a small program to put in view the “bug ?”
1 #include <iostream>
2 #include <cuda.h>
3 #include <cuda_runtime_api.h>
4
5 #define NUM 3
6
7 template<int N>
8 struct container
9 {
10 int & operator[](int i){return a[i];} //just for left value init
11
12 //int a[N];
13 int a[N] __attribute__ ((aligned (16))); //bug if N is not init to a multiple of 8
14 };
15
16 int main(int argc, char* argv[])
17 {
18 container<NUM> a[10],arecv[10]; // set up array
19
20 for(int i=0; i<10;i++){ //init ...
21 for(int j=0; j<NUM;j++)
22 a[i][j] = i;
23 }
24
25 int* pa; //pointer for cuda
26
27 cudaMalloc((void**)&(pa),10* NUM*sizeof(int)); // alloc
28 cudaMemset((void*)pa,0, 10*NUM*sizeof(int)); // set to 0
29 cudaMemcpy((void*)pa,(void*)a,10*NUM*sizeof(int),cudaMemcpyHostToDevice);
30 cudaMemcpy((void*)arecv,(void*)pa,10*NUM*sizeof(int),cudaMemcpyDeviceToHost);
31
32 for(int i=0; i<NUM*10;i++)
33 printf(" %i \n ", *(&arecv[0][0]+i)); // print, should be 000,111,222,333 ....
34
35 return 0;
36 }
I get with the GCC allocator, for N=3
0
0
0
32767
1
1
1
32767
2
2
2
32767
3
3
3
32767
4
4
4
1
5
5
5
32767
6
6
6
32767
7
7
instead of
0
0
0
1
1
1
2
2
2
3
3
3
4
4
4
5
5
5
6
6
6
7
7
7
8
8
8
9
9
9
If I remove the aligned instruction, everything is fine.
my config :
Cuda compilation tools, release 4.0, V0.2.1221
GPU Driver Version: 7.4.10 270.05.05f01
MacOS 10.7
A few personal comments and questions :
[list=1]
[*]Is it a bug ?
[*]The align command should only affect the cpu men not inside the GPU ?
[*]If I align the pointer pa, it does not change anything to the bug, and anyway does it have any importance ?
[*]Tha align command just assure the first address of the array is aligned on 16 bytes, what is the incompatibility with CUDA ?
[*]CUDA API do make tricky stuff incompatible with alignment ?
Thank you, for your help !
main.cpp (937 Bytes)