I’ve come across an issue while porting Scott Draves’ Fractal Flames to the GPU. I intended to pass an array of 4 real numbers (actually floats) to a kernel on the GPU.
typedef float real;
On the host, bounds is an array of 4 real values. I was trying to pass that array by value into a kernel:
__global__ void iterate_kernel(int n, int fuse, point *points, int width, int height, real bounds[4]);
The kernel would crash (the actual error is "unspecified launch failure") as soon as it used elements of bounds in calculations.
The same code ran fine in emulation, however.
I worked around it by passing bounds[0], bounds[1], etc. explicitly:
__global__ void iterate_kernel(int n, int fuse, point *points, int width, int height, real bounds0, real bounds1, real bounds2, real bounds3);
So I conclude that there must be a problem with alignment of such arrays when being passed by value.
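One other explanation may be worth ruling out, assuming the kernel parameter was declared in array form such as real bounds[4]. In C and C++ an array parameter is adjusted to a pointer, so such a kernel would receive a host address rather than a copy of the four values. Dereferencing a host address on the device would fail, while in emulation the kernel runs on the host, where that address is valid. A minimal sketch of passing the values by value via a wrapper struct follows; bounds_t and scale_kernel are hypothetical names for illustration, not part of the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef float real;

// Hypothetical wrapper (not in the original code). A struct that contains
// an array is copied by value into the kernel's parameter space, whereas a
// bare `real bounds[4]` parameter is adjusted to `real *bounds`.
struct bounds_t {
    real v[4];
};

// Hypothetical stand-in for iterate_kernel: it only reads the bounds.
__global__ void scale_kernel(int n, real *out, bounds_t bounds)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * bounds.v[i % 4];
}

int main()
{
    const int n = 8;
    bounds_t bounds = {{-1.0f, 1.0f, -2.0f, 2.0f}};
    real h_out[n];
    real *d_out = NULL;

    if (cudaMalloc((void **)&d_out, n * sizeof(real)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // The whole struct, including its four floats, is copied at launch.
    scale_kernel<<<1, n>>>(n, d_out, bounds);

    // cudaMemcpy waits for the kernel and reports any launch failure.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(real),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel or copy failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    for (int i = 0; i < n; ++i)
        printf("out[%d] = %f\n", i, h_out[i]);

    cudaFree(d_out);
    return 0;
}
```

If this version runs on the device while the array-parameter version does not, pointer decay rather than alignment would be the more likely cause. Passing four separate scalars, as in the workaround above, avoids the issue for the same reason.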
This is with CUDA toolkit 2.3 on 32-bit openSUSE Linux.
Has anyone run into this problem before?