I need to use constant memory to pass a bunch of parameters as a struct to a kernel.
To do so, in my .cl file, I define the struct:
typedef struct KernelParams {
float4 ps,c0;
float4 maxidx;
uint4 dimlen,cp0,cp1;
uint2 cachebox;
float minstep;
float twin0,twin1,tmax;
float oneoverc0;
unsigned int isrowmajor,save2pt,doreflect,dorefint,savedet;
float Rtstep;
float minenergy;
float skipradius2;
float minaccumtime;
unsigned int maxdetphoton;
unsigned int maxmedia;
unsigned int detnum;
unsigned int idx1dorig;
unsigned int mediaidorig;
} KParam __attribute__ ((aligned (16)));
...
__kernel void main_loop( ..., __constant KParam gcfg[]){
...
if(gcfg->doreflect && ...){
...
}
}
In my host code, I define exactly the same struct in a header file,
and then:
KParam param={...};
cl_mem gparam;
mcx_assess((gparam=clCreateBuffer(context,RO_MEM,sizeof(KParam),&param,&status),status));
...
mcx_assess(clSetKernelArg(kernel,15, sizeof(cl_mem), (void*)&gparam));
mcx_assess(clEnqueueNDRangeKernel(commands,kernel,1,NULL,mcgrid,mcblock, 0, NULL, NULL));
...
When I run this code on an ATI card, it works and the values are passed correctly
into the kernel. However, when I run it on an NVIDIA card with CUDA 3.2,
the program build phase fails with a segfault. If I replace "if(gcfg->doreflect && ...)"
with "if(0)", the program compiles but the results are incorrect.
Did I do anything obviously wrong in this case? I am particularly interested
in whether my alignment settings and the order of the struct members are OK.
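In case it helps with the alignment question, here is a minimal host-side sketch of how the layout could be checked. It is not my actual host code: host_float4, host_uint4, host_uint2, KParamHost and kparam_layout_ok are hypothetical names, the vector types are stand-ins for cl_float4, cl_uint4 and cl_uint2 so that it compiles without the OpenCL headers, and the __attribute__ syntax assumes GCC or Clang. It only checks that the host struct follows the OpenCL rule that 4-component float/uint vectors are 16-byte aligned and uint2 is 8-byte aligned; it says nothing about what the device compiler actually does.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for cl_float4 / cl_uint4 / cl_uint2 (assumption: hypothetical
   types used only so this sketch builds without <CL/cl.h>). */
typedef struct { float s[4]; }        __attribute__((aligned(16))) host_float4;
typedef struct { unsigned int s[4]; } __attribute__((aligned(16))) host_uint4;
typedef struct { unsigned int s[2]; } __attribute__((aligned(8)))  host_uint2;

/* Host mirror of the kernel-side struct, same member order. */
typedef struct KernelParamsHost {
    host_float4 ps, c0;
    host_float4 maxidx;
    host_uint4 dimlen, cp0, cp1;
    host_uint2 cachebox;
    float minstep;
    float twin0, twin1, tmax;
    float oneoverc0;
    unsigned int isrowmajor, save2pt, doreflect, dorefint, savedet;
    float Rtstep;
    float minenergy;
    float skipradius2;
    float minaccumtime;
    unsigned int maxdetphoton;
    unsigned int maxmedia;
    unsigned int detnum;
    unsigned int idx1dorig;
    unsigned int mediaidorig;
} __attribute__((aligned(16))) KParamHost;

/* Returns 1 if the vector members sit on their required boundaries and the
   total size is a multiple of 16 bytes, 0 otherwise. */
static int kparam_layout_ok(void) {
    return sizeof(KParamHost) % 16 == 0
        && offsetof(KParamHost, ps) % 16 == 0
        && offsetof(KParamHost, c0) % 16 == 0
        && offsetof(KParamHost, maxidx) % 16 == 0
        && offsetof(KParamHost, dimlen) % 16 == 0
        && offsetof(KParamHost, cp0) % 16 == 0
        && offsetof(KParamHost, cp1) % 16 == 0
        && offsetof(KParamHost, cachebox) % 8 == 0;
}

/* Prints a few offsets so they can be compared with values written out by
   a small debug kernel on the device side. */
static void print_kparam_layout(void) {
    printf("sizeof(KParamHost) = %zu\n", sizeof(KParamHost));
    printf("offsetof(cachebox)  = %zu\n", offsetof(KParamHost, cachebox));
    printf("offsetof(doreflect) = %zu\n", offsetof(KParamHost, doreflect));
    printf("layout ok           = %d\n", kparam_layout_ok());
}
```

Calling print_kparam_layout() from main() on the host and comparing the numbers with what a tiny test kernel reports for the same fields would at least show whether the two sides agree on the layout.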
Any comments are welcome! Thanks in advance.