Strange memory errors when calculating in kernel-function

Hi all,

I have two device functions:

__device__ float xtheta(int x, int y, float theta){
	float xt = x*cos(theta)+y*sin(theta);
	return xt;
}

__device__ float ytheta(int x, int y, float theta){
	float yt = y*cos(theta)-x*sin(theta);
	return yt;
}

And a kernel like this:

__global__ void complex_filter_responses(float *real_filter, float lambda, float sigma_x, float sigma_y, int number_of_orientations)
{
	int tx = threadIdx.x+blockIdx.x*blockDim.x;
	int ty = threadIdx.y+blockIdx.y*blockDim.y;
	int offset = tx+ty*blockDim.x*gridDim.x;
	int filter_size=blockDim.x*gridDim.x*blockDim.y*gridDim.y;

	for(int i=0;i<number_of_orientations*2;i=i+2){
		float x_theta = xtheta(tx-(blockDim.x*gridDim.x)/2,ty-(blockDim.x*gridDim.x)/2,M_PI/3);
		float y_theta = ytheta(tx-(blockDim.x*gridDim.x)/2,ty-(blockDim.x*gridDim.x)/2,M_PI/3);
		real_filter[offset+i*filter_size]=x_theta*y_theta;
		real_filter[offset+(i+1)*filter_size]= x_theta;
	}
}

This code was working just fine.

However, I now need to add more calculations to the kernel, so I replaced

real_filter[offset+i*filter_size]=x_theta*y_theta;

with

real_filter[offset+i*filter_size]=exp(-0.5*(pow(x_theta,2)/pow(sigma_x,2)+pow(y_theta,2)/pow(sigma_y,2)))*cos(2*M_PI/lambda*x_theta+0);

Replacing this single line causes the following error:

cudaError_enum at memory location

My first guess was a division by zero somewhere (which is unlikely, because i doesn’t affect the equation), so I tried running the code without the for loop:

real_filter[offset]=exp(-0.5*(pow(x_theta,2)/pow(sigma_x,2)+pow(y_theta,2)/pow(sigma_y,2)))*cos(2*M_PI/lambda*x_theta+0);

which worked well.

Next I checked for an index-out-of-bounds error, but that wasn’t the case: I could write to every element from real_filter[0] to real_filter[offset+(i+1)*filter_size] without any errors.
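
For reference, the number of floats the loop touches can be worked out on the host and compared with what the wrapper allocates. This is a minimal sketch, assuming a hypothetical helper named required_filter_elements (it is not part of the original code) that takes the grid and block dimensions as plain integers:

```cpp
#include <cstddef>

// Hypothetical helper, not from the original post.
// The kernel loop runs i = 0, 2, ..., 2*(n-1) and writes slices i and i+1,
// so the highest slice index is 2*n - 1 and 2*n slices are needed in total.
// Each slice holds one float per thread in the launch.
std::size_t required_filter_elements(unsigned grid_x, unsigned grid_y,
                                     unsigned block_x, unsigned block_y,
                                     unsigned number_of_orientations)
{
    // Same value as filter_size inside the kernel, computed in size_t
    // so large launches do not overflow a 32-bit int.
    std::size_t filter_size = static_cast<std::size_t>(grid_x) * block_x
                            * grid_y * block_y;
    return 2u * static_cast<std::size_t>(number_of_orientations) * filter_size;
}
```

The byte count to pass to cudaMalloc would then be required_filter_elements(...) * sizeof(float). If the wrapper allocates less than that, some threads write past the end of the buffer even when spot checks on individual indices appear to succeed.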

Also

for(int i=0;i<number_of_orientations*2;i=i+2){
	float x_theta = xtheta(tx-(blockDim.x*gridDim.x)/2,ty-(blockDim.x*gridDim.x)/2,M_PI/3);
	float y_theta = ytheta(tx-(blockDim.x*gridDim.x)/2,ty-(blockDim.x*gridDim.x)/2,M_PI/3);
	real_filter[offset+i*filter_size]=0.0;
}

worked without any errors.

Summing up: the error only occurs if I calculate the value like this:

real_filter[offset]=exp(-0.5*(pow(x_theta,2)/pow(sigma_x,2)+pow(y_theta,2)/pow(sigma_y,2)))*cos(2*M_PI/lambda*x_theta+0);

I’m still very new to CUDA and C, so please be patient. I bet I just made a stupid mistake :P

Thanks in advance!

…Never mind…

It was a really stupid mistake during memory allocation in my wrapper…

sorry

False alarm: the problem is not solved at all.

Can someone please explain why this code works:

float e=exp(-0.5*(pow(x_theta,2)/pow(*sigma_x,2)+pow(y_theta,2)/pow(*sigma_y,2)))*cos(2*M_PI/(*lambda)*x_theta+0);
float b=exp(-0.5*(pow(x_theta,2)/pow(*sigma_x,2)+pow(y_theta,2)/pow(*sigma_y,2)))*cos(2*M_PI/(*lambda)*x_theta+0);
real_filter[offset]=e;
real_filter[offset]=e;

but this does not?

float e=exp(-0.5*(pow(x_theta,2)/pow(*sigma_x,2)+pow(y_theta,2)/pow(*sigma_y,2)))*cos(2*M_PI/(*lambda)*x_theta+0);
float b=exp(-0.5*(pow(x_theta,2)/pow(*sigma_x,2)+pow(y_theta,2)/pow(*sigma_y,2)))*cos(2*M_PI/(*lambda)*x_theta+0);
real_filter[offset]=b;
real_filter[offset]=b;

I know, the code doesn’t make sense but I need to understand the problem…

PS: Too many registers?
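
On the register question: the literals -0.5 and M_PI in the expression are doubles, so at least part of it is evaluated in double precision, and double values take more register space than floats. The thread does not establish that this was the cause. As an experiment, the same formula can be kept entirely in single precision. A minimal sketch, assuming a hypothetical helper named gabor_real (not in the original code) and a macro guard so the same source also builds with a plain C++ compiler:

```cpp
#include <math.h>

#ifndef __CUDACC__      // let the same source build without nvcc
#define __host__
#define __device__
#endif

// Hypothetical helper: the expression from the post in float-only math.
// Squares are written as products instead of pow(), and the constants
// carry an f suffix so nothing is promoted to double.
__host__ __device__ inline float gabor_real(float x_theta, float y_theta,
                                            float lambda,
                                            float sigma_x, float sigma_y)
{
    const float two_pi = 6.2831853f;
    float gx = (x_theta * x_theta) / (sigma_x * sigma_x);
    float gy = (y_theta * y_theta) / (sigma_y * sigma_y);
    float envelope = expf(-0.5f * (gx + gy));          // Gaussian term
    float carrier  = cosf(two_pi / lambda * x_theta);  // cosine term
    return envelope * carrier;
}
```

Inside the kernel the call would look like real_filter[offset] = gabor_real(x_theta, y_theta, *lambda, *sigma_x, *sigma_y); using the pointer parameters from the snippets above. Results will differ slightly from the double-precision version because of rounding.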

Ah, I’ve had a similar problem for a few hours now. I had a macro involved in one of my kernels and kept getting the cudaError_enum error. I read your post about the possibility of using too many registers, so I rewrote the macro as a function with multiple temporary variables. Though uglier, the code now works! Thanks :D
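
A rough way to sanity-check the register theory is to compare the kernel's per-thread register count (nvcc prints it when given --ptxas-options=-v) with the per-block register limit of the device (the regsPerBlock field of cudaDeviceProp). The sketch below assumes a hypothetical helper named max_threads_for_registers. It ignores the hardware's register allocation granularity, so treat the result as an optimistic upper estimate rather than an exact limit:

```cpp
// Hypothetical helper, not from the thread.
// Estimates the largest block size (rounded down to a whole number of warps)
// whose threads fit in the per-block register budget.
// Returns 0 when regs_per_thread is 0, since no estimate is possible then.
unsigned max_threads_for_registers(unsigned regs_per_thread,
                                   unsigned regs_per_block,
                                   unsigned warp_size = 32)
{
    if (regs_per_thread == 0 || warp_size == 0) {
        return 0;
    }
    unsigned threads = regs_per_block / regs_per_thread;
    return (threads / warp_size) * warp_size;  // round down to full warps
}
```

If the block size used at launch is larger than this estimate, the launch is likely to fail for lack of resources, and either a smaller block or lower register usage (for example via fewer live temporaries or float-only math) would be worth trying.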