CUDA_EXCEPTION_1, Lane Illegal Address but size was allocated with malloc

Hi,

I have a problem with my kernel. I get an illegal address exception, and the debugger points to a line, but I can't find the mistake. Where could the array be getting erased?

On my Q600 with toolkit 4 it works fine.

On a cluster with a Tesla C2050 and toolkit 3.2 it fails.

Both builds are compiled for sm_20.

Thanks,

mic

__global__ void ExactKernel()
{
	unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x; //get total id of the current thread
	if(tid>c_numsr-1) return; //if more threads started than needed

	unsigned int srlength = c_srlength;
	unsigned char *l_SR = (unsigned char *)malloc(sizeof(char)*srlength); //= (unsigned char *)malloc(sizeof(unsigned char)*c_srlength);

	//copy Sr to local variable
	for(int i=0; i<srlength; i++){
		l_SR[i] = k_sr[tid*srlength+i];
		if(l_SR[i]>3) return; //no match possible
	}

	/*
	 * run search on forward strand
	 */
	int fwd = SearchPart(tid, l_SR, 0);

	/*
	 * if no hit found, search on reverse strand
	 */
	if(fwd==0){
		unsigned char *l_SR_bwd = (unsigned char *)malloc(sizeof(char)*srlength);
		for(int i=0; i<srlength; i++){
			switch(l_SR[i]){
			case 0: l_SR_bwd[srlength-i-1] = 3; break; //A->T
			case 1: l_SR_bwd[srlength-i-1] = 2; break; //C->G
			case 2: l_SR_bwd[srlength-i-1] = 1; break; //G->C
			case 3: l_SR_bwd[srlength-i-1] = 0; break; //T->A
			}
		}
		SearchPart(tid, l_SR_bwd, 1);
		free(l_SR_bwd);
	}
	free(l_SR);
}

Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.

[Switching to CUDA Kernel 1 (<<<(32,0),(65,0,0)>>>)]

0x000000001cc353f8 in ExactKernel () at kernel_nomm.cu:789

789				switch(l_SR[i]){

(cuda-gdb)

Before you run your program in the debugger, turn on CUDA memcheck:

set cuda memcheck on

This might give you a better idea of where the error is.
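Something else that may be worth ruling out, since the buffers come from in-kernel malloc: device-side malloc returns NULL when the device heap is exhausted, and writing through that pointer raises an illegal address. The sketch below is not the original kernel. `ScratchKernel` and `CountFailedAllocations` are hypothetical names, and the heap size passed in is an assumed value. It uses `cudaDeviceSetLimit`, which is the toolkit 4 call; on 3.2 the equivalent was, as far as I know, `cudaThreadSetLimit`.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for ExactKernel: every thread takes a scratch buffer
// from the device heap and checks the pointer before touching it.
__global__ void ScratchKernel(unsigned int length, int *failed)
{
	unsigned char *buf = (unsigned char *)malloc(length);
	if (buf == NULL) {            // device heap exhausted
		atomicAdd(failed, 1);     // count the failure instead of dereferencing NULL
		return;
	}
	for (unsigned int i = 0; i < length; i++)
		buf[i] = 0;
	free(buf);                    // free on every path that allocated
}

// Returns the number of threads whose malloc failed, or -1 on a CUDA error.
// heapBytes is an assumed size; the limit has to be set before the first
// kernel that uses malloc is launched.
int CountFailedAllocations(int blocks, int threads, unsigned int length, size_t heapBytes)
{
	if (cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes) != cudaSuccess)
		return -1;

	int *d_failed = NULL;
	if (cudaMalloc(&d_failed, sizeof(int)) != cudaSuccess)
		return -1;
	cudaMemset(d_failed, 0, sizeof(int));

	ScratchKernel<<<blocks, threads>>>(length, d_failed);

	int failed = -1;
	if (cudaDeviceSynchronize() == cudaSuccess)
		cudaMemcpy(&failed, d_failed, sizeof(int), cudaMemcpyDeviceToHost);

	cudaFree(d_failed);
	return failed;
}
```

Calling something like `CountFailedAllocations(64, 128, 100, 64 * 1024 * 1024)` from `main` and getting a nonzero count would point at the heap rather than at the indexing. In the posted kernel the same check would go right after each malloc. Note also that the early `return` inside the copy loop leaves `l_SR` allocated, which eats into the heap over time.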

Also, as a general comment, you could change this switch

for(int i=0; i<srlength; i++){
        switch(l_SR[i]){
        case 0: l_SR_bwd[srlength-i-1] = 3; break; //A->T
        case 1: l_SR_bwd[srlength-i-1] = 2; break; //C->G
        case 2: l_SR_bwd[srlength-i-1] = 1; break; //G->C
        case 3: l_SR_bwd[srlength-i-1] = 0; break; //T->A
        }
}

to this:

for(int i=0; i<srlength; i++){
        l_SR_bwd[srlength-i-1] = 3 - l_SR[i];
}

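This would avoid the branch divergence from the switch. It works because the bases are coded 0 to 3 (A, C, G, T), so the complement of a code x is 3 - x.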
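To make the equivalence concrete, here is a small sketch of both versions side by side. The function names `ComplementSwitch`, `Complement` and `ReverseComplement` are made up for illustration and are not from the original code. It assumes every input code is already in the range 0 to 3, which the kernel guarantees through its earlier `>3` check.

```cuda
// Switch-based complement, as in the original kernel (A=0, C=1, G=2, T=3).
__host__ __device__ inline unsigned char ComplementSwitch(unsigned char base)
{
	switch (base) {
	case 0: return 3; // A->T
	case 1: return 2; // C->G
	case 2: return 1; // G->C
	default: return 0; // T->A (input assumed to be in 0..3)
	}
}

// Branch-free complement: for codes 0..3 the complement is 3 - code.
__host__ __device__ inline unsigned char Complement(unsigned char base)
{
	return (unsigned char)(3 - base);
}

// Writes the reverse complement of src[0..n-1] into dst[0..n-1].
// src and dst must not overlap.
__host__ __device__ inline void ReverseComplement(const unsigned char *src,
                                                  unsigned char *dst,
                                                  unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		dst[n - i - 1] = Complement(src[i]);
}
```

Because the functions are marked `__host__ __device__`, they can be checked on the host for all four codes before being used inside the kernel.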

The error in my post is from gdb with “set cuda memcheck on”; that's all the information I get.

Thanks, this is a good idea. I will do as you suggest.

Does it work in 4.0 on the C2050? This could be a bug that was fixed.

The machine with the C2050 is a cluster which is not under my administration, but I will try to get 4.0 installed. I’ll report back if this fixes the problem.

best,

mic

Installing toolkit 4 solved the problem; there must be a bug in the 3.2 version.

thx,

mic