Dynamic Memory Allocation in a Kernel

Hi, I want to use a dynamic array (the variable c) in my kernel, but I always get the error “error code unspecified launch failure” when I use more than one thread. If I comment out the line c[letak] = 5; I don’t get this error. How can I solve it?

__global__ void FillMatrix(char **sequence, int *s_length, int n, int *score)
{
  int b = threadIdx.x + blockIdx.x*blockDim.x;
  int a = threadIdx.y + blockIdx.y*blockDim.y;
  const int rows = s_length[a], cols = s_length[b];
  char *X = sequence[a];
  char *Y = sequence[b];

  const int jum = (cols + 1)*(rows + 1);

  size_t sizeArr = jum;
  int *c;
  c = (int*)malloc(sizeArr);
  memset(c, 0, sizeArr);
  printf("Thread %d got C pointer: %p\n", threadIdx.x, c);
  score[(a*(n)) + b] = 0;
  if(b < n){
    printf("index ke %d - %d = %d --- %d - %d \n", a, b, jum, cols, rows);
    int letak, kiri, atas, miring, n_letak, n_atas, n_kiri, x = 0, y = 0, i;
    for(i = 0; i < jum; i++)
    {
      if(i > (cols + 1) && (i % (cols+1) != 0))
      {
        y = (i/(cols + 1) - 1);
        x = (i-1) % (cols + 1);

        letak = i;
        kiri  = letak - 1;
        atas  = (letak - cols) - 1;
        miring = atas - 1;
        int scoring = scoringsMatrix[X[y] - 'A'][Y[x] - 'A'];
        n_letak = c[miring] + scoring;
        n_kiri = c[kiri] + GAP;
        n_atas = c[atas] + GAP;
        c[letak] = 5;
      }
    }
    printf("score %d - %d = %d \n", a, b, score[(a*(n)) + b]);
    free(c);
  }
}

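One likely culprit, offered as a hedged reading rather than a confirmed diagnosis: malloc and memset take a size in bytes, so malloc(jum) reserves room for only about a quarter of the jum ints the loop indexes (with 4-byte int), and writes such as c[letak] = 5 run past the end of the block. With a single thread that overflow may land in unused heap space; with many threads it can corrupt neighbouring allocations or heap bookkeeping, which could explain why removing that one write hides the error. The kernel also reads s_length[b] and writes score before checking b < n, and never checks a at all. The sketch below sizes the buffer as jum * sizeof(int), checks both indices and the returned pointer before use, and frees on every path after a successful allocation. FillMatrixSketch, runFillSketch, GAP = -1, and the constant +1 match score (standing in for the scoringsMatrix lookup, which is not shown) are all hypothetical; the placeholder recurrence makes each score equal min(rows, cols), which is easy to check.

```cuda
#include <cuda_runtime.h>

#define GAP (-1)  // hypothetical gap penalty; the real value is not shown

// Hypothetical sketch of FillMatrix's allocation pattern. The sequence
// lookup and scoringsMatrix are replaced by a constant +1 match score,
// so each score ends up as min(rows, cols).
__global__ void FillMatrixSketch(const int *s_length, int n, int *score,
                                 int *allocFailures)
{
  int b = threadIdx.x + blockIdx.x * blockDim.x;
  int a = threadIdx.y + blockIdx.y * blockDim.y;

  // Bounds-check both indices before reading s_length, writing score,
  // or allocating, so out-of-range threads do nothing at all.
  if (a >= n || b >= n) return;

  const int rows = s_length[a], cols = s_length[b];
  const int jum = (cols + 1) * (rows + 1);

  // malloc and memset take BYTES: jum ints need jum * sizeof(int) bytes.
  size_t sizeArr = (size_t)jum * sizeof(int);
  int *c = (int *)malloc(sizeArr);
  if (c == NULL) {  // heap exhausted: count the failure, never dereference
    atomicAdd(allocFailures, 1);
    score[a * n + b] = 0;
    return;
  }
  memset(c, 0, sizeArr);

  // Same interior-cell test and neighbour offsets as the original loop.
  for (int i = 0; i < jum; i++) {
    if (i > (cols + 1) && (i % (cols + 1) != 0)) {
      int letak = i;
      int kiri = letak - 1;            // left neighbour
      int atas = (letak - cols) - 1;   // cell directly above
      int miring = atas - 1;           // upper-left diagonal
      int scoring = 1;                 // placeholder for scoringsMatrix[...]
      int n_letak = c[miring] + scoring;
      int n_kiri = c[kiri] + GAP;
      int n_atas = c[atas] + GAP;
      c[letak] = max(n_letak, max(n_kiri, n_atas));
    }
  }
  score[a * n + b] = c[jum - 1];  // bottom-right cell holds the result
  free(c);                        // every successful malloc is freed
}

// Hypothetical host wrapper: runs an n x n grid of threads and copies the
// n*n scores back. Returns the number of failed allocations, or -1 if the
// kernel reported an error (the context is then unusable, so no cleanup).
int runFillSketch(const int *h_len, int n, int *h_score)
{
  int *d_len, *d_score, *d_fail, failures = 0;
  cudaMalloc(&d_len, n * sizeof(int));
  cudaMalloc(&d_score, (size_t)n * n * sizeof(int));
  cudaMalloc(&d_fail, sizeof(int));
  cudaMemcpy(d_len, h_len, n * sizeof(int), cudaMemcpyHostToDevice);
  cudaMemset(d_fail, 0, sizeof(int));

  dim3 block(16, 16);
  dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
  FillMatrixSketch<<<grid, block>>>(d_len, n, d_score, d_fail);
  if (cudaDeviceSynchronize() != cudaSuccess) return -1;

  cudaMemcpy(h_score, d_score, (size_t)n * n * sizeof(int),
             cudaMemcpyDeviceToHost);
  cudaMemcpy(&failures, d_fail, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_len);
  cudaFree(d_score);
  cudaFree(d_fail);
  return failures;
}
```

Called as runFillSketch(len, n, score) with score holding n * n ints; with lengths {3, 5, 8}, every entry score[a * 3 + b] should come out as the smaller of len[a] and len[b].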
  1. Make sure you are using proper CUDA error checking in your host code (you probably are).
  2. Any time you are having trouble with a kernel that uses dynamic memory allocation (in-kernel malloc or new), it’s good practice to test the returned pointer for NULL (i.e. 0). This is the API’s way of letting you know an allocation error occurred. A common reason for allocation failure is exceeding the device heap size; read this section of the programming guide:

Programming Guide :: CUDA Toolkit Documentation

  3. The error you are getting may indicate that your kernel is performing an illegal or out-of-bounds memory access. You can localize such errors to a single line of kernel code using the methodology described here:

cuda - Unspecified launch failure on Memcpy - Stack Overflow

  4. If you still need help after that, post a short but complete code (one that someone else could run without having to add or change anything). I suspect there is far too much going on in the indexing and memory allocation for anyone to say what is wrong based on the kernel code alone.
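Points 1 and 2 above can be combined in host code roughly as follows. This is a minimal sketch: CUDA_CHECK is a common user-defined macro, not part of the CUDA API, and heapProbe and runHeapProbe are hypothetical names. It raises the device heap with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...), which according to the programming guide has to happen before any kernel that uses malloc is launched, checks the in-kernel malloc result for NULL, and checks both cudaGetLastError() (errors in the launch itself) and cudaDeviceSynchronize() (errors raised while the kernel ran, such as an unspecified launch failure).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Common user-defined error-checking macro (not part of the CUDA API).
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",               \
              cudaGetErrorString(err_), __FILE__, __LINE__);        \
      exit(EXIT_FAILURE);                                           \
    }                                                               \
  } while (0)

// Hypothetical kernel: each thread allocates `count` ints, checks the
// returned pointer for NULL, uses the block, and frees it.
__global__ void heapProbe(int count, int *failures)
{
  int *p = (int *)malloc(count * sizeof(int));
  if (p == NULL) {  // allocation failed, e.g. device heap too small
    atomicAdd(failures, 1);
    return;
  }
  for (int i = 0; i < count; i++) p[i] = i;
  free(p);
}

// Hypothetical host helper: sets the heap size, launches one block of
// `threads` threads, and returns how many of them saw malloc return NULL.
int runHeapProbe(size_t heapBytes, int threads, int count)
{
  // Must run before any kernel that uses malloc is launched.
  CUDA_CHECK(cudaDeviceSetLimit(cudaLimitMallocHeapSize, heapBytes));

  int *d_failures = NULL;
  CUDA_CHECK(cudaMalloc(&d_failures, sizeof(int)));
  CUDA_CHECK(cudaMemset(d_failures, 0, sizeof(int)));

  heapProbe<<<1, threads>>>(count, d_failures);
  CUDA_CHECK(cudaGetLastError());       // errors in the launch itself
  CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while it ran

  int failures = 0;
  CUDA_CHECK(cudaMemcpy(&failures, d_failures, sizeof(int),
                        cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_failures));
  return failures;
}
```

For example, runHeapProbe((size_t)64 << 20, 256, 1000) asks for about 1 MB of per-thread allocations from a 64 MB heap and should return 0 on a device with that much free memory; a nonzero result means some threads saw malloc return NULL, and any kernel fault is reported by the CUDA_CHECK around cudaDeviceSynchronize().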