CUDA: I don't know why I can't get the results out of my kernel

Hello everyone,

I am a student at the University of Granada (Spain), and we are working on protein alignment.
The problem is that I have written code that works well when simulated on the CPU, but when executed on the GPU it returns the same results three times. That is, it writes the result (a correct one) for the last protein you enter three times.

The code I have written looks, in outline (pseudocode), like this:

hmmsearch

main_cuda(){
struct plan7 hmm;
while (read dsq ){

	nseq++
}
int *All_mx = malloc (nseq * sizeof(int));
All_mx = ..../* some data */

int *All_L = malloc (nseq * sizeof(int));
All_L = ..../* some different data: a vector that contains lengths */

P7Viterbi_cuda(All_mx, All_L,hmm,nseq );

}

//////////

viterbi_cuda

P7Viterbi_cuda(int *All_mx,int *All_L,struct plan7 hmm,int nseq){
int *xmx_d = cudaMalloc ( 5 * M * sizeof (int) );
int *mmx_d = cudaMalloc ( L * M * sizeof (int) );
int *dmx_d = cudaMalloc ( L * M * sizeof (int) );
int *imx_d = cudaMalloc ( L * M * sizeof (int) );

int *All_mx_d = cudaMalloc (nseq *sizeof (int));
cudaMemCpy(All_mx_d,All_mx, HostToDevice);

int *All_L_d = cudaMalloc (nseq *sizeof (int));
cudaMemCpy(All_L_d,All_L, HostToDevice);

Viterbi_kernel<<<dimgrid, dimbloq>>> (nseq,xmx_d,mmx_d,imx_d,dmx_d,All_mx_d,All_L);

cudaMemCpy(All_mx,All_mx_d, DeviceToHost);
print ( All_mx )

}

//////////////

Viterbi_kernel(int nseq,int *xmx_d,int *mmx_d,int *imx_d,int *dmx_d,int *All_mx_d,int *All_L){

int j = threadIdx.x; 
xmx = ... /* some data */
mmx = ... /* some data */
dmx = ... /* some data */
imx = ... /* some data */

int L=All_L_d[j];


int *ptr = All_mx + L;

for (t = 0 to nseq )
	ptr [t] = xmx[t];
	ptr [t+1] = mmx[t];
	ptr [t+2] = dmx[t];
	ptr [t+3] = imx[t]; 

}

=================================

Could I have forgotten something like __syncthreads()?
For example, I tried adding __syncthreads(); below these instructions, but it doesn't work.

int L=All_L_d[j];
   __syncthreads();

      ....

    for (t = 0 to nseq ){
	ptr [t] = xmx[t];
	ptr [t+1] = mmx[t];
	ptr [t+2] = dmx[t];
	ptr [t+3] = imx[t]; 
            __syncthreads();
   }

========================
Any ideas? Am I writing the data through ptr correctly, or could the problem be in the other calculations?
Thank you.

Pedro

Your cudaMemcpy calls are incorrect:

int *All_mx_d = cudaMalloc (nseq *sizeof (int));
cudaMemCpy(All_mx_d,All_mx, HostToDevice);

int *All_L_d = cudaMalloc (nseq *sizeof (int));
cudaMemCpy(All_L_d,All_L, HostToDevice);

Viterbi_kernel<<<dimgrid, dimbloq>>> (nseq,xmx_d,mmx_d,imx_d,dmx_d,All_mx_d,All_L);

cudaMemCpy(All_mx,All_mx_d, DeviceToHost);

You also need to pass the number of bytes you want to transfer.
For example, the first one should be:
cudaMemCpy(All_mx_d,All_mx,nseq *sizeof (int), HostToDevice);
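
As a minimal sketch of what the full calls look like in the real runtime API (where the function is spelled cudaMemcpy and the direction constants are cudaMemcpyHostToDevice and cudaMemcpyDeviceToHost), the following copies an array to the device and back with an explicit byte count. The helper name roundTripInts and its parameters are made up for illustration and are not part of the code above.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: copy n ints to the device and straight back,
// passing the byte count and the direction explicitly on every call.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t roundTripInts(const int *src, int *dst, int n)
{
    int *buf_d = nullptr;
    size_t bytes = n * sizeof(int);

    cudaError_t err = cudaMalloc((void **)&buf_d, bytes);
    if (err != cudaSuccess) return err;

    // Host -> device: destination first, then source, then size in bytes.
    err = cudaMemcpy(buf_d, src, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Device -> host: same size, opposite direction.
        err = cudaMemcpy(dst, buf_d, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(buf_d);
    return err;
}
```

If dst does not match src afterwards, or the return value is not cudaSuccess, the transfer itself is the first thing to look at.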

Thanks, but no, that is not the problem. This is pseudocode, just a sketch; I left out some of the parameters for simplicity.

I think the problem is that I am trying to declare dimbloq with a variable instead of a constant. I mean:

dim3 dimbloq (nseq, 1, 1);

where nseq is: int nseq = 32;

That is, I think I cannot declare dimbloq with variables; I need a constant like

#define DIM_BLOQ 16


dim3 dimbloq (DIM_BLOQ, 1, 1);

Could this be the real problem?

Thanks a lot!!!

There is no problem using a variable as the block dimension.
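
For illustration, here is a small sketch that builds dimbloq from a run-time value and launches with it. The kernel writeIndex and the wrapper launchWithRuntimeBlock are hypothetical names, not from your code; the sketch assumes nseq does not exceed the device's maximum number of threads per block.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel: each thread writes its own index into its own slot.
__global__ void writeIndex(int *out, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < n) out[j] = j;
}

// Hypothetical host wrapper: the block size is an ordinary run-time int.
cudaError_t launchWithRuntimeBlock(int *out_h, int nseq)
{
    int *out_d = nullptr;
    size_t bytes = nseq * sizeof(int);

    cudaError_t err = cudaMalloc((void **)&out_d, bytes);
    if (err != cudaSuccess) return err;

    dim3 dimbloq(nseq, 1, 1);   // value known only at run time
    dim3 dimgrid(1, 1, 1);
    writeIndex<<<dimgrid, dimbloq>>>(out_d, nseq);

    err = cudaGetLastError();   // reports an invalid launch configuration
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(out_h, out_d, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(out_d);
    return err;
}
```

Calling launchWithRuntimeBlock(out, 32) with a 32-element host array should fill it with 0 through 31, with no #define involved.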

I don't understand your code or the return values, but here are three things that may help:

  1. You need __syncthreads() if you want data written by one thread to be readable by other threads in the same block.

  2. You need to read back the output between kernel calls, or it will be overwritten by the next kernel.

  3. Make sure that your kernel returned without error and that it doesn't overflow its memory. In my experience an overflow tends to kill the kernel without any error or warning (I don't remember whether I ever waited for the kernel to finish before checking, though), in which case the data in memory won't be changed because the kernel run was aborted.
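
The third point can be sketched roughly as follows: check cudaGetLastError() right after the launch, then check the return value of cudaDeviceSynchronize() before reading anything back. The kernel fillSlots and the wrapper runAndCheck are hypothetical. The layout, in which thread j owns four consecutive ints starting at 4 * j so that no two threads write the same element, is an assumption made for this example and is not taken from the code in the question.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel (not the Viterbi kernel from the question):
// thread j owns out[4*j] .. out[4*j + 3], so writes never overlap.
__global__ void fillSlots(int *out, int nseq)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= nseq) return;      // guard against out-of-bounds writes
    int *ptr = out + 4 * j;
    for (int k = 0; k < 4; ++k)
        ptr[k] = 10 * j + k;
}

// Hypothetical wrapper: launch, check both error paths, then read back.
cudaError_t runAndCheck(int *out_h, int nseq)
{
    int *out_d = nullptr;
    size_t bytes = 4 * nseq * sizeof(int);

    cudaError_t err = cudaMalloc((void **)&out_d, bytes);
    if (err != cudaSuccess) return err;

    fillSlots<<<1, nseq>>>(out_d, nseq);
    err = cudaGetLastError();           // errors detected at launch time
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();  // errors raised while the kernel ran
    if (err == cudaSuccess)
        err = cudaMemcpy(out_h, out_d, bytes, cudaMemcpyDeviceToHost);

    cudaFree(out_d);
    return err;
}
```

If the wrapper returns anything other than cudaSuccess, the host buffer should not be trusted: the kernel may have aborted before writing its results.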