CUDA and char* programming

pedamallu · July 30, 2008, 6:56pm

Hi:

I am having a problem using CUDA.

I am trying to compare two arrays of char*, each holding 100 strings.

I would like to compare these two arrays and produce an output of 0 (if the two strings differ) or 1 (if they are the same).

I am having a weird problem. When I use the following function on the device side, it does not work; it always gives 0:

__global__ void add_arrays_gpu( char **in1, char **in2, int *out, int *out1, int seq11, int seq22, int thr)
{
    int idx=blockIdx.x*blockDim.x+threadIdx.x;
    int idy=blockIdx.y*blockDim.y+threadIdx.y;
    int index=idx+idy*seq22;
    if(idx < seq11 && idy < seq22){
        if(in1[idx]==in2[idy])
            out[index]=1;
        else
            out[index]=0;
        __syncthreads();
    }
}

Please help me solve this problem.

Thanks

BenW · July 30, 2008, 7:35pm

After a quick glance through your code, the thing that strikes me is that when you compare the strings with the command

if(in1[idx] == in2[idy])

you aren’t dereferencing the (char**)s all the way to the data, so you’re comparing two pointers instead of two chars, and the pointers will never be the same if they point to different chunks of memory. How you resolve this depends on exactly what you’re trying to do, which I didn’t quite catch. If you have two sets of 100 strings (in which case char** is the right type for in1 and in2), then you need another for loop to check each character in the respective strings, i.e. replace your if with something like the following:

bool match = true;
for( int i = 0; match; ++i )
{
    if( in1[idx][i] != in2[idy][i] )
        match = false;       // characters differ
    else if( in1[idx][i] == '\0' )
        break;               // both strings ended at the same position
}
out[index] = match;

If, on the other hand, you meant you have two strings of 100 characters, then you need to replace char** with char* (with appropriate changes in the host .cu file).
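The loop above can also be written as a stand-alone helper. This is a minimal sketch in plain C, assuming both strings are '\0'-terminated; the name `strings_equal` is hypothetical and not from the thread. Because it calls no library functions, the same body could be marked `__device__` and called from the kernel as `out[index] = strings_equal(in1[idx], in2[idy]);`, provided the pointers it receives refer to device memory.

```c
#include <stdbool.h>

/* Hypothetical helper: character-by-character comparison of two
   '\0'-terminated strings. It uses no library calls, so the same
   body could be compiled as a __device__ function in CUDA. */
bool strings_equal(const char *a, const char *b)
{
    int i = 0;
    /* advance while a has characters left and both strings agree */
    while (a[i] != '\0' && a[i] == b[i])
        ++i;
    /* equal only if both strings end at the same position */
    return a[i] == b[i];
}
```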

Does that help?

Ben

pedamallu · July 30, 2008, 7:50pm

Thanks, Ben, for your kind reply.

However, the code you gave didn’t help. Also, I am getting the following warnings:

“/tmp/tmpxft_00007b81_00000000-5.i”, line 6: Advisory: Cannot tell what pointer points to, assuming global memory space
“/tmp/tmpxft_00007b81_00000000-5.i”, line 6: Advisory: Cannot tell what pointer points to, assuming global memory space
“/tmp/tmpxft_00007b81_00000000-5.i”, line 14: Advisory: Cannot tell what pointer points to, assuming global memory space
“/tmp/tmpxft_00007b81_00000000-5.i”, line 16: Advisory: Cannot tell what pointer points to, assuming global memory space

I have two sets of 100 strings each. I want an all-versus-all comparison, i.e. each of the 100 strings in set 1 compared with each of the 100 strings in set 2, to identify the common ones.
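Before porting an all-versus-all comparison to the GPU, a CPU reference is handy for checking the kernel's output. A minimal sketch in plain C, assuming '\0'-terminated strings; `compare_all_cpu` is a hypothetical name. Each `(idx, idy)` iteration is the work one GPU thread would do, and results are stored row by row as `out[idy * n1 + idx]`, so the row stride is the size of set 1.

```c
#include <string.h>

/* Hypothetical CPU reference for the all-versus-all comparison.
   out must hold n1 * n2 ints; out[idy * n1 + idx] is set to 1 when
   set1[idx] and set2[idy] contain the same characters, else 0. */
void compare_all_cpu(const char *const *set1, int n1,
                     const char *const *set2, int n2, int *out)
{
    for (int idy = 0; idy < n2; ++idy)
        for (int idx = 0; idx < n1; ++idx)
            /* strcmp compares contents, not pointer values */
            out[idy * n1 + idx] = (strcmp(set1[idx], set2[idy]) == 0);
}
```

With this layout the stride is `n1`; the posted kernel's `idx+idy*seq22` produces the same index only when both sets have the same size.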

aakova · July 30, 2008, 8:35pm

I’ll note that you’re calling __syncthreads() when you aren’t using shared memory, and that the call is in a potentially divergent branch.

Can you build the EmuDebug version and step through the kernel in the debugger?

BenW · July 30, 2008, 8:45pm

Hmmm… that looks like it might be a problem with the host code that puts the strings in CUDA memory. Can you post your cudaMalloc calls and your kernel call?

pedamallu · July 30, 2008, 8:59pm

This is my full code; it includes both the kernel and the host code.

I printed out the strings copied to the device, and they look okay. The only problem is that whenever I try to compare the strings, it doesn't work.

#include "string.h"
#include "stdio.h"
#include "cutil.h"

__global__ void compare_arrays_gpu( char **in1, char **in2, int *out, char **outt, char **outt1, int seq11, int seq22, int thr)
{
    int idx=blockIdx.x*blockDim.x+threadIdx.x;
    int idy=blockIdx.y*blockDim.y+threadIdx.y;
    int index=idx+idy*seq22;
    if(idx < seq11 && idy < seq22){

        outt[index]=in1[idx];
        outt1[index]=in2[idy];
        out[index]=1;

        if(outt[index]==outt1[index])
            out[index]=1;
        else
            out[index]=0;
    }

}
int main()
{
    CUT_DEVICE_INIT();
    /*parameters*/
    int winsize=5;
    int thres=2;

    char *seq1;
    char *seq2;

    int i;
    size_t seq1_len;
    size_t seq2_len;

    /* Allocate arrays a, b and c on host*/
    seq1 = (char*) malloc(sizeof(char*));
    seq2 = (char*) malloc(sizeof(char*));

    seq1="MNOPPPOPUOUOUIUITYRYFHFYTGHFEDTRTRT";
    seq1_len=strlen(seq1);

    seq2="MZMPSPOGUOUOUIUITYRYFHFYTGHFEDTRTRT";
    seq2_len=strlen(seq2);

    /* pointers to host memory */
    char **seq1parts;
    char **seq2parts;
    char **outer;
    char **outer1;
    int *output;
    int *output1;

    seq1parts = (char**) malloc((seq1_len-winsize)*sizeof(char*));
    seq2parts = (char**) malloc((seq2_len-winsize)*sizeof(char*));
    outer = (char**) malloc((seq2_len-winsize)*(seq1_len-winsize)*sizeof(char*));
    outer1 = (char**) malloc((seq2_len-winsize)*(seq1_len-winsize)*sizeof(char*));
    output = (int*) malloc((seq2_len-winsize)*(seq1_len-winsize)*sizeof(int));
    output1 = (int*) malloc((seq2_len-winsize)*(seq1_len-winsize)*sizeof(int));

    int k;
    char *temp1;
    // create multiple sequence parts
    temp1=(char*) malloc((seq1_len)*sizeof(char));
    temp1=seq1;
    for(k=0; k<seq1_len-winsize; k++){
        char* temp;
        temp=(char*) malloc((winsize)*sizeof(char));
        strncpy(temp, temp1, winsize);
        seq1parts[k]=temp;
        printf("%s\n",seq1parts[k]);
        char* xx=temp1+1;
        temp1=xx;
    }
    temp1=(char*) malloc((seq2_len)*sizeof(char));
    temp1=seq2;
    for(k=0; k<seq2_len-winsize; k++){
        char* temp;
        temp=(char*) malloc((winsize)*sizeof(char));
        strncpy(temp, temp1, winsize);
        seq2parts[k]=temp;
        printf("%s\n",seq2parts[k]);
        char* xx=temp1+1;
        temp1=xx;
    }

    /* pointers to device memory */
    char **seq1parts_d;
    char **seq2parts_d;
    char **outer_d;
    char **outer1_d;
    int *output1_d;
    int *output_d;

    /* Allocate arrays a_d, b_d and c_d on device*/
    cudaMalloc ((void **) &seq1parts_d, sizeof(char*)*(seq1_len-winsize));
    cudaMalloc ((void **) &seq2parts_d, sizeof(char*)*(seq2_len-winsize));
    cudaMalloc ((void **) &outer_d, sizeof(char*)*(seq2_len-winsize)*(seq1_len-winsize));
    cudaMalloc ((void **) &outer1_d, sizeof(char*)*(seq2_len-winsize)*(seq1_len-winsize));
    cudaMalloc ((void **) &output_d, sizeof(int)*(seq2_len-winsize)*(seq1_len-winsize));
    cudaMalloc ((void **) &output1_d, sizeof(int)*(seq2_len-winsize)*(seq1_len-winsize));

    /* Copy data from host memory to device memory */
    cudaMemcpy(seq1parts_d, seq1parts, sizeof(char*)*(seq1_len-winsize), cudaMemcpyHostToDevice);
    cudaMemcpy(seq2parts_d, seq2parts, sizeof(char**)*(seq2_len-winsize), cudaMemcpyHostToDevice);

    /* Compute the execution configuration */
    int blocksize=10;
    dim3 dimBlock(blocksize, blocksize);
    dim3 dimGrid((seq1_len-winsize)/dimBlock.x, (seq2_len-winsize)/dimBlock.y);

    /* Add arrays a and b, store result in c */
    compare_arrays_gpu<<<dimGrid,dimBlock>>>(seq1parts_d, seq2parts_d, output_d,outer_d,outer1_d,help_d,(seq1_len-winsize),(seq2_len-winsize), thres);

    /* Copy data from device memory to host memory */
    cudaMemcpy(output, output_d, sizeof(int)*(seq1_len-winsize)*(seq2_len-winsize), cudaMemcpyDeviceToHost);
    cudaMemcpy(outer, outer_d, sizeof(char**)*(seq1_len-winsize)*(seq2_len-winsize), cudaMemcpyDeviceToHost);
    cudaMemcpy(outer1, outer1_d, sizeof(char**)*(seq1_len-winsize)*(seq2_len-winsize), cudaMemcpyDeviceToHost);

    /* Print c */
    for (i=0; i<(seq1_len-winsize)*(seq2_len-winsize); i++)
        printf(" succ[%d]=%d %s %s %c\n",i,output[i], outer[i], outer1[i], help[i]);

    /* Free the memory */
    free(seq1parts); free(seq2parts);free(output);
    CUDA_SAFE_CALL(cudaFree(seq1parts_d)); CUDA_SAFE_CALL(cudaFree(seq2parts_d));CUDA_SAFE_CALL(cudaFree(output_d));

}

pedamallu · July 30, 2008, 9:28pm

Sorry, that was an old version of the code. Here is the correct code.

#include "string.h"
#include "stdio.h"
#include "cutil.h"

__global__ void add_arrays_gpu( char **in1, char **in2, int *out, char **outt, char **outt1, int seq11, int seq22, int thr)
{
    int idx=blockIdx.x*blockDim.x+threadIdx.x;
    int idy=blockIdx.y*blockDim.y+threadIdx.y;
    int index=idx+idy*seq22;
    if(idx < seq11 && idy < seq22){
        outt[index]=in1[idx];
        outt1[index]=in2[idy];
        //out[index]=1;
        if(outt[index]==outt1[index])
            out[index]=1;
        else
            out[index]=0;
    }
}
int main()
{
    CUT_DEVICE_INIT();
    /*parameters*/
    int winsize=5;
    int thres=2;

    char *seq1;
    char *seq2;

    int i;
    size_t seq1_len;
    size_t seq2_len;

    /* Allocate arrays a, b and c on host*/
    seq1 = (char*) malloc(sizeof(char*));
    seq2 = (char*) malloc(sizeof(char*));

    seq1="MNOPPPOPUOUOUIUITYRYFHFYTGHFEDTRTRT";
    seq1_len=strlen(seq1);

    seq2="MZMPSPOGUOUOUIUITYRYFHFYTGHFEDTRTRT";
    seq2_len=strlen(seq2);

    /* pointers to host memory */
    char **seq1parts;
    char **seq2parts;
    char **outer;
    char **outer1;
    int *output;
    int *output1;

    seq1parts = (char**) malloc((seq1_len-winsize)*sizeof(char*));
    seq2parts = (char**) malloc((seq2_len-winsize)*sizeof(char*));
    outer = (char**) malloc((seq2_len-winsize)*(seq1_len-winsize)*sizeof(char*));
    outer1 = (char**) malloc((seq2_len-winsize)*(seq1_len-winsize)*sizeof(char*));
    output = (int*) malloc((seq2_len-winsize)*(seq1_len-winsize)*sizeof(int));
    output1 = (int*) malloc((seq2_len-winsize)*(seq1_len-winsize)*sizeof(int));

    int k;
    char *temp1;
    temp1=(char*) malloc((seq1_len)*sizeof(char));
    temp1=seq1;
    for(k=0; k<seq1_len-winsize; k++){
        char* temp;
        temp=(char*) malloc((winsize)*sizeof(char));
        strncpy(temp, temp1, winsize);
        seq1parts[k]=temp;
        printf("%s\n",seq1parts[k]);
        char* xx=temp1+1;
        temp1=xx;
    }
    temp1=(char*) malloc((seq2_len)*sizeof(char));
    temp1=seq2;
    for(k=0; k<seq2_len-winsize; k++){
        char* temp;
        temp=(char*) malloc((winsize)*sizeof(char));
        strncpy(temp, temp1, winsize);
        seq2parts[k]=temp;
        printf("%s\n",seq2parts[k]);
        char* xx=temp1+1;
        temp1=xx;
    }

    /* pointers to device memory */
    char **seq1parts_d;
    char **seq2parts_d;
    char **outer_d;
    char **outer1_d;
    int *output1_d;
    int *output_d;

    /* Allocate arrays a_d, b_d and c_d on device*/
    cudaMalloc ((void **) &seq1parts_d, sizeof(char*)*(seq1_len-winsize));
    cudaMalloc ((void **) &seq2parts_d, sizeof(char*)*(seq2_len-winsize));
    cudaMalloc ((void **) &outer_d, sizeof(char*)*(seq2_len-winsize)*(seq1_len-winsize));
    cudaMalloc ((void **) &outer1_d, sizeof(char*)*(seq2_len-winsize)*(seq1_len-winsize));
    cudaMalloc ((void **) &output_d, sizeof(int)*(seq2_len-winsize)*(seq1_len-winsize));
    cudaMalloc ((void **) &output1_d, sizeof(int)*(seq2_len-winsize)*(seq1_len-winsize));

    /* Copy data from host memory to device memory */
    cudaMemcpy(seq1parts_d, seq1parts, sizeof(char*)*(seq1_len-winsize), cudaMemcpyHostToDevice);
    cudaMemcpy(seq2parts_d, seq2parts, sizeof(char**)*(seq2_len-winsize), cudaMemcpyHostToDevice);

    /* Compute the execution configuration */
    int blocksize=10;
    dim3 dimBlock(blocksize, blocksize);
    dim3 dimGrid((seq1_len-winsize)/dimBlock.x, (seq2_len-winsize)/dimBlock.y);

    /* Add arrays a and b, store result in c */
    add_arrays_gpu<<<dimGrid,dimBlock>>>(seq1parts_d, seq2parts_d, output_d,outer_d,outer1_d,(seq1_len-winsize),(seq2_len-winsize), thres);

    /* Copy data from device memory to host memory */
    cudaMemcpy(output, output_d, sizeof(int)*(seq1_len-winsize)*(seq2_len-winsize), cudaMemcpyDeviceToHost);
    cudaMemcpy(outer, outer_d, sizeof(char**)*(seq1_len-winsize)*(seq2_len-winsize), cudaMemcpyDeviceToHost);
    cudaMemcpy(outer1, outer1_d, sizeof(char**)*(seq1_len-winsize)*(seq2_len-winsize), cudaMemcpyDeviceToHost);

    /* Print c */
    for (i=0; i<(seq1_len-winsize)*(seq2_len-winsize); i++)
        printf(" succ[%d]=%d %s %s %c\n",i,output[i], outer[i], outer1[i] );

    /* Free the memory */
    free(seq1parts); free(seq2parts);free(output);
    CUDA_SAFE_CALL(cudaFree(seq1parts_d)); CUDA_SAFE_CALL(cudaFree(seq2parts_d));CUDA_SAFE_CALL(cudaFree(output_d));

}

BenW · July 30, 2008, 11:02pm

Well, I didn’t read all the way through, but I did notice a couple of things:

In the kernel, you’re still comparing pointers to arrays rather than the actual characters: outt[index]==outt1[index] checks whether the char* pointers are equal, not whether the strings are. Try explicitly comparing every character individually.
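The difference between comparing pointers and comparing contents can be checked on the host without a GPU. A small C sketch (the function `pointer_vs_content` is hypothetical, not from the thread): it makes two separate heap copies of the same text, then compares them with `==`, which compares addresses, and with `strcmp`, which compares characters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: builds two separate heap copies of text.
   *same_pointer receives (a == b), which compares addresses;
   *same_content receives strcmp(a, b) == 0, which compares characters.
   Both are set to -1 if an allocation fails. */
void pointer_vs_content(const char *text, int *same_pointer, int *same_content)
{
    size_t n = strlen(text) + 1;            /* include the terminating '\0' */
    char *a = malloc(n);
    char *b = malloc(n);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        *same_pointer = *same_content = -1;
        return;
    }
    memcpy(a, text, n);
    memcpy(b, text, n);
    *same_pointer = (a == b);               /* 0: two distinct live blocks */
    *same_content = (strcmp(a, b) == 0);    /* 1: identical characters */
    free(a);
    free(b);
}
```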

Also, AFAIK, variables that point to memory on the device (your /* pointers to device memory */ section) need to be declared with the __device__ qualifier, and I believe they also need to be at file scope. See if that doesn’t clean up your compile warnings.

Ben

Reimar · July 31, 2008, 6:30am

That should not be necessary, but the code only copies the pointers to the device, not the data. Also, the pointers that are copied are pointers to host memory; how on earth is the device supposed to be able to use them?
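One common way to act on this is to avoid `char**` on the device altogether: pack every window into one contiguous `char` buffer, copy that single buffer, and address window k as `buf + k * winsize`. The sketch below covers only the host-side packing and the per-pair comparison in plain C; `pack_windows` and `windows_equal` are hypothetical names. It assumes `len - winsize + 1` windows per sequence (the posted loops stop at `len - winsize`, which skips the last window), and because the packed windows carry no '\0' terminators, the comparison runs over exactly `winsize` characters.

```c
#include <string.h>

/* Hypothetical helper: copies every window of length winsize from seq
   into buf, back to back and without '\0' terminators.
   buf must hold (strlen(seq) - winsize + 1) * winsize chars.
   Returns the number of windows written. */
int pack_windows(const char *seq, int winsize, char *buf)
{
    int len = (int) strlen(seq);
    if (winsize <= 0 || len < winsize)
        return 0;
    int count = len - winsize + 1;
    for (int k = 0; k < count; ++k)
        memcpy(buf + k * winsize, seq + k, (size_t) winsize);
    return count;
}

/* Hypothetical per-pair test: compares window idx of flat1 with
   window idy of flat2 character by character. This is the body one
   GPU thread would run once flat1 and flat2 live in device memory. */
int windows_equal(const char *flat1, int idx, const char *flat2, int idy, int winsize)
{
    const char *a = flat1 + idx * winsize;
    const char *b = flat2 + idy * winsize;
    for (int i = 0; i < winsize; ++i)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

On the CUDA side, each packed buffer could then be allocated with `cudaMalloc((void **) &d_flat1, count1 * winsize)` and filled with one `cudaMemcpy(d_flat1, buf1, count1 * winsize, cudaMemcpyHostToDevice)`, where `d_flat1` and `count1` are illustrative names. The kernel would then take `const char *` arguments instead of `char **`, and `windows_equal` would need the `__device__` qualifier.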

Sorry to be so blunt, but IMO the best advice I can give is: start programming something simpler with CUDA until you have a better idea of the basic concepts.

Depending on your background, reading up on pointers and pointers to pointers, or drawing a schematic on a piece of paper of how your algorithm should work and which pointers point where (indicating where each piece of data resides, in CPU or GPU memory), might help, too.