Hi guys,
In the following code I want to implement a search function: searching for a keyword in a given string.
Assume that the length of the string T is 518 and the length of the keyword is 6. I launch the kernel with 512 threads and only one block (just an experiment), so each thread checks one fixed start location T[tid]. I thought this might speed up the search, but it did not. Can someone give me some advice on this? Thanks in advance.
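__global__ void
testKernel(char* d_T, char* d_P, int* d_Dist, int* d_flag)  // d_Dist is not used yet
{
    __shared__ char T[518];
    __shared__ int flag;
    __shared__ char P[6];
    int m = 6;  // keyword length
    const unsigned int tid = threadIdx.x;

    // Transfer data from gmem to shmem.
    if (tid == 0)
    {
        flag = 0;
        // Thread 0 also loads the tail that the 512 threads do not cover.
        // T[517] is never read below, because the last start position checked is 511.
        for (int i = 512; i <= 516; i++)
            T[i] = d_T[i];
    }
    if (tid < m)
        P[tid] = d_P[tid];
    T[tid] = d_T[tid];
    __syncthreads();

    // Each thread compares the keyword with the text starting at T[tid].
    int i = 0;
    for (; i < m && P[i] == T[tid + i]; i++);
    if (i == m)
        flag = 1;  // every matching thread writes the same value
    __syncthreads();

    // d_flag is only written on a match, so the host must zero it before the launch.
    if (tid == 0)
        if (flag != 0)
            *d_flag = flag;
}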
Edit:
1: Is it necessary to load the data into shmem?
2: Are the gmem accesses coalesced?
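
One way to probe question 1 might be to time the kernel against a variant that skips shared memory and reads the text straight from global memory. The following is only a minimal sketch under assumptions: the kernel name searchGlobal, the test keyword "needle", the filler character 'a', and the planted position 300 are made up for illustration and are not part of my code above. It keeps the same launch shape (one block, 512 threads), so start position 512 is still not checked.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

#define TEXT_LEN 518  // length of T, as in the post
#define PAT_LEN  6    // length of the keyword, as in the post

// Hypothetical variant: each thread compares the keyword with the text
// starting at its own index, reading directly from global memory.
__global__ void searchGlobal(const char* d_T, const char* d_P, int* d_flag)
{
    const unsigned int start = blockIdx.x * blockDim.x + threadIdx.x;
    if (start + PAT_LEN > TEXT_LEN) return;  // no room for a full match here

    int i = 0;
    while (i < PAT_LEN && d_P[i] == d_T[start + i]) ++i;
    if (i == PAT_LEN) *d_flag = 1;  // every matching thread writes the same value
}

int main()
{
    // Made-up test data: filler text with one occurrence of the keyword.
    char h_T[TEXT_LEN];
    memset(h_T, 'a', TEXT_LEN);
    const char h_P[PAT_LEN + 1] = "needle";
    memcpy(h_T + 300, h_P, PAT_LEN);

    char* d_T = NULL;
    char* d_P = NULL;
    int* d_flag = NULL;
    int h_flag = 0;
    cudaMalloc((void**)&d_T, TEXT_LEN);
    cudaMalloc((void**)&d_P, PAT_LEN);
    cudaMalloc((void**)&d_flag, sizeof(int));
    cudaMemcpy(d_T, h_T, TEXT_LEN, cudaMemcpyHostToDevice);
    cudaMemcpy(d_P, h_P, PAT_LEN, cudaMemcpyHostToDevice);
    // The kernel only writes the flag on a match, so zero it first.
    cudaMemcpy(d_flag, &h_flag, sizeof(int), cudaMemcpyHostToDevice);

    // Same launch shape as in the post: one block of 512 threads.
    searchGlobal<<<1, 512>>>(d_T, d_P, d_flag);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost);
    printf("found = %d\n", h_flag);

    cudaFree(d_T);
    cudaFree(d_P);
    cudaFree(d_flag);
    return 0;
}
```

With an input this small, the kernel launch and the host-to-device copies may well dominate the run time, so both versions would need to be timed (for example with cudaEvent timers) before concluding anything about shared memory.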