Hi everyone,
I am trying to optimize the performance of a CUDA kernel by moving some of the data that currently resides in global memory to texture memory.
I have successfully created a CUDA array, bound it to a texture reference, and compiled the resulting code.
But when I try to execute the kernel, the execution hangs on any cudaMemcpy() function invocation. What's surprising is that this cudaMemcpy() has nothing to do with the texture reference.
Using CUDA_SAFE_CALL() didn't help in narrowing down the issue either.
With the aid of fprintf statements, I was able to determine that the execution hangs at the cudaMemcpy() invocation.
Below is the relevant piece of code:
#include <stdio.h>
#include <cuda_runtime.h>

extern "C" {
int INIT ( )
{
    fprintf(stderr, "I am inside INIT in the host code\n");
    float x = 0.0, *x_d ;   /* host value and its device copy */
    int s, e ;
    cudaMalloc((void **)&x_d, sizeof(float)) ;
    fprintf(stderr, "I am after cudaMalloc\n");
    /* execution hangs on this call */
    cudaMemcpy(x_d, &x, sizeof(float), cudaMemcpyHostToDevice) ;
    fprintf(stderr, "I am after cudaMemcpy\n");
    cudaFree(x_d) ;
    fprintf(stderr, "I am after cudaFree\n");
    return(0) ;
}
}
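For reference, here is a minimal sketch of the same sequence with every return code checked explicitly, in case that reveals more than CUDA_SAFE_CALL() did. The names ok() and INIT_checked() are hypothetical helpers I made up for illustration and are not part of CUDA. The initial cudaGetLastError() call is only there on the assumption that an earlier call (for example the texture binding) may have left an error pending.

#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper: prints the CUDA error string and returns 0 on failure. */
static int ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return 0;
    }
    return 1;
}

/* Hypothetical variant of INIT that reports the status of every runtime call. */
extern "C" int INIT_checked(void)
{
    float x = 0.0f;
    float *x_d = NULL;
    int rc = 1;

    /* Report (and clear) any error left behind by an earlier runtime call. */
    ok(cudaGetLastError(), "earlier CUDA call");

    if (!ok(cudaMalloc((void **)&x_d, sizeof(float)), "cudaMalloc"))
        return 1;

    if (ok(cudaMemcpy(x_d, &x, sizeof(float), cudaMemcpyHostToDevice), "cudaMemcpy"))
        rc = 0;

    /* Free the device buffer whether or not the copy succeeded. */
    if (!ok(cudaFree(x_d), "cudaFree"))
        rc = 1;

    return rc;
}

If the copy really does hang rather than fail, this will not return either, but any message printed before the hang should at least show whether an earlier call had already reported an error.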
I would really appreciate any input on this.
Padma