Optimum way for this code?

Hi,

Here is a sample of my code. It takes 240+ ms, which is too slow for my needs, so:

1) Could you please tell me the optimum way to write the following program?

I want to execute the device function using threads (I mean, running its work individually in separate threads).

2) How can I execute the device function using threads?
3) What should the kernel launch configuration be?

// device function
__device__ void DevFun()
{
// the loop executes 640 times
for (int j = 0; j < 640; ++j) { … some code … }
}

// kernel function
__global__ void KerFun()
{
… some code …
… some code …

// call to device function
DevFun();

… some code …
… some code …
}

// main function
int main()
{
… some code …
… some code …

// call to kernel function
KerFun<<<1,1,0>>>();  // one block of one thread, so the 640 iterations run serially

… some code …
… some code …
}

Thanks
Manjunath

Run the 640 iterations in parallel instead of sequentially by creating 640 threads (20 blocks of 32 threads each):

__global__ void KerFun() {

  int j = blockIdx.x * blockDim.x + threadIdx.x;

  // Do here what you would do in one iteration of the loop, with j as the iteration index.

}

int main() {

  // set up for kernel execution

  KerFun<<<20,32,0>>>();

}
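A fuller version of the kernel above, as a sketch under some assumptions: the loop body is elided in the question, so `out[j] = 2.0f * j` is a hypothetical placeholder for the per-iteration work, and `RunLoopOnGpu` is a hypothetical host helper. It shows allocating device memory, rounding the block count up, guarding against surplus threads, and checking for launch and execution errors.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the loop body: iteration j writes 2 * j to out[j].
__global__ void KerFun(float* out, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // global iteration index
    if (j < n) {                                    // skip surplus threads in the last block
        out[j] = 2.0f * j;
    }
}

// Launches one thread per loop iteration and copies the results back.
// Returns an empty vector if any CUDA call fails.
std::vector<float> RunLoopOnGpu(int n)
{
    const int threadsPerBlock = 32;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // 20 when n == 640

    std::vector<float> result;
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        return result;
    }

    KerFun<<<blocks, threadsPerBlock>>>(d_out, n);
    cudaError_t err = cudaGetLastError();                   // bad launch configuration
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // faults while the kernel runs

    if (err == cudaSuccess) {
        result.resize(n);
        err = cudaMemcpy(result.data(), d_out, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) result.clear();
    }
    cudaFree(d_out);
    return result;
}
```

From `main` this would be called as `std::vector<float> r = RunLoopOnGpu(640);`. The rounded-up block count plus the `j < n` guard keeps the kernel correct if the trip count is ever not a multiple of 32.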

Hi Jamie,

Thanks for your reply.

I have one question here:

When we declare the kernel launch configuration like this: KerFun<<<20,32,0>>>();

it means that KerFun() is executed 20*32 times, am I right?

But I don't want to execute the whole function that many times (it takes too much time).

I want to execute only DevFun() 20*32 times; it is called from the kernel function. How can I do this?

Thanks

Manjunath
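On the launch-configuration question: with `KerFun<<<20,32,0>>>()` each of the 20 × 32 = 640 threads runs the whole kernel body once, and the threads differ only in the value of `j` they compute. The host-only sketch below uses a hypothetical helper, `EmulateLaunch` (plain C++, so it compiles in a .cu file without a GPU), that walks over every block/thread pair in sequence and counts how often each `j` is produced, showing that the index formula covers 0 to 639 exactly once. On the GPU the threads run in parallel, not in this loop order.

```cuda
#include <vector>

// Host-only emulation of a <<<gridDimX, blockDimX>>> launch: visit every
// (blockIdx.x, threadIdx.x) pair and count which loop index j it computes.
std::vector<int> EmulateLaunch(int gridDimX, int blockDimX)
{
    std::vector<int> hits(gridDimX * blockDimX, 0);  // hits[j] = threads that got index j
    for (int blockIdxX = 0; blockIdxX < gridDimX; ++blockIdxX) {
        for (int threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX) {
            // Same formula as in the kernel: j = blockIdx.x * blockDim.x + threadIdx.x
            int j = blockIdxX * blockDimX + threadIdxX;
            ++hits[j];  // on the GPU, the whole kernel body would run here for this j
        }
    }
    return hits;
}
```

Calling `EmulateLaunch(20, 32)` returns 640 counters, each equal to 1.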

Without CUDA dynamic parallelism (supported only on compute capability 3.5 and later), kernel code can't launch new threads, so from one invocation of KerFun you can't start multiple threads for the 640 executions of DevFun.

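If the rest of KerFun needs to run only once, you could do that part of the processing on the host and pass the results to the kernel, or split the work into two kernels, running the one-off part with a single thread and DevFun as its own kernel with 640 threads: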

__global__ void DevFun() {

  int j = blockIdx.x * blockDim.x + threadIdx.x;

  // Do here what you would do in one iteration of the loop, with j as the iteration index.

}

__global__ void KerFun() {

  // do here what you want to do only once.

}

int main() {

  // set up for kernel execution

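  KerFun<<<1,1,0>>>();      // one thread does the one-off work

  // Same (default) stream, so DevFun starts only after KerFun has finished.
  DevFun<<<20,32,0>>>();

  cudaDeviceSynchronize();  // wait for both kernels before using their results on the host

}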
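A sketch of the two-kernel approach with the hand-off made explicit, assuming the one-off part produces a value that the parallel part needs. Here that value is a hypothetical scale factor left in device memory; the per-iteration work and the `RunTwoKernels` helper are also placeholders. Because both launches go to the same default stream, `DevFun` cannot start before `KerFun` has finished, so no extra synchronization is needed between them.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One-off part: a single thread computes a value (hypothetical scale factor)
// and leaves it in device memory for the next kernel.
__global__ void KerFun(float* scale)
{
    *scale = 0.5f;
}

// Parallel part: one thread per former loop iteration j.
__global__ void DevFun(const float* scale, float* out, int n)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < n) {
        out[j] = *scale * j;  // hypothetical per-iteration work
    }
}

// Runs both kernels in order and returns out[0..n-1] (empty on any CUDA error).
std::vector<float> RunTwoKernels(int n)
{
    const int threadsPerBlock = 32;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;

    std::vector<float> result;
    float* d_scale = nullptr;
    float* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_scale, sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&d_out, n * sizeof(float));

    if (err == cudaSuccess) {
        // Same (default) stream: DevFun starts only after KerFun has finished.
        KerFun<<<1, 1>>>(d_scale);
        DevFun<<<blocks, threadsPerBlock>>>(d_scale, d_out, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        result.resize(n);
        err = cudaMemcpy(result.data(), d_out, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) result.clear();
    }
    cudaFree(d_scale);  // cudaFree(nullptr) is a no-op
    cudaFree(d_out);
    return result;
}
```

If the one-off work is light, doing it on the host and passing the result to `DevFun` as a kernel argument avoids the extra single-thread launch.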