Hi,
Is it possible for one kernel to use the results of another kernel that are already stored in device memory? Or do we have to go the tedious route of copying the result to the host and transferring it back to the device for the new kernel?
Thanks,
randal
Of course you can. Just allocate an output buffer on the device and pass its pointer to both kernels,
something like this:
__global__ void KernelA( float *pIn, float *pOut )
{
    ....
    pOut[ threadIdx.x ] = <someValue>; // or pOut[ 0 ] = <someValue>
}
__global__ void KernelB( float *pAnotherIn, float *pKernelAResult, float *pKernelBResult )
{
    ....
    pKernelBResult[ threadIdx.x ] = <someValue> * pKernelAResult[ threadIdx.x ]; // or any other formula...
}
Then copy the pKernelBResult values back to the host with a regular cudaMemcpy.
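For completeness, here is a rough host-side sketch of how the two launches could be wired together. The kernel bodies (doubling the input, then multiplying), the helper name runTwoKernels, the element count parameter n, and the block size of 256 are all placeholders I made up for illustration, so swap in your own. The point is only that the intermediate buffer dA never leaves the device. Both kernels are launched on the default stream, so they run in order and KernelB sees KernelA's finished output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder stage 1 (assumed formula): writes 2 * pIn[i] to device memory.
__global__ void KernelA(const float *pIn, float *pOut, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pOut[i] = pIn[i] * 2.0f;
}

// Placeholder stage 2 (assumed formula): reads KernelA's result directly
// from device memory, with no host round trip in between.
__global__ void KernelB(const float *pAnotherIn, const float *pKernelAResult,
                        float *pKernelBResult, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pKernelBResult[i] = pAnotherIn[i] * pKernelAResult[i];
}

// Hypothetical helper: runs KernelA then KernelB, copying only the final
// result back to the host. Returns true if no CUDA error was reported.
bool runTwoKernels(const float *hIn, const float *hAnotherIn, float *hOut, int n)
{
    size_t bytes = n * sizeof(float);
    float *dIn = nullptr, *dAnotherIn = nullptr, *dA = nullptr, *dB = nullptr;

    bool ok = cudaMalloc(&dIn, bytes) == cudaSuccess
           && cudaMalloc(&dAnotherIn, bytes) == cudaSuccess
           && cudaMalloc(&dA, bytes) == cudaSuccess   // KernelA output, stays on device
           && cudaMalloc(&dB, bytes) == cudaSuccess;  // KernelB output

    if (ok) {
        ok = cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess
          && cudaMemcpy(dAnotherIn, hAnotherIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }

    if (ok) {
        int threads = 256;                          // assumed block size
        int blocks = (n + threads - 1) / threads;

        // Same (default) stream, so KernelB starts only after KernelA completes.
        KernelA<<<blocks, threads>>>(dIn, dA, n);
        KernelB<<<blocks, threads>>>(dAnotherIn, dA, dB, n);
        ok = cudaGetLastError() == cudaSuccess;
    }

    if (ok) {
        // Blocking copy: waits for both kernels, then fetches only the final result.
        ok = cudaMemcpy(hOut, dB, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts null pointers, so this is safe after a partial failure.
    cudaFree(dIn);
    cudaFree(dAnotherIn);
    cudaFree(dA);
    cudaFree(dB);
    return ok;
}

int main()
{
    const int n = 4;
    float hIn[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float hAnotherIn[n] = {10.0f, 20.0f, 30.0f, 40.0f};
    float hOut[n] = {0.0f};

    if (!runTwoKernels(hIn, hAnotherIn, hOut, n)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) printf("%g\n", hOut[i]);
    return 0;
}
```

If you later put the two kernels on different streams, the implicit ordering goes away and you would need an event or an explicit synchronization between them.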
eyal