Dear Experts,
I am quite new to CUDA and I am trying to implement a simple moving-average FIR filter. To verify the correctness of my code, I have written two versions of the FIR filter: one runs on the host and the other on the GPU. When I run my code, I get the following output:
Host Output: [ 0.000000 0.000000 0.000000 0.000000 0.333000 0.666000 0.999000 0.999000 0.999000 0.666000 0.333000 0.000000 0.000000 0.000000 0.000000]
Gpu Output: [ nan nan nan 0.000000 0.333000 0.666000 0.999000 0.999000 0.999000 0.666000 0.333000 0.000000 0.000000 0.000000 0.000000]
But when I compile my code in device emulation mode, I get the following output:
Host Output: [ 0.000000 0.000000 0.000000 0.000000 0.333000 0.666000 0.999000 0.999000 0.999000 0.666000 0.333000 0.000000 0.000000 0.000000 0.000000]
Gpu Output: [ 0.000000 0.000000 0.000000 0.000000 0.333000 0.666000 0.999000 0.999000 0.999000 0.666000 0.333000 0.000000 0.000000 0.000000 0.000000]
This is exactly the same as the host output. My questions are as follows:
1- Why do I get nan outputs in the first case? What could be the reason for this?
2- The statement int idx = blockIdx.x * blockDim.x + threadIdx.x; can be used to index into an array such as data. If I want to compute data[0] + data[1] + data[2], can I simply use data[idx] + data[idx+1] + data[idx+2]?
Regards,
Sanwar