__global__ void testkernel(float *r) { int idx = threadIdx.x; if (idx < 100) r[idx] = foo((float)idx); }
Assume that the kernel is launched with an appropriate thread/block size and that r points
to a properly allocated 100-entry array in both the host and device cases.
So, do you think the above definitions are OK? I am asking because of a problem
I am having with my cpp integration code, which is giving me inconsistent behaviour.
Functions that you invoke from a kernel must be device functions (declared __device__).
Your code tries to call a host function from the kernel, which is not allowed.
Hi,
Yes, your code seems to be OK. I hadn't noticed the preprocessor directives that make it a host/device function.
I have attached a solution that uses your code, and it works fine. Check your kernel invocation and your device-to-host copy; the problem may be in one of those areas. Otherwise the code looks OK. Sample1.zip (2.77 KB)
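For anyone reading later, here is a rough sketch of the two things to check. It is not the content of the attached zip. The HOST_DEVICE macro, the body of foo, the constant N, the 128-thread launch and the helper run_testkernel are all my own assumptions for illustration. The same file builds with nvcc (GPU path) or with a plain C++ compiler (host loop only), which makes it easy to compare the two results.

```cpp
#include <cassert>

#ifdef __CUDACC__
#include <cuda_runtime.h>
// Compiled by nvcc: foo is generated for both host and device.
#define HOST_DEVICE __host__ __device__
#else
// Plain C++ compiler: the CUDA qualifiers do not exist, so expand to nothing.
#define HOST_DEVICE
#endif

const int N = 100;  // assumed array length, matching the 100 entries in the question

// Hypothetical stand-in for foo; the real body is not shown in the thread.
HOST_DEVICE inline float foo(float x) { return 2.0f * x + 1.0f; }

#ifdef __CUDACC__
__global__ void testkernel(float *r)
{
    int idx = threadIdx.x;            // single block, as in the question
    if (idx < N) r[idx] = foo((float)idx);
}
#endif

// Fills the host array r (N entries). Returns true on success.
bool run_testkernel(float *r)
{
    assert(r != nullptr);
#ifdef __CUDACC__
    float *d_r = nullptr;
    if (cudaMalloc(&d_r, N * sizeof(float)) != cudaSuccess) return false;

    testkernel<<<1, 128>>>(d_r);      // assumed launch: 1 block of 128 threads
    cudaError_t err = cudaGetLastError();   // catches a bad launch configuration

    // cudaMemcpy waits for the kernel to finish before copying device to host.
    if (err == cudaSuccess)
        err = cudaMemcpy(r, d_r, N * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_r);
    return err == cudaSuccess;
#else
    // Host-only reference path: same computation, no GPU involved.
    for (int idx = 0; idx < N; ++idx) r[idx] = foo((float)idx);
    return true;
#endif
}
```

If the GPU path and the host loop give different numbers, the first things I would look at are the return value of cudaGetLastError after the launch and the direction and byte count passed to cudaMemcpy.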