I want to perform the following operation:
accum += (float) trg[pos];
Where pos is calculated in each thread.
Is this going to cause corruption of accum?
I can get round it by storing each thread's value in an array and adding them together back on the CPU, but that's going to use a ton of memory that I don't think I'm going to have going spare.
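To make the question concrete, here is a minimal sketch of the kind of kernel I mean. It uses atomicAdd as one possible way to make the update safe without a per-thread array. The element type of trg, the way pos is computed, and the helper name sum_on_gpu are all assumptions for illustration, and the float version of atomicAdd needs compute capability 2.0 or later. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>

// Assumption: trg holds unsigned char values; the real element type is not stated.
// Assumption: pos is simply the global thread index; the real calculation is not shown.
__global__ void accumulate(const unsigned char *trg, int n, float *accum)
{
    int pos = blockIdx.x * blockDim.x + threadIdx.x;
    if (pos < n) {
        // A plain "*accum += (float) trg[pos];" here would be an unsynchronised
        // read-modify-write from many threads, so updates could be lost.
        atomicAdd(accum, (float) trg[pos]);
    }
}

// Hypothetical host helper: copies n values to the GPU and returns their sum.
float sum_on_gpu(const unsigned char *h_trg, int n)
{
    unsigned char *d_trg = NULL;
    float *d_accum = NULL;
    float result = 0.0f;

    cudaMalloc((void **) &d_trg, n * sizeof(unsigned char));
    cudaMalloc((void **) &d_accum, sizeof(float));
    cudaMemcpy(d_trg, h_trg, n * sizeof(unsigned char), cudaMemcpyHostToDevice);
    cudaMemset(d_accum, 0, sizeof(float));  // all-zero bytes represent 0.0f

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up so every element is covered
    accumulate<<<blocks, threads>>>(d_trg, n, d_accum);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_accum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_trg);
    cudaFree(d_accum);
    return result;
}
```

This only needs one extra float on the device, but all threads contend on the same address, so it may be slow for large inputs. Float addition order is also not fixed, so results can differ slightly between runs.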
On another note, can I pass C++ structures to CUDA?
I have a structure l_dash and I need access to l_dash->height and l_dash->width for my GPU code.
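As a rough sketch of what I am after: a struct with only plain data fields can be passed to a kernel by value, so the kernel gets its own copy and never has to dereference the host pointer. The Dims type, the area_kernel kernel, and the area_on_gpu helper below are hypothetical stand-ins, since the real definition behind l_dash is not shown here.

```cuda
#include <cuda_runtime.h>

// Assumption: a hypothetical stand-in for the real struct, with only plain int fields.
struct Dims {
    int width;
    int height;
};

// The struct arrives by value, so d.width and d.height are readable on the device.
__global__ void area_kernel(Dims d, int *out)
{
    *out = d.width * d.height;
}

// Hypothetical host helper: launches the kernel with a copy of *l_dash
// and returns width * height as computed on the GPU.
int area_on_gpu(const Dims *l_dash)
{
    int *d_out = NULL;
    int result = 0;

    cudaMalloc((void **) &d_out, sizeof(int));
    area_kernel<<<1, 1>>>(*l_dash, d_out);  // dereference on the host, pass by value
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return result;
}
```

Passing the l_dash pointer itself would not work this way, because it points to host memory. If the struct contained pointers, the data they point to would also need to be copied to the device separately.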