I have a float4 array, float4* octree, of size const unsigned int octreeSize. My first problem is to write a kernel that can access any element of the array.
The code in red is temporary: it calls the kernel and copies the result back. (A float4 is just one node of the octree.)
How do I access any cell of the array (the octree) from a kernel?
What should I put between the triple angle brackets (<<< >>>)?
I have another problem: when I generate an octree with a depth of 2, the program runs without errors, but when I increase the depth to 3, I get a segfault. The segfault comes from this line
The kind of application where a GPU and CUDA work well is one where you have an operation that you want done on thousands or millions of cells in parallel, with one thread processing the data for each cell. (That's the simplest approach, but there are many other things that can be done.) So if you had 10000 octrees and were processing them with 10000 threads, you would split the 10000 threads into blocks of, say, 32 threads and would want code like this
NB: it's usually more efficient to use a block size of 32 or a multiple of 32 (the warp size), but other sizes work too.
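A minimal sketch of such a launch, to show what goes between the triple angle brackets: the number of blocks first, then the number of threads per block. The kernel name scaleCells and what it does to each cell are made up for illustration; h_resultant and d_octree follow the names used in this thread, and error checking is kept to a minimum.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per cell, doubles every component in place.
__global__ void scaleCells(float4* d_octree, unsigned int numCells)
{
    // Global thread index across the whole grid.
    unsigned int cellNum = threadIdx.x + blockIdx.x * blockDim.x;
    if (cellNum < numCells) {          // the last block may have spare threads
        float4 a = d_octree[cellNum];  // read one cell
        a.x *= 2.0f; a.y *= 2.0f; a.z *= 2.0f; a.w *= 2.0f;
        d_octree[cellNum] = a;         // write it back
    }
}

int main()
{
    const unsigned int numCells = 10000;
    const size_t size = numCells * sizeof(float4);

    // Host array, filled with arbitrary test values.
    float4* h_resultant = (float4*)malloc(size);
    for (unsigned int i = 0; i < numCells; ++i)
        h_resultant[i] = make_float4(1.0f, 2.0f, 3.0f, 4.0f);

    // Device array of the same size.
    float4* d_octree = nullptr;
    if (cudaMalloc((void**)&d_octree, size) != cudaSuccess) return 1;
    cudaMemcpy(d_octree, h_resultant, size, cudaMemcpyHostToDevice);

    // Blocks of 32 threads; round up so every cell gets a thread.
    const unsigned int threadsPerBlock = 32;
    const unsigned int numBlocks = (numCells + threadsPerBlock - 1) / threadsPerBlock;
    scaleCells<<<numBlocks, threadsPerBlock>>>(d_octree, numCells);

    cudaError_t err = cudaGetLastError();   // catches launch errors
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the result back (this also waits for the kernel to finish).
    cudaMemcpy(h_resultant, d_octree, size, cudaMemcpyDeviceToHost);
    printf("cell 0: %f %f %f %f\n", h_resultant[0].x, h_resultant[0].y,
           h_resultant[0].z, h_resultant[0].w);

    cudaFree(d_octree);
    free(h_resultant);
    return 0;
}
```

Rounding the block count up and guarding with cellNum < numCells means the cell count does not have to be a multiple of the block size.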
Question 1:
In the code above, a 1D array of 10000 cells was allocated. To access a cell, just use
float4 a = h_resultant[cellNum];
// and the reverse
h_resultant[ cellNum ] = a;
or float a = h_resultant[cellNum].x; to access just the first float of the float4 (.x, .y and .z for the first three components, .w for the fourth)
and h_resultant[ cellNum ].x = a; for the reverse.
Now, if you do have 10000 octrees, the threads are split into those blocks of 32 threads each, and inside the kernel you will want the following
int octreeNum = threadIdx.x + blockIdx.x * blockDim.x; // calculate the thread_number within the entire grid from the block number and thread number within its block
and then
int cellNum = octreeNum*MaxOctreeDepth + currentDepth; // so if you have 10000 octrees and MaxOctreeDepth is 10 then the 1st octree would take 1st 10 cells in d_octree,… and d_octree would be 100,000 cells
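The two index calculations above could sit in a kernel roughly like this. It is only a sketch of the layout described in this reply (MaxOctreeDepth consecutive cells per octree); the kernel name tagDepths and the value it writes are invented for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per octree. Each thread walks the cells
// of its own octree and stores the depth in the .w component.
__global__ void tagDepths(float4* d_octree, int numOctrees, int MaxOctreeDepth)
{
    int octreeNum = threadIdx.x + blockIdx.x * blockDim.x;
    if (octreeNum >= numOctrees) return;   // spare threads in the last block

    for (int currentDepth = 0; currentDepth < MaxOctreeDepth; ++currentDepth) {
        // Octree n owns cells [n * MaxOctreeDepth, (n + 1) * MaxOctreeDepth).
        int cellNum = octreeNum * MaxOctreeDepth + currentDepth;
        d_octree[cellNum].w = (float)currentDepth;
    }
}

int main()
{
    const int numOctrees = 10000;
    const int MaxOctreeDepth = 10;
    const int numCells = numOctrees * MaxOctreeDepth;   // 100,000 cells
    const size_t size = numCells * sizeof(float4);

    float4* d_octree = nullptr;
    if (cudaMalloc((void**)&d_octree, size) != cudaSuccess) return 1;
    cudaMemset(d_octree, 0, size);

    const int threadsPerBlock = 32;
    const int numBlocks = (numOctrees + threadsPerBlock - 1) / threadsPerBlock;
    tagDepths<<<numBlocks, threadsPerBlock>>>(d_octree, numOctrees, MaxOctreeDepth);

    // Copy back only the cells of the last octree and print their .w values.
    const size_t oneTree = MaxOctreeDepth * sizeof(float4);
    float4* h_cells = (float4*)malloc(oneTree);
    cudaMemcpy(h_cells, d_octree + (numOctrees - 1) * MaxOctreeDepth,
               oneTree, cudaMemcpyDeviceToHost);
    for (int d = 0; d < MaxOctreeDepth; ++d)
        printf("depth %d: w = %f\n", d, h_cells[d].w);

    free(h_cells);
    cudaFree(d_octree);
    return 0;
}
```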
Hope this helps. I think it's a little off track from what you are doing, sorry.
Finally, my program works!
In fact, I just noticed that cudaMemcpy( d_octree, &octree, size, cudaMemcpyHostToDevice ); was wrong: &octree is the address of the pointer variable itself, not of the array it points to. I changed &octree to octree.
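For anyone hitting the same thing, here is a small sketch of the corrected copy. The node count and fill values are placeholders, not my real octree. With &octree, cudaMemcpy reads size bytes starting at the pointer variable, which is only a few bytes long, so a small size may appear to work while a larger one can run into unmapped memory. That is possibly why my depth-2 octree ran and depth 3 crashed, though I have not verified it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    const unsigned int octreeSize = 73;              // placeholder node count
    const size_t size = octreeSize * sizeof(float4);

    // Host octree: a plain array of float4 nodes with placeholder values.
    float4* octree = (float4*)malloc(size);
    for (unsigned int i = 0; i < octreeSize; ++i)
        octree[i] = make_float4((float)i, 0.0f, 0.0f, 1.0f);

    float4* d_octree = nullptr;
    if (cudaMalloc((void**)&d_octree, size) != cudaSuccess) return 1;

    // Wrong: &octree has type float4** and points at the pointer variable.
    // cudaMemcpy(d_octree, &octree, size, cudaMemcpyHostToDevice);

    // Right: pass the pointer itself, which points at the first node.
    cudaError_t err = cudaMemcpy(d_octree, octree, size, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        printf("copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Round trip: copy back into a second buffer and compare one component.
    float4* check = (float4*)malloc(size);
    cudaMemcpy(check, d_octree, size, cudaMemcpyDeviceToHost);
    bool same = true;
    for (unsigned int i = 0; i < octreeSize; ++i)
        if (check[i].x != octree[i].x) same = false;
    printf("round trip %s\n", same ? "ok" : "mismatch");

    free(check);
    free(octree);
    cudaFree(d_octree);
    return 0;
}
```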