Access the values of a float4 array: problem understanding and creating a kernel

AuraHxC · March 14, 2010, 8:29pm

Hello everybody!

I have a float4 array (float4* octree) of size const unsigned int octreeSize. My first problem is to write a kernel that can access any element of the array.

My function launchKernel is declared like this:

int launchKernel( float4* octree, const unsigned int octreeSize )
{
   cudaSetDevice( 0 );

   /* host-side memory allocation for the result */
   float4* h_resultat = ( float4* ) malloc( sizeof( float4 ) );

   /* GPU memory allocation */
   float4* d_octree;
   const int size = octreeSize * sizeof( float4 );
   float4* d_resultat;

   /* allocate GPU memory and copy the octree into it */
   cudaMalloc( ( void** ) &d_octree, size );
   cudaMemcpy( d_octree, &octree, size, cudaMemcpyHostToDevice );

   /* device-side memory allocation for the result */
   cudaMalloc( ( void** ) &d_resultat, sizeof( float4 ) );

   mykernel <<< , >>>( d_octree, d_resultat );

   cudaMemcpy( h_resultat, d_resultat, sizeof( float4 ), cudaMemcpyDeviceToHost );

   cudaFree( d_octree );
   cudaFree( d_resultat );
   free( h_resultat );

   return 0;
}

The kernel call and the copy of the result are temporary code (the float4 result is just one node of the octree).

How do I access any cell of the array (octree) from a kernel?
What do I put between the triple angle brackets (<<< >>>)?
I have another problem: when I generate an octree with a depth of 2, the program runs without errors, but when I increase the depth to 3, I get a segmentation fault. The segfault comes from this line:

cudaMemcpy( d_octree, &octree, size, cudaMemcpyHostToDevice );

Thanks for your help.

kbam · March 15, 2010, 1:18am

Hi,

Answering question 2 first. Sorry, I'm leaving question 3.

A GPU with CUDA is a great fit when you have an operation that must be applied to thousands or millions of cells in parallel, with one thread processing the data for each cell. (That's the simplest approach, but there are many other things that can be done.) So if you had 10000 octrees and processed them with 10000 threads, you would split the 10000 threads into blocks of, say, 32 threads, with code like this:

dim3 dimBlock( 32 ); 

dim3 dimGrid( (10000 + 32 - 1) / 32 ); // round up so that every cell gets a thread

Then, when you call the kernel, use

mykernel<<< dimGrid, dimBlock >>>(d_octree, d_resultat );

NB: it's usually more efficient to use blocks of 32 threads or a multiple of 32 (the warp size), but other sizes work too.
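The block-count arithmetic above can be sketched as a small host-side helper. This is a minimal illustration in plain C++ (it also compiles inside a .cu file); the name blocksFor is hypothetical and not part of the CUDA runtime.

```cpp
#include <cassert>

// Hypothetical helper (not a CUDA API): number of blocks needed so that
// blocks * threadsPerBlock >= n, i.e. an integer division rounded up.
unsigned int blocksFor(unsigned int n, unsigned int threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}
```

With 10000 cells and 32 threads per block this gives 313 blocks, i.e. 10016 threads. The 16 spare threads have no cell to process, so a kernel that indexes the array by thread number would normally guard with something like if ( index < n ) before touching memory.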

Question 1:

The above allocates a 1D array of 10000 cells. To access a cell inside the kernel, just use

float4 a = d_octree[cellNum];

// and the reverse

d_octree[ cellNum ] = a;

or float a = d_octree[cellNum].x; to access just the first float of the float4 (.x, .y, .z for the first three, .w for the fourth)

and d_octree[ cellNum ].x = a;

Now, if you do have 10000 octrees, the threads are split into blocks of 32 threads each, so you will want the following:

int octreeNum = threadIdx.x + blockIdx.x * blockDim.x; // global thread number within the entire grid, computed from the block number and the thread number within its block

and then

int cellNum = octreeNum*MaxOctreeDepth + currentDepth; // so if you have 10000 octrees and MaxOctreeDepth is 10, the 1st octree takes the first 10 cells of d_octree, and so on, and d_octree holds 100,000 cells
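The two index formulas can be checked on the CPU before running anything on the GPU. The sketch below is plain C++ with hypothetical function names; the parameters stand in for the CUDA built-ins blockIdx.x, blockDim.x and threadIdx.x, which only exist inside a kernel.

```cpp
#include <cassert>

// Host-side stand-in for: threadIdx.x + blockIdx.x * blockDim.x
// (hypothetical helper, parameters replace the CUDA built-ins).
unsigned int globalIndex(unsigned int blockIdxX, unsigned int blockDimX,
                         unsigned int threadIdxX)
{
    assert(threadIdxX < blockDimX);
    return threadIdxX + blockIdxX * blockDimX;
}

// Flat layout assumed in the post: each octree owns maxOctreeDepth
// consecutive cells, and currentDepth selects one cell inside that slice.
unsigned int cellIndex(unsigned int octreeNum, unsigned int maxOctreeDepth,
                       unsigned int currentDepth)
{
    assert(currentDepth < maxOctreeDepth);
    return octreeNum * maxOctreeDepth + currentDepth;
}
```

For example, thread 5 of block 3 with 32 threads per block gets global index 101, and with a depth of 10 the last cell of the last of 10000 octrees is cell 99999, the final element of a 100,000-cell array.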

Hope this helps. I think it's a little off track from what you are doing, sorry.

Pardon any typos I may have made.

kbam

AuraHxC · March 15, 2010, 9:13pm

Thanks for your explanations ;)

But I still have the problem: for example, how do I access the 2nd cell of my array?

Here is my sample code:

#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>
#include <external_dependency.h>

__global__ void kernel_test( float4* octree, float4* d_resultat )
{
   /* global index of this thread within the whole grid */
   int index = threadIdx.x + blockIdx.x * blockDim.x;

   if ( index == 2 ) {
      *d_resultat = *( octree + index );
   }
}

int launchKernel( float4* octree, const unsigned int octreeSize )
{
   /*for ( unsigned int i = 0; i < octreeSize; ++i )
      printf( "x: %.2f, y: %.2f, z: %.2f, w: %.2f \n", octree[i].x, octree[i].y, octree[i].z, octree[i].w );*/

   cudaSetDevice( 0 );

   dim3 dimBlock( 32 );
   dim3 dimGrid( ( octreeSize + 31 ) / 32 ); /* round up to cover every node */

   /* host-side memory allocation for the result */
   float4* h_resultat = ( float4* ) malloc( sizeof( float4 ) );

   /* GPU memory allocation */
   float4* d_octree;
   const int size = octreeSize * sizeof( float4 );
   float4* d_resultat;

   /* allocate GPU memory and copy the octree into it */
   cudaMalloc( ( void** ) &d_octree, size );
   cudaMemcpy( d_octree, &octree, size, cudaMemcpyHostToDevice );

   /* device-side memory allocation for the result */
   cudaMalloc( ( void** ) &d_resultat, sizeof( float4 ) );

   kernel_test<<<dimGrid, dimBlock>>>( d_octree, d_resultat );

   cudaMemcpy( h_resultat, d_resultat, sizeof( float4 ), cudaMemcpyDeviceToHost );

   printf( "x: %f, y: %f, z: %f, w: %f \n", h_resultat->x, h_resultat->y, h_resultat->z, h_resultat->w );

   cudaFree( d_octree );
   cudaFree( d_resultat );
   free( h_resultat );

   return 0;
}

And I still have the seg fault problem when I increase the depth of the octree (when the depth is greater than 2).
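To see which node a kernel like kernel_test picks, its thread loop can be emulated on the CPU. This is only a sketch in plain C++: Float4 is a stand-in for CUDA's float4, emulateKernelTest is a hypothetical name, and the index < n guard is an addition that the posted kernel does not have.

```cpp
#include <cassert>
#include <vector>

struct Float4 { float x, y, z, w; };  // stand-in for CUDA's float4

// Visits every (block, thread) pair of a 1D launch and lets the thread whose
// global index equals target copy its node into result.
// Returns false if no in-range thread matched; result is then left untouched.
bool emulateKernelTest(const std::vector<Float4>& octree, unsigned int blockDim,
                       unsigned int target, Float4& result)
{
    assert(blockDim > 0);
    const unsigned int n = static_cast<unsigned int>(octree.size());
    const unsigned int gridDim = (n + blockDim - 1) / blockDim;  // rounded up
    bool written = false;
    for (unsigned int b = 0; b < gridDim; ++b) {
        for (unsigned int t = 0; t < blockDim; ++t) {
            const unsigned int index = t + b * blockDim;
            if (index < n && index == target) {  // guard against spare threads
                result = octree[index];
                written = true;
            }
        }
    }
    return written;
}
```

Because indices start at 0, index == 2 selects the third node; the 2nd cell of the array would be index 1.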

AuraHxC · March 15, 2010, 9:52pm

Finally, my program works!
In fact, I just saw that cudaMemcpy( d_octree, &octree, size, cudaMemcpyHostToDevice ); is wrong => I just changed &octree to octree.

Thanks!
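Why the one-character change matters: octree is the address of the first node, while &octree is the address of the pointer variable itself (a float4**), so the original call read size bytes starting from wherever that pointer is stored on the host. A plausible reason the small octree seemed to work is that a short read stayed inside mapped memory while a longer one ran past it; that is an inference, not something the thread confirms. The difference can be sketched on the host with memcpy, using a stand-in Float4 type and hypothetical helper names.

```cpp
#include <cassert>
#include <cstring>

struct Float4 { float x, y, z, w; };  // stand-in for CUDA's float4

// Source is the array itself, as in the fixed call: the bytes read are node data.
Float4 copyFirstNode(const Float4* octree)
{
    assert(octree != nullptr);
    Float4 out;
    std::memcpy(&out, octree, sizeof(Float4));
    return out;
}

// Source is &octree, as in the buggy call: the bytes read are the pointer
// variable's own contents, i.e. the address it stores, not the nodes.
// Only sizeof(pointer) bytes are read here, so this stays well defined.
bool addressOfYieldsPointerBytes(const Float4* octree)
{
    const Float4* recovered = nullptr;
    std::memcpy(&recovered, &octree, sizeof(octree));
    return recovered == octree;
}
```

Reading more than sizeof(pointer) bytes from &octree, as the original cudaMemcpy did, walks into unrelated memory, which matches a crash that only appears once size gets large enough.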
