Structs in CUDA


I need to program a neural network in CUDA.

The code for this network has a lot of structs.

I want to know whether I can use these structs as they are, or whether I need to convert them into arrays.

I searched the forum for information on support for them and did not find anything that covers CUDA 4.2.

Thanks for your attention.


Could you please post your structs?



The structs are:

typedef int INT;
typedef double REAL;

typedef struct
{ /* A LAYER OF A NET:                     */
    INT Units;          /* - number of units in this layer       */
    REAL* Output;       /* - output of ith unit                  */
    REAL* Error;        /* - error term of ith unit              */
    REAL** Weight;      /* - connection weights to ith unit      */
    REAL** WeightSave;  /* - saved weights for stopped training  */
    REAL** dWeight;     /* - last weight deltas for momentum     */
} LAYER;

typedef struct
{ /* A NET:                                */
    LAYER** Layer;      /* - layers of this net                  */
    LAYER* InputLayer;  /* - input layer                         */
    LAYER* OutputLayer; /* - output layer                        */
    REAL Alpha;         /* - momentum factor                     */
    REAL Eta;           /* - learning rate                       */
    REAL Gain;          /* - gain of sigmoid function            */
    REAL Error;         /* - total net error                     */
} NET;

If I use a C++ class instead, is there any problem with using arrays and getter/setter methods?

CUDA supports pointers and function/procedure calls.

The CPU has its own memory, and so its own pointers. (Let’s call these “local pointers”: local to the CPU/main RAM.)

The GPU has its own memory, and so its own pointers. (Let’s call these “remote pointers”, from the perspective of the CPU.) A pointer from one side is generally not valid on the other side, which is why structs that contain pointers need extra care.

Integers and floating-point values have the same format on both sides, so they can simply be copied back and forth without any conversion or special treatment.
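As a small illustration of the point about plain values: a struct that holds only numbers and no pointers can be passed to a kernel by value, like an int. This is only a sketch. NetParams, ScaleKernel and scale_on_gpu are made-up names, not part of the network code above, and it uses float rather than double so that it does not depend on the GPU's double-precision support.

```cuda
#include <cuda_runtime.h>

// Hypothetical struct with plain values only (no pointers inside).
struct NetParams {
    int   Units;
    float Gain;
};

// The struct arrives by value; it is copied to the GPU as part of the launch.
__global__ void ScaleKernel(NetParams params, float* data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < params.Units)
        data[i] = data[i] * params.Gain;
}

// Host helper: multiplies count values (count > 0) by gain on the GPU.
// Returns 0 on success, -1 if any CUDA call failed.
int scale_on_gpu(float* hostData, int count, float gain)
{
    float* devData = 0;                       // remote pointer
    size_t bytes = count * sizeof(float);
    NetParams params;
    params.Units = count;
    params.Gain  = gain;

    bool ok = cudaMalloc((void**)&devData, bytes) == cudaSuccess
           && cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        ScaleKernel<<<(count + 127) / 128, 128>>>(params, devData);
        // The copy back waits for the kernel to finish.
        ok = cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devData);                        // freeing a null pointer is a no-op
    return ok ? 0 : -1;
}
```

The array still needs a remote allocation and explicit copies; only the small struct of plain values travels with the launch itself.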

The CPU program will have to allocate the memory on the GPU (remote allocation).

The CPU program will have to keep track of the pointers returned by this remote allocation (remote pointers).

The CPU program will have to pass the remote pointers to the kernel. This can be done either through individual kernel parameters (limited and more difficult), or through a single kernel parameter that is a remote pointer to a remote block of memory containing all the remote pointers (easier/unlimited/independent).

The CPU program should also allocate this same structure locally to match the remote structure. (The structure could be called “KernelParameters”; keep it as simple as possible, a direct copy of everything needed.)

The CPU program must then first initialize the local memory (kernel parameters) with the remote pointers.

Then the CPU program copies this local memory to the remote memory, thus initializing the remote memory with the remote pointers from this local memory.

The CPU program should also copy any other memory that the kernel needs.

The CPU program can then invoke the kernel.
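The steps above could look roughly like the sketch below. It is one possible arrangement, not the only one. The members of TKernelParameters, the placeholder computation in the kernel and the helper name run_with_parameters are assumptions made up for the example, and float is used instead of double to keep it independent of double-precision support.

```cuda
#include <cuda_runtime.h>

// Hypothetical parameter block: every pointer inside is a remote (GPU) pointer.
struct TKernelParameters {
    int    Units;
    float* Output;
    float* Error;
};

// Single kernel parameter: a remote pointer to the remote parameter block.
__global__ void Kernel(TKernelParameters* p)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p->Units)
        p->Error[i] = 1.0f - p->Output[i];    // placeholder computation
}

// Runs the steps for units values (units > 0) and writes the result to hostError.
// Returns 0 on success, -1 if any CUDA call failed.
int run_with_parameters(const float* hostOutput, float* hostError, int units)
{
    size_t bytes = units * sizeof(float);
    TKernelParameters  local;                 // local copy of the parameter block
    TKernelParameters* remote = 0;            // remote pointer to the remote copy
    local.Units  = units;
    local.Output = 0;
    local.Error  = 0;

    // Remote allocations; the returned remote pointers go into the local block.
    bool ok = cudaMalloc((void**)&local.Output, bytes) == cudaSuccess
           && cudaMalloc((void**)&local.Error, bytes) == cudaSuccess
           && cudaMalloc((void**)&remote, sizeof(TKernelParameters)) == cudaSuccess;

    // Copy the filled-in local block and the input data to remote memory.
    ok = ok
      && cudaMemcpy(remote, &local, sizeof(local), cudaMemcpyHostToDevice) == cudaSuccess
      && cudaMemcpy(local.Output, hostOutput, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        Kernel<<<(units + 127) / 128, 128>>>(remote);
        // Copy the result back; this waits for the kernel to finish.
        ok = cudaMemcpy(hostError, local.Error, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // Clean up the remote memory (freeing a null pointer is a no-op).
    cudaFree(remote);
    cudaFree(local.Error);
    cudaFree(local.Output);
    return ok ? 0 : -1;
}
```

Note that local.Output and local.Error are remote pointers stored in local memory: the CPU only passes them around and never dereferences them.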

The kernel then receives a pointer to memory on its own side, which has already been initialized by the CPU.

The kernel can then use this pointer and memory to initialize any structures, such as C structs or C++ classes, that contain pointers/arrays etc. Simple assignments will do, for example:

__global__ void Kernel( TKernelParameters* KernelParameters ) // using a remote pointer has the advantage that memory can be copied back from GPU to CPU as well.
{
    // MyStructure and OtherStructure stand for kernel-side structures declared elsewhere.
    MyStructure.MyPointer = KernelParameters->MyStructure.MyPointer;
    OtherStructure.OtherPointer = KernelParameters->OtherStructure.OtherPointer;
}

This will then initialize all the pointers on the kernel side.

Now the kernel is set up and ready to be used. The kernel can now run and simply access all its pointers as if they were arrays, since this is what the C language allows via the index operator, for example:
MyStructure.MyPointer[ MyIndex ] = SomeValue;
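For a pointer-to-pointer member such as Weight in the LAYER struct, one common approach is to store all rows back to back in a single remote array and compute the index by hand. This is an assumption on my part, not something the posts above prescribe. The sketch below shows the idea; FlatLayer, WeightedSum and weighted_sum_on_gpu are hypothetical names, and float stands in for REAL.

```cuda
#include <cuda_runtime.h>

// Hypothetical flattened layer: Weight holds Units rows of PrevUnits values each.
struct FlatLayer {
    int    Units;
    int    PrevUnits;
    float* Output;    // remote, Units values
    float* Weight;    // remote, Units * PrevUnits values
};

// One thread per unit: Output[i] = sum over j of Weight[i][j] * input[j].
__global__ void WeightedSum(FlatLayer layer, const float* input)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= layer.Units) return;
    float sum = 0.0f;
    for (int j = 0; j < layer.PrevUnits; ++j)
        sum += layer.Weight[i * layer.PrevUnits + j] * input[j];   // Weight[i][j]
    layer.Output[i] = sum;
}

// Host helper (units > 0, prevUnits > 0). Returns 0 on success, -1 on a CUDA error.
int weighted_sum_on_gpu(const float* hostWeight, const float* hostInput,
                        float* hostOutput, int units, int prevUnits)
{
    size_t wBytes = (size_t)units * prevUnits * sizeof(float);
    size_t iBytes = prevUnits * sizeof(float);
    size_t oBytes = units * sizeof(float);
    float* devInput = 0;
    FlatLayer layer;
    layer.Units = units;
    layer.PrevUnits = prevUnits;
    layer.Output = 0;
    layer.Weight = 0;

    bool ok = cudaMalloc((void**)&layer.Weight, wBytes) == cudaSuccess
           && cudaMalloc((void**)&layer.Output, oBytes) == cudaSuccess
           && cudaMalloc((void**)&devInput, iBytes) == cudaSuccess
           && cudaMemcpy(layer.Weight, hostWeight, wBytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(devInput, hostInput, iBytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        WeightedSum<<<(units + 127) / 128, 128>>>(layer, devInput);
        ok = cudaMemcpy(hostOutput, layer.Output, oBytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devInput);
    cudaFree(layer.Output);
    cudaFree(layer.Weight);
    return ok ? 0 : -1;
}
```

With this layout each weight matrix needs one remote allocation and one copy, instead of one per row plus a separate table of row pointers.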

Doing the remote allocations and remote freeing from the CPU has the advantage that it works well. CUDA also supports malloc and free inside kernels, but in my experience this is currently buggy. Also, transferring data between CPU and GPU might still be required even when using malloc/free inside CUDA itself… so you might as well do it all on the CPU side ;). This also has the advantage that the kernel stays relatively simple…

After the CPU is done with the kernel, it should free the remote memory, and possibly the local memory as well if the program is about to terminate.

This should give a rough idea of what needs to be done! ;) :)