Passing data to and from a kernel.

There are a couple of concepts involved:

  1. Cuda pointers.
  2. Host pointers.

These can also be combined as follows:

  1. Cuda pointer to cuda pointer.
  2. Host pointer to host pointer.
  3. Cuda pointer to host pointer.
  4. Host pointer to cuda pointer.

(pointer to pointer)

Some questions:

  1. What kind of pointer does cuMemAlloc expect/return? Type 1, 2, 3, 4, 5 or 6 (counting both lists above)?

  2. The kernel parameter array: what kind of array is that? It seems to deal with multiple pointers and multiple types as well.

Highly confusing.

I need abstract documentation something like:

kernel parameter array on host concepts out of:

host pointers to cuda pointers.

Or even better some ascii art for example:

host pointer array ?
|… | … | … |

^ what are these elements ? integers for cuda kernel ?

So far it appears to be pointers to what exactly ? host or cuda pointers ? both somewhat seem to work.

More confusion…

Example kernel which is failing to work properly, free memory not working:

extern "C"
{ // extern c begin

__global__ void Kernel( int ParaIn, int *ParaOut )
{
	*ParaOut = ParaIn;
}

} // extern c end

could also be:

extern "C"
{ // extern c begin

__global__ void Kernel( int ParaIn, int *ParaOut )
{
	ParaOut[0] = ParaIn;
}

} // extern c end

just one integer in ParaOut.

Ok,

I see I had a little bug in my code… it was returning false while the result was ok…

code was:

if

which needed to be:

if not

But some more documentation would still help…

Now I can go back to my trial and error runs ;)

I seem to have figured it out and it goes like this:

There is an inconsistency in the way the parameters are passed to kernels.

The inconsistency is this:

  1. input integers can simply be passed as host memory.

  2. output integers must be passed as cuda memory.

^ Big inconsistency.

It would have been better if input integers also had to be cuda memory.

Example:

ParameterCount := 2;
Parameter[0] := vParameterIn.Address; // input integer parameter must be passed as host pointer to host memory.
Parameter[1] := @vParameterOut.Handle; // output integer parameter must be passed as a host pointer to cuda memory pointer.

Address returns host address of host memory.
Handle returns cuda memory pointer.
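A minimal sketch of this layout in plain C, assuming the array is the kernelParams argument of cuLaunchKernel, where each entry points to the host memory from which that argument's value is copied. Read that way, both entries are built the same way: each is the host address of the variable holding the argument. For the int parameter that variable is the integer itself; for the int * parameter it is the variable holding the cuda memory pointer. So the sketch builds without the CUDA toolkit, fake_devptr is a hypothetical stand-in for CUdeviceptr, and kernel_args, marshal_args and example are illustrative names, not driver functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for CUdeviceptr: an opaque handle, not a host address. */
typedef uint64_t fake_devptr;

/* Hypothetical model of the argument values that
   Kernel( int ParaIn, int *ParaOut ) ends up receiving. */
typedef struct {
    int         para_in;   /* value of the int argument  */
    fake_devptr para_out;  /* value of the int* argument */
} kernel_args;

/* Models the launch side: for every entry, copy sizeof(argument) bytes
   from the host location that the entry points to. */
static kernel_args marshal_args(void **params)
{
    kernel_args args;
    assert(params != NULL && params[0] != NULL && params[1] != NULL);
    memcpy(&args.para_in,  params[0], sizeof args.para_in);
    memcpy(&args.para_out, params[1], sizeof args.para_out);
    return args;
}

/* Builds the parameter array: each entry is the address of the host
   variable that holds the argument value. */
static kernel_args example(void)
{
    int         para_in  = 42;                       /* plain host integer      */
    fake_devptr para_out = UINT64_C(0x700000000);    /* made-up device handle   */
    void *params[2] = { &para_in, &para_out };
    return marshal_args(params);
}
```

In the Pascal lines above, vParameterIn.Address plays the role of &para_in and @vParameterOut.Handle plays the role of &para_out.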

Now I am still having problems with multiple parameters and arrays, so moving on to next somewhat larger example…

Array Kernel example:

extern "C"
{ // extern c begin

// para4 is array of 3 integers
// para5 is array of 4 integers
// return some values in them
__global__ void Kernel( int Para1, int Para2, int Para3, int *Para4, int *Para5 )
{
	Para4[0] = 111;
	Para4[1] = 222;
	Para4[2] = 333;

	Para5[0] = Para1;
	Para5[1] = Para2;
	Para5[2] = Para3;
	Para5[3] = 666;
}

} // extern c end
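When a kernel like this seems not to work, it can help to compare the arrays copied back from the device against a host-side reference. The sketch below is one way to do that in plain C; kernel_reference and results_match are hypothetical helper names, and the got4/got5 arrays are assumed to be the host buffers filled by the device to host copy.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side reference for the array kernel above: fills the two output
   arrays with the values the kernel is written to produce. */
static void kernel_reference(int para1, int para2, int para3,
                             int *para4, int *para5)
{
    assert(para4 != NULL && para5 != NULL);
    para4[0] = 111;
    para4[1] = 222;
    para4[2] = 333;
    para5[0] = para1;
    para5[1] = para2;
    para5[2] = para3;
    para5[3] = 666;
}

/* Returns 1 when the arrays copied back from the device (got4, got5)
   match the reference, 0 otherwise. */
static int results_match(int para1, int para2, int para3,
                         const int got4[3], const int got5[4])
{
    int want4[3], want5[4];
    size_t i;
    kernel_reference(para1, para2, para3, want4, want5);
    for (i = 0; i < 3; i++) {
        if (got4[i] != want4[i]) return 0;
    }
    for (i = 0; i < 4; i++) {
        if (got5[i] != want5[i]) return 0;
    }
    return 1;
}
```

A host buffer that is still all zeros after the copy fails this check, which points at the copy rather than at the kernel.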

Using the same technique as above now doesn’t work… I wonder why ?!?

Ok, I spotted the problem.

The size parameter passed to the device to host copy function was zero, a little programming mistake in calculating the size somewhere… it wasn’t being assigned/stored.

These kinds of programming mistakes are hard to spot !
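One way to make a zero-size copy harder to miss is to compute the byte count in a single place and check it right before the copy. A minimal sketch in plain C, assuming the arrays hold int elements as in the kernel above; int_array_bytes and copy_size_is_valid are hypothetical helper names, not part of the CUDA API.

```c
#include <assert.h>
#include <stddef.h>

/* Byte count for copying `count` ints between device and host. */
static size_t int_array_bytes(size_t count)
{
    /* Guard against overflow in the multiplication below. */
    assert(count <= (size_t)-1 / sizeof(int));
    return count * sizeof(int);
}

/* Hypothetical guard to call right before the copy: returns 1 when the
   size is usable, 0 when it is zero (for example a forgotten assignment). */
static int copy_size_is_valid(size_t bytes)
{
    return bytes != 0;
}
```

For the example above the sizes would be int_array_bytes(3) for Para4 and int_array_bytes(4) for Para5.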

Glad I found it !

Now everything is working with above techniques ! ;)

I was already thinking about giving up on cuda and trying opencl… I’m glad it’s working now with cuda ! ;) =D