Dynamic memory framework

After dealing with CUDA for a few months I’ve arrived at the conclusion that CUDA memory issues are the root of all evil. Writing a more or less real-world program requires a large number of variables and, as a result, a large number of registers. For example, an average loop requires an indexer, a bound, and a pointer. This is only the minimal set of variables; you might also need to do some normalizations, amplifications, extractions, and reductions, and each of these operations requires a register. Taking into account that you might need 3-4 nested loops, you get an empty looping program that consumes 4 * 3 = 12 registers. 12 registers for doing nothing, great! Don’t you think?

After doing some optimizations I’ve found out that not all of the values have to be kept in registers. For the average loop, the bound and the array pointer don’t change that often; usually these values don’t change until the kernel terminates, so they can be treated as constants. For constants CUDA has a separate memory space, the constant memory space. It has 8 kilobytes of cache per SM (Streaming Multiprocessor), so the first time you access a constant you pay the full device-memory latency (on the order of 400 cycles), but the next time you get it from the cache with practically zero overhead.
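To make the idea concrete, here is a minimal host-only sketch of grouping the loop-invariant values of one loop so that they can be uploaded in a single copy. The names `LoopConstants`, `constantSpace`, `UploadConstants` and `ReadConstants` are made up for this illustration, and a plain byte array stands in for the device constant memory; in a real program the array would be a `__constant__` symbol filled with cudaMemcpyToSymbol.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical grouping of the values that stay fixed for the whole kernel.
struct LoopConstants
{
    std::uint32_t bound;   // loop bound
    std::uint32_t offset;  // offset of the array inside a global buffer (stand-in for the pointer)
};

// Stand-in for the device constant memory space (assumption: host-only sketch).
static unsigned char constantSpace[64];

// Uploads the whole group with one copy and returns the number of bytes written.
inline std::size_t UploadConstants(const LoopConstants &c)
{
    std::memcpy(constantSpace, &c, sizeof(c));
    return sizeof(c);
}

// Reads the group back from the stand-in constant space.
inline LoopConstants ReadConstants()
{
    LoopConstants c;
    std::memcpy(&c, constantSpace, sizeof(c));
    return c;
}
```

The point is only that one struct means one copy, no matter how many values it holds.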

After playing a little bit with constants I’ve found out that it’s really hard to manage dozens of them. When you have 50 constants, calling cudaMemcpyToSymbol for each of them is a real pain. So after playing a little bit more I’ve ended up with the following little framework that helps manage:

  1. Global memory space

  2. Constant memory space

As a result you get clean C++ code like this (well, the naming convention is closer to C#, but it’s C++ code).

This example uses GeneralMatrix, which is a bit slower than the specialized matrix classes.

[codebox]

RandomSingle element = RandomSingle(0, 3.1415F);

GeneralMatrix m = GeneralMatrix(4, 4, (Object*)&element);

srand(date);

GlobalsBuffer globals = m.GetGlobals();

ConstantsBuffer constants = m.GetConstants();

UInt32 index = 0;

Single *globalsPtr = (Single*)globals.Pointer;

for (UInt32 i = 0; i < m.Rows; i++)

{

for (UInt32 j = 1; j < m.Cols; j++)

{

	printf("%f, ", globalsPtr[index++]);

}



printf("%f\r\n", globalsPtr[index++]);

}

printf("---------------------------------------------------------\r\n");

[/codebox]

This example uses RandomSingleMatrix. It is a more optimized version and will work faster.

[codebox]

RandomSingleMatrix m = RandomSingleMatrix(4, 4, 0.0F, 3.1415F);

srand(date);

GlobalsBuffer globals = GlobalsBuffer(m.GetGlobalsSize());

ConstantsBuffer constants = ConstantsBuffer(m.GetConstantsSize());

m.MapGlobals(&globals);

m.MapConstants(&constants);

UInt32 index = 0;

Single *globalsPtr = (Single*)globals.Pointer;

for (UInt32 i = 0; i < m.Rows; i++)

{

for (UInt32 j = 1; j < m.Cols; j++)

{

	printf("%f, ", globalsPtr[index++]);

}



printf("%f\r\n", globalsPtr[index++]);

}

printf("---------------------------------------------------------\r\n");

[/codebox]
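The GetGlobalsSize / MapGlobals pair used above is a two-pass scheme: the object is first mapped into a buffer without storage, which only counts bytes, and then mapped again into a buffer of exactly that size. The following is a simplified sketch of that idea, not the framework code itself; `SketchBuffer`, `TinyMatrix` and `Render` are hypothetical names, and a std::vector replaces the raw pointer to keep the example short.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Simplified stand-in for Buffer: without storage it only counts bytes,
// with storage it writes the value and advances the offset.
class SketchBuffer
{
public:
    SketchBuffer() : offset(0) {}
    explicit SketchBuffer(std::size_t size) : storage(size), offset(0) {}

    template <typename T> void Add(T value)
    {
        if (!storage.empty())
        {
            if (offset + sizeof(T) > storage.size()) throw "SketchBuffer overflow";
            std::memcpy(&storage[offset], &value, sizeof(T));
        }
        offset += sizeof(T);
    }

    std::size_t Offset() const { return offset; }
    const unsigned char *Data() const { return storage.data(); }

private:
    std::vector<unsigned char> storage;
    std::size_t offset;
};

// Hypothetical object: a 2 x 2 matrix of floats that maps itself into a buffer.
struct TinyMatrix
{
    float values[4];

    void Map(SketchBuffer *buffer) const
    {
        for (int i = 0; i < 4; i++) buffer->Add(values[i]);
    }
};

// Pass 1 measures the size, pass 2 writes into a buffer of exactly that size.
inline SketchBuffer Render(const TinyMatrix &m)
{
    SketchBuffer counter;
    m.Map(&counter);
    SketchBuffer result(counter.Offset());
    m.Map(&result);
    return result;
}
```

Because the same Map function drives both passes, the measured size and the written size cannot drift apart.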

The following code is part of a program that I haven’t finished yet, but it looks nice, works fast, and is easy to develop and understand:

[codebox]

RandomSingleMatrix input = RandomSingleMatrix(inputRows, inputCols, rangeA, rangeB);

ZeroSingleExLayer phenotypeLayer = ZeroSingleExLayer(inputRows, inputCols, outputRows, outputCols, Globals);

RandomUInt32ExLayer genotypeLayer = RandomUInt32ExLayer(inputRows, inputCols, outputRows, outputCols, None);

GeneralVector genotype = GeneralVector(layers, &genotypeLayer);

GeneralVector phenotype = GeneralVector(layers, &phenotypeLayer);

GeneralVector inputs = GeneralVector(populationSize, &input);

GeneralVector genotypes = GeneralVector(populationSize, &genotype);

GeneralVector phenotypes = GeneralVector(populationSize, &phenotype);

GeneticAlgorithmConfiguration gac = GeneticAlgorithmConfiguration

(

steps,

populationSize,

crossoverProbability,

mutationProbability,

inversionProbability,

rangeA,

rangeB

);

ConstantsBuffer gacConstants = gac.GetConstants();

GlobalsBuffer inputGlobals = inputs.GetGlobals(); // This statement renders input globals

GlobalsBuffer genotypeGlobals = genotypes.GetGlobals(); // This statement renders genotype globals

GlobalsBuffer phenotypeGlobals = phenotypes.GetGlobals(); // This statement renders phenotype globals

LaunchKernel(gacConstants.Pointer, inputGlobals.Pointer, genotypeGlobals.Pointer, phenotypeGlobals.Pointer);

[/codebox]
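Each ExLayer used above holds Left, Right, Biases and Outputs matrices, and one layer pass computes Left * Input * Right + Biases with the shapes (OutputRows x InputRows) * (InputRows x InputCols) * (InputCols x OutputCols) + (OutputRows x OutputCols). Here is a small CPU-side sketch of that pass, assuming row-major storage; `Mat`, `Multiply` and `LayerPass` are helper names invented for this example and are not part of the framework.

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix; a hypothetical helper type for this sketch only.
struct Mat
{
    std::size_t rows, cols;
    std::vector<float> data;

    Mat(std::size_t r, std::size_t c, float fill = 0.0f) : rows(r), cols(c), data(r * c, fill) {}

    float &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Plain triple-loop product; the caller must ensure a.cols == b.rows.
inline Mat Multiply(const Mat &a, const Mat &b)
{
    Mat c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; i++)
        for (std::size_t j = 0; j < b.cols; j++)
            for (std::size_t k = 0; k < a.cols; k++)
                c.at(i, j) += a.at(i, k) * b.at(k, j);
    return c;
}

// Left * Input * Right + Biases; the result has the shape of Biases.
inline Mat LayerPass(const Mat &left, const Mat &input, const Mat &right, const Mat &biases)
{
    Mat out = Multiply(Multiply(left, input), right);
    for (std::size_t i = 0; i < out.data.size(); i++) out.data[i] += biases.data[i];
    return out;
}
```

With a 2 x 3 input and a 4 x 5 output, Left is 4 x 2, Right is 3 x 5, and Biases is 4 x 5, so the input never has to be flattened into a vector.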

And the most interesting part, the framework itself:

(Constructive criticism is highly appreciated. If it helps you earn a million bucks or two, send me some. Legal: I take no responsibility for what this code might do.)

“CommonTypes.h”

[codebox]

#ifndef COMMONTYPES

#define COMMONTYPES

typedef int Int32;

typedef short Int16;

typedef float Single;

typedef double Double;

typedef signed char SByte;

typedef unsigned char Byte;

typedef unsigned int UInt32;

typedef unsigned short UInt16;

#endif

[/codebox]
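These typedefs assume the usual sizes (32-bit int, 16-bit short, and so on), and the buffers rely on them because values are written with sizeof. A few compile-time checks would catch a platform where the assumption fails. This is only a sketch: it needs a compiler with static_assert (C++11), and the typedefs are repeated so the snippet compiles on its own.

```cpp
#include <climits>

// Same typedefs as in CommonTypes.h, repeated so the sketch is self-contained.
typedef int Int32;
typedef short Int16;
typedef float Single;
typedef double Double;
typedef signed char SByte;
typedef unsigned char Byte;
typedef unsigned int UInt32;
typedef unsigned short UInt16;

// Compile-time checks of the sizes the host and device code rely on.
static_assert(CHAR_BIT == 8, "bytes are expected to be 8 bits wide");
static_assert(sizeof(Int32) == 4 && sizeof(UInt32) == 4, "Int32/UInt32 must be 4 bytes");
static_assert(sizeof(Int16) == 2 && sizeof(UInt16) == 2, "Int16/UInt16 must be 2 bytes");
static_assert(sizeof(SByte) == 1 && sizeof(Byte) == 1, "SByte/Byte must be 1 byte");
static_assert(sizeof(Single) == 4, "Single must be 4 bytes");
static_assert(sizeof(Double) == 8, "Double must be 8 bytes");
```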

“HostTypes.h”

[codebox]

#ifndef HOSTTYPES

#define HOSTTYPES

#include <climits> // UINT_MAX

#include <cstddef> // NULL

#include <cstdlib> // rand

#include "CommonTypes.h"

enum Space

{

None = 0,

Globals = 1,

Constants = 2,

Mixed = 3

};

class Buffer

{

protected:

Buffer() : Size(0), Pointer(NULL)

{

	Offset = 0;

}

Buffer(UInt32 size) : Size(size), Pointer(new Byte[size])

{

	Offset = 0;

}

public:

UInt32 Offset;

Byte *Pointer;

UInt32 Size;

bool IsEmpty()

{

	return Size == 0 || Pointer == NULL;

}



template <typename T> void Add(T value)

{

	// When buffer not specified, we only count elements

	if (Pointer != NULL)

	{

		// Before writing we should check for buffer overflow

		// TODO: add #ifndef NO_BUFFER_OVERFLOW_CHECK

		if (Offset + sizeof(T) <= Size)

		{

			*((T*)(Pointer + Offset)) = value;

		}

		else

		{

			throw "Buffer overflow in Buffer::Add(T value)";

		}

	}

	Offset += sizeof(T);

}

template <typename T> T* Allocate(UInt32 count)

{

	T* result = (T*)(Pointer + Offset);

	Offset += count * sizeof(T);

	if (Offset > Size)

	{

		Offset -= count * sizeof(T);

		throw "Buffer overflow in T* Buffer::Allocate<T>(UInt32 count)";

	}

	return result;

}

template <typename T> void Account(UInt32 count)

{

	Offset += count * sizeof(T);

}

void AccountOffset(UInt32 offset)

{

	Offset += offset;

}

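// NOTE: Buffer owns Pointer but defines no copy constructor, so a copied Buffer shares the same

// allocation; returning buffers by value is only safe when the compiler elides the copy.

virtual ~Buffer()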

{

	if (Pointer != NULL) delete [] Pointer;

}

};

class GlobalsBuffer : public Buffer

{

public:

GlobalsBuffer() : Buffer()

{

}

GlobalsBuffer(UInt32 size) : Buffer(size)

{

}

};

class ConstantsBuffer : public Buffer

{

public:

ConstantsBuffer() : Buffer()

{

}

ConstantsBuffer(UInt32 size) : Buffer(size)

{

}

};

class Object

{

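public:

// Virtual destructor: derived objects are deleted through Object* (see the ExLayer classes)

virtual ~Object()

{

}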

virtual UInt32 GetGlobalsSize() const

{

	GlobalsBuffer buffer = GlobalsBuffer();

	MapGlobals(&buffer);

	return buffer.Offset;

}

virtual UInt32 GetConstantsSize() const

{

	ConstantsBuffer buffer = ConstantsBuffer();

	MapConstants(&buffer);

	return buffer.Offset;

}

virtual void MapGlobals(GlobalsBuffer *buffer) const = 0;

virtual void MapConstants(ConstantsBuffer *buffer) const = 0;

virtual ConstantsBuffer GetConstants() const

{

	return GetConstants(GetConstantsSize());

}

virtual ConstantsBuffer GetConstants(UInt32 size) const

{

	ConstantsBuffer result = ConstantsBuffer(size);

	MapConstants(&result);

	return result;

}

virtual ConstantsBuffer GetConstants(ConstantsBuffer &buffer) const

{

	MapConstants(&buffer);

	

	return buffer;

}

virtual GlobalsBuffer GetGlobals() const

{

	return GetGlobals(GetGlobalsSize());

}

virtual GlobalsBuffer GetGlobals(UInt32 size) const

{

	GlobalsBuffer result = GlobalsBuffer(size);

	MapGlobals(&result);

	return result;

}

virtual GlobalsBuffer GetGlobals(GlobalsBuffer &buffer)

{

	MapGlobals(&buffer);

	return buffer;

}

};

// MixedObject objects can affect both constant and global memory spaces depending on the value of field ‘ObjectSpace’.

class MixedObject : public Object

{

protected:

virtual void MapObjectGlobals(GlobalsBuffer *buffer) const = 0;

virtual void MapObjectConstants(ConstantsBuffer *buffer) const = 0;

public:

Space ObjectSpace;

MixedObject(Space objectSpace = Mixed) : ObjectSpace(objectSpace)

{

}

virtual void MapGlobals(GlobalsBuffer *buffer) const override

{

	if (ObjectSpace & Globals) MapObjectGlobals(buffer);

}

virtual void MapConstants(ConstantsBuffer *buffer) const override

{

	if (ObjectSpace & Constants) MapObjectConstants(buffer);

}

};

// GlobalObject lies in global memory space and therefore does not affect constant memory space

class GlobalObject : public Object

{

private:

virtual void MapConstants(ConstantsBuffer *buffer) const override

{

	// This is empty method, no changes are made to the buffer

}

virtual UInt32 GetConstantsSize() const override

{

	return 0;

}

};

// ConstantObject lies in constant memory space and therefore does not affect global memory space

class ConstantObject : public Object

{

private:

virtual void MapGlobals(GlobalsBuffer *buffer) const override

{

	// This is empty method, no changes are made to the buffer

}

virtual UInt32 GetGlobalsSize() const override

{

	return 0;

}

};

class UInt32Object : public GlobalObject

{

public:

virtual UInt32 GetGlobalsSize() const override

{

	return sizeof(UInt32);

}

};

class SingleObject : public GlobalObject

{

public:

virtual UInt32 GetGlobalsSize() const override

{

	return sizeof(Single);

}

};

class ZeroUInt32 : public UInt32Object

{

public:

virtual void MapGlobals(GlobalsBuffer *buffer) const override

{

	buffer->Add<UInt32>(0);

}

};

class ZeroSingle : public SingleObject

{

public:

virtual void MapGlobals(GlobalsBuffer *buffer) const override

{

	buffer->Add<Single>(0);

}

};

class RandomUInt32 : public UInt32Object

{

public:

virtual void MapGlobals(GlobalsBuffer *buffer) const override

{

	buffer->Add(rand() + (rand() << 16));

}

};

class RandomSingle : public SingleObject

{

public:

Single GrayRangeA;

Single RangeDeltaNormalized;

RandomSingle() : GrayRangeA(0), RangeDeltaNormalized(1.0F / UINT_MAX)

{

}

RandomSingle(Single rangeA, Single rangeDelta) : GrayRangeA(rangeA), RangeDeltaNormalized(rangeDelta / UINT_MAX)

{

}

virtual void MapGlobals(GlobalsBuffer *buffer) const override

{

	buffer->Add(GrayRangeA + (rand() + (rand() << 16)) * RangeDeltaNormalized);

}

};

class Container : public MixedObject

{

private:

virtual void MapObjectGlobals(GlobalsBuffer *buffer) const override

{

	// Containers can't have any globals, only their children can.

}

protected:

virtual void MapObjectConstants(ConstantsBuffer *buffer) const override

{

	buffer->Add(Length);

	buffer->Add(GetGlobalsSize());

}

public:

UInt32 Length;

Space ChildSpace;

Container(UInt32 length, Space objectSpace = Constants, Space childSpace = Mixed) :

	Length(length), ChildSpace(childSpace), MixedObject(objectSpace)

{

}

virtual void MapConstants(ConstantsBuffer *buffer) const override

{

	if (ObjectSpace & Constants) MapObjectConstants(buffer);

}

};

// MixedObjectContainer is the object that contains mixed child elements.

class MixedObjectContainer : public Container

{

protected:

virtual void MapChildGlobals(GlobalsBuffer *buffer) const = 0;

virtual void MapChildConstants(ConstantsBuffer *buffer) const = 0;

public:

MixedObjectContainer(UInt32 length, Space objectSpace = Constants, Space childSpace = Mixed) : Container(length, objectSpace, childSpace)

{

}

virtual void MapGlobals(GlobalsBuffer *buffer) const override

{

	if (ChildSpace & Globals) MapChildGlobals(buffer);

}

virtual void MapConstants(ConstantsBuffer *buffer) const override

{

	Container::MapConstants(buffer);

	if (ChildSpace & Constants) MapChildConstants(buffer);

}

};

class GlobalObjectContainer : public Container

{

protected:

virtual void MapChildGlobals(GlobalsBuffer *buffer) const = 0;

public:

GlobalObjectContainer(UInt32 length, Space objectSpace = Constants, Space childSpace = Globals) :

	Container(length, objectSpace, childSpace)

{

}

virtual void MapGlobals(GlobalsBuffer *buffer) const override

{

	if (ChildSpace & Globals) MapChildGlobals(buffer);

}

};

class ConstantObjectContainer : public Container

{

protected:

virtual void MapChildConstants(ConstantsBuffer *buffer) const = 0;

public:

ConstantObjectContainer(UInt32 length, Space objectSpace = Constants, Space childSpace = Constants) :

	Container(length, objectSpace, childSpace)

{

}

virtual void MapGlobals(GlobalsBuffer *buffer) const override

{

	// No globals 

}

virtual void MapConstants(ConstantsBuffer *buffer) const override

{

	Container::MapConstants(buffer);

	if (ChildSpace & Constants) MapChildConstants(buffer);

}

};

class UInt32Vector : public GlobalObjectContainer

{

protected:

virtual void MapObjectConstants(ConstantsBuffer *buffer) const override

{

	GlobalObjectContainer::MapObjectConstants(buffer);

	buffer->Add<UInt32>(Length * sizeof(UInt32)); // explicit UInt32: sizeof yields size_t, which may be 8 bytes

}

public:

UInt32Vector(UInt32 length, Space objectSpace = Constants, Space childSpace = Globals) :

	GlobalObjectContainer(length, objectSpace, childSpace)

{

}

virtual UInt32 GetGlobalsSize() const override

{

	return Length * sizeof(UInt32);

}

};

class SingleVector : public GlobalObjectContainer

{

protected:

virtual void MapObjectConstants(ConstantsBuffer *buffer) const override

{

	GlobalObjectContainer::MapObjectConstants(buffer);

	buffer->Add<UInt32>(Length * sizeof(Single)); // explicit UInt32: sizeof yields size_t, which may be 8 bytes

}

public:

SingleVector(UInt32 length, Space objectSpace = Constants, Space childSpace = Globals) :

	GlobalObjectContainer(length, objectSpace, childSpace)

{

}

virtual UInt32 GetGlobalsSize() const override

{

	return Length * sizeof(Single);

}

};

class ZeroUInt32Vector : public UInt32Vector

{

protected:

virtual void MapChildGlobals(GlobalsBuffer *buffer) const override

{

	if (buffer->IsEmpty())

	{

		buffer->Account<UInt32>(Length);

	}

	else

	{

		UInt32 length = Length;

		UInt32 *pointer = buffer->Allocate<UInt32>(length);

		for (UInt32 i = 0; i < length; i++) pointer[i] = 0;

	}

}

public:

ZeroUInt32Vector(UInt32 length, Space objectSpace = Constants, Space childSpace = Globals) :

	UInt32Vector(length, objectSpace, childSpace)

{

}

};

class ZeroSingleVector : public SingleVector

{

protected:

virtual void MapChildGlobals(GlobalsBuffer *buffer) const override

{

	if (buffer->IsEmpty())

	{

		buffer->Account<Single>(Length);

	}

	else

	{

		UInt32 length = Length;

		Single *pointer = buffer->Allocate<Single>(length);

		for (UInt32 i = 0; i < length; i++) pointer[i] = 0.0F;

	}

}

public:

ZeroSingleVector(UInt32 length, Space objectSpace = Constants, Space childSpace = Globals) :

	SingleVector(length, objectSpace, childSpace)

{

}

};

class RandomUInt32Vector : public UInt32Vector

{

protected:

virtual void MapChildGlobals(GlobalsBuffer *buffer) const override

{

	if (buffer->IsEmpty())

	{

		buffer->Account<UInt32>(Length);

	}

	else

	{

		UInt32 length = Length;

		UInt32 *pointer = buffer->Allocate<UInt32>(length);

		for (UInt32 i = 0; i < length; i++) pointer[i] = rand() + (rand() << 16);

	}

}

public:

RandomUInt32Vector(UInt32 length, Space objectSpace = Constants, Space childSpace = Globals) :

	UInt32Vector(length, objectSpace, childSpace)

{

}

};

class RandomSingleVector : public SingleVector

{

protected:

virtual void MapChildGlobals(GlobalsBuffer *buffer) const override

{

	if (buffer->IsEmpty())

	{

		buffer->Account<Single>(Length);

	}

	else

	{

		UInt32 length = Length;

		Single *pointer = buffer->Allocate<Single>(length);

		Single rangeA = GrayRangeA;

		Single rangeDelta = RangeDeltaNormalized;

		for (UInt32 i = 0; i < length; i++) pointer[i] = rangeA + (rand() + (rand() << 16)) * rangeDelta;

	}

}

public:

Single GrayRangeA;

Single RangeDeltaNormalized;

RandomSingleVector(UInt32 length, Single rangeA, Single rangeDelta, Space objectSpace = Constants, Space childSpace = Globals) :

	GrayRangeA(rangeA), RangeDeltaNormalized(rangeDelta / UINT_MAX), SingleVector(length, objectSpace, childSpace)

{

}

};

class ZeroUInt32Matrix : public ZeroUInt32Vector

{

protected:

virtual void MapObjectConstants(ConstantsBuffer *buffer) const override

{

	ZeroUInt32Vector::MapObjectConstants(buffer);

	buffer->Add(Rows);

	buffer->Add(Cols);

}

public:

UInt32 Rows;

UInt32 Cols;

ZeroUInt32Matrix(UInt32 rows, UInt32 cols, Space objectSpace = Constants, Space childSpace = Globals) :

	Rows(rows), Cols(cols), ZeroUInt32Vector(rows * cols, objectSpace, childSpace)

{

}

};

class ZeroSingleMatrix : public ZeroSingleVector

{

protected:

virtual void MapObjectConstants(ConstantsBuffer *buffer) const override

{

	ZeroSingleVector::MapObjectConstants(buffer);

	buffer->Add(Rows);

	buffer->Add(Cols);

}

public:

UInt32 Rows;

UInt32 Cols;

ZeroSingleMatrix(UInt32 rows, UInt32 cols, Space objectSpace = Constants, Space childSpace = Globals) :

	Rows(rows), Cols(cols), ZeroSingleVector(rows * cols, objectSpace, childSpace)

{

}

};

class RandomUInt32Matrix : public RandomUInt32Vector

{

protected:

virtual void MapObjectConstants(ConstantsBuffer *buffer) const override

{

	RandomUInt32Vector::MapObjectConstants(buffer);

	buffer->Add(Rows);

	buffer->Add(Cols);

}

public:

UInt32 Rows;

UInt32 Cols;

RandomUInt32Matrix(UInt32 rows, UInt32 cols, Space objectSpace = Constants, Space childSpace = Globals) :

	Rows(rows), Cols(cols), RandomUInt32Vector(rows * cols, objectSpace, childSpace)

{

}

};

class RandomSingleMatrix : public RandomSingleVector

{

protected:

virtual void MapObjectConstants(ConstantsBuffer *buffer) const override

{

	RandomSingleVector::MapObjectConstants(buffer);

	buffer->Add(Rows);

	buffer->Add(Cols);

}

public:

UInt32 Rows;

UInt32 Cols;

RandomSingleMatrix(UInt32 rows, UInt32 cols, Single rangeA, Single rangeDelta, Space objectSpace = Constants, Space childSpace = Globals) :

	Rows(rows), Cols(cols), RandomSingleVector(rows * cols, rangeA, rangeDelta, objectSpace, childSpace)

{

}

};

class GeneralCollection : public MixedObjectContainer

{

protected:

virtual void MapChildGlobals(GlobalsBuffer *buffer) const override

{

	if (buffer->IsEmpty())

	{

		buffer->AccountOffset(GetGlobalsSize());

	}

	else

	{

		for (UInt32 i = 0; i < Length; i++) Elements[i]->MapGlobals(buffer);

	}

}

virtual void MapChildConstants(ConstantsBuffer *buffer) const override

{

	if (buffer->IsEmpty())

	{

		buffer->AccountOffset(GetConstantsSize());

	}

	else

	{

		for (UInt32 i = 0; i < Length; i++) Elements[i]->MapConstants(buffer);

	}

}

public:

Object **Elements;

GeneralCollection(UInt32 length, Object **elements, Space objectSpace = Constants, Space childSpace = Mixed) :

	Elements(elements), MixedObjectContainer(length, objectSpace, childSpace)

{

}

virtual UInt32 GetGlobalsSize() const override

{

	UInt32 size = 0;

	UInt32 length = Length;

	for (UInt32 i = 0; i < length; i++) size += Elements[i]->GetGlobalsSize();

	return size;

}

virtual UInt32 GetConstantsSize() const override

{

	UInt32 size = 0;

	UInt32 length = Length;

	for (UInt32 i = 0; i < length; i++) size += Elements[i]->GetConstantsSize();

	return size;

}

};

class GeneralVector : public MixedObjectContainer

{

protected:

virtual void MapChildGlobals(GlobalsBuffer *buffer) const override

{

	if (buffer->IsEmpty())

	{

		buffer->AccountOffset(Length * Element->GetGlobalsSize());

	}

	else

	{

		for (UInt32 i = 0; i < Length; i++) Element->MapGlobals(buffer);

	}

}

virtual void MapChildConstants(ConstantsBuffer *buffer) const override

{

	Element->MapConstants(buffer);

}

public:

Object *Element;

GeneralVector(UInt32 length, Object *element, Space objectSpace = Constants, Space childSpace = Mixed) :

	Element(element), MixedObjectContainer(length, objectSpace, childSpace)

{

}

virtual UInt32 GetGlobalsSize() const override

{

	return Length * Element->GetGlobalsSize();

}

};

class GeneralMatrix : public GeneralVector

{

protected:

virtual void MapObjectConstants(ConstantsBuffer *buffer) const override

{

	GeneralVector::MapObjectConstants(buffer);

	buffer->Add(Rows);

	buffer->Add(Cols);

}

public:

UInt32 Rows;

UInt32 Cols;

GeneralMatrix(UInt32 rows, UInt32 cols, Object *element, Space objectSpace = Constants, Space childSpace = Mixed) :

	Rows(rows), Cols(cols), GeneralVector(rows * cols, element, objectSpace, childSpace)

{

}

};

// Standard neural network layer consists of a vector “Connections” that represents connections

// and a vector “Biases” that represents biases. Having some input vector “Input” to calculate

// current layer pass we need to do the following:

//

// “Connections” * “Input” + “Biases” (corresponding vector dimensions must agree)

//

// Though not in every case it is easy to work with vectors, you need non mathematical procedures

// to create a 2-dimensional matrix out of a vector, and in my opinion it is not good.

//

// In this case I’m using 2 matrices “Left” and “Right” as the connections and matrix “Biases”

// as the biases. To complete current layer pass I will have to do the following:

//

// “Left” * “Input” * “Right” + “Biases”

//

// By having the input matrix dimensions equal to “InputRows x InputCols” and by willing to receive

// “OutputRows x OutputCols” matrix from current layer pass the above operation should look like:

//

// “OutputRows x InputRows” * “InputRows x InputCols” * “InputCols x OutputCols” + “OutputRows x OutputCols”

//

class ExLayer : public MixedObject

{

private:

virtual void MapObjectGlobals(GlobalsBuffer *buffer) const

{

	// No globals

}

protected:

virtual void MapObjectConstants(ConstantsBuffer *buffer) const

{

	buffer->Add(InputRows);

	buffer->Add(InputCols);

	buffer->Add(OutputRows);

	buffer->Add(OutputCols);

}

public:

virtual void MapGlobals(GlobalsBuffer *buffer) const

{

	if (LrbSpace & Globals)

	{

		Left->MapGlobals(buffer);

		Right->MapGlobals(buffer);

		Biases->MapGlobals(buffer);

	}

	if (OutSpace & Globals) Outputs->MapGlobals(buffer);

}

virtual void MapConstants(ConstantsBuffer *buffer) const

{

	MixedObject::MapConstants(buffer);

	if (LrbSpace & Constants)

	{

		Left->MapConstants(buffer);

		Right->MapConstants(buffer);

		Biases->MapConstants(buffer);

	}

	if (OutSpace & Constants) Outputs->MapConstants(buffer);

}

Space OutSpace;

Space LrbSpace;

Object *Left;

Object *Right;

Object *Biases;

Object *Outputs;

UInt32 InputRows;

UInt32 InputCols;

UInt32 OutputRows;

UInt32 OutputCols;

ExLayer

(

	UInt32 inputRows,

	UInt32 inputCols,

	UInt32 outputRows,

	UInt32 outputCols,

	Object *left,

	Object *right,

	Object *biases,

	Object *outputs,

	Space outSpace = Globals,

	Space lrbSpace = Globals,

	Space objectSpace = Constants

) :

	Left(left),

	Right(right),

	Biases(biases),

	Outputs(outputs),

	InputRows(inputRows),

	InputCols(inputCols),

	OutputRows(outputRows),

	OutputCols(outputCols),

	LrbSpace(lrbSpace),

	OutSpace(outSpace),

	MixedObject(objectSpace)

{

}

};

class ZeroUInt32ExLayer : public ExLayer

{

public:

ZeroUInt32ExLayer

(

	UInt32 inputRows,

	UInt32 inputCols,

	UInt32 outputRows,

	UInt32 outputCols,

	Space outSpace = Globals,

	Space lrbSpace = Globals,

	Space objectSpace = Constants

) : ExLayer

(

	inputRows,

	inputCols,

	outputRows,

	outputCols,

	new ZeroUInt32Matrix(outputRows, inputRows, Constants, Mixed),

	new ZeroUInt32Matrix(inputCols, outputCols, Constants, Mixed),

	new ZeroUInt32Matrix(outputRows, outputCols, Constants, Mixed),

	new ZeroUInt32Matrix(outputRows, outputCols, Constants, Mixed),

	outSpace, lrbSpace, objectSpace

)

{

}

~ZeroUInt32ExLayer()

{

	delete Left;

	delete Right;

	delete Biases;

	delete Outputs;

}

};

class RandomUInt32ExLayer : public ExLayer

{

public:

RandomUInt32ExLayer

(

	UInt32 inputRows,

	UInt32 inputCols,

	UInt32 outputRows,

	UInt32 outputCols,

	Space outSpace = Globals,

	Space lrbSpace = Globals,

	Space objectSpace = Constants

) : ExLayer

(

	inputRows,

	inputCols,

	outputRows,

	outputCols,

	new RandomUInt32Matrix(outputRows, inputRows, Constants, Mixed),

	new RandomUInt32Matrix(inputCols, outputCols, Constants, Mixed),

	new RandomUInt32Matrix(outputRows, outputCols, Constants, Mixed),

	new RandomUInt32Matrix(outputRows, outputCols, Constants, Mixed),

	outSpace, lrbSpace, objectSpace

)

{

}

~RandomUInt32ExLayer()

{

	delete Left;

	delete Right;

	delete Biases;

	delete Outputs;

}

};

class ZeroSingleExLayer : public ExLayer

{

public:

ZeroSingleExLayer

(

	UInt32 inputRows,

	UInt32 inputCols,

	UInt32 outputRows,

	UInt32 outputCols,

	Space outSpace = Globals,

	Space lrbSpace = Globals,

	Space objectSpace = Constants

) : ExLayer

(

	inputRows,

	inputCols,

	outputRows,

	outputCols,

	new ZeroSingleMatrix(outputRows, inputRows, Constants, Globals),

	new ZeroSingleMatrix(inputCols, outputCols, Constants, Globals),

	new ZeroSingleMatrix(outputRows, outputCols, Constants, Globals),

	new ZeroSingleMatrix(outputRows, outputCols, Constants, Globals),

	outSpace, lrbSpace, objectSpace

)

{

}

~ZeroSingleExLayer()

{

	delete Left;

	delete Right;

	delete Biases;

	delete Outputs;

}

};

class RandomSingleExLayer : public ExLayer

{

public:

RandomSingleExLayer

(

	UInt32 inputRows,

	UInt32 inputCols,

	UInt32 outputRows,

	UInt32 outputCols,

	Single rangeA,

	Single rangeDelta,

	Space outSpace = Globals,

	Space lrbSpace = Globals,

	Space objectSpace = Constants

) : ExLayer

(

	inputRows,

	inputCols,

	outputRows,

	outputCols,

	new RandomSingleMatrix(outputRows, inputRows, rangeA, rangeDelta, Constants, Globals),

	new RandomSingleMatrix(inputCols, outputCols, rangeA, rangeDelta, Constants, Globals),

	new RandomSingleMatrix(outputRows, outputCols, rangeA, rangeDelta, Constants, Globals),

	new RandomSingleMatrix(outputRows, outputCols, rangeA, rangeDelta, Constants, Globals),

	outSpace, lrbSpace, objectSpace

)

{

}

~RandomSingleExLayer()

{

	delete Left;

	delete Right;

	delete Biases;

	delete Outputs;

}

};

class GeneralExLayer : public ExLayer

{

public:

GeneralExLayer

(

	UInt32 inputRows,

	UInt32 inputCols,

	UInt32 outputRows,

	UInt32 outputCols,

	Object *element,

	Space outSpace = Globals,

	Space lrbSpace = Globals,

	Space objectSpace = Constants

) :

ExLayer

(

	inputRows,

	inputCols,

	outputRows,

	outputCols,

	new GeneralMatrix(outputRows, inputRows, element, Constants, Mixed),

	new GeneralMatrix(inputCols, outputCols, element, Constants, Mixed),

	new GeneralMatrix(outputRows, outputCols, element, Constants, Mixed),

	new GeneralMatrix(outputRows, outputCols, element, Constants, Mixed),

	outSpace, lrbSpace, objectSpace

)

{

}

~GeneralExLayer()

{

	delete Left;

	delete Right;

	delete Biases;

	delete Outputs;

}

};

class GeneticAlgorithmConfiguration : public ConstantObject

{

public:

UInt32 Steps;

UInt32 WindowSize;

UInt32 PopulationSize;

UInt32 CrossoverProbability;

UInt32 MutationProbability;

UInt32 InversionProbability;

Single GrayRangeA;

Single GrayRangeB;

GeneticAlgorithmConfiguration

(

	UInt32 steps,

	UInt32 populationSize,

	UInt32 crossoverProbability,

	UInt32 mutationProbability,

	UInt32 inversionProbability,

	Single grayRangeA,

	Single grayRangeB

)

:

	Steps(steps),

	PopulationSize(populationSize),

	CrossoverProbability(crossoverProbability),

	MutationProbability(mutationProbability),

	InversionProbability(inversionProbability),

	GrayRangeA(grayRangeA),

	GrayRangeB(grayRangeB)

{

}

virtual void MapConstants(ConstantsBuffer *buffer) const override

{

	buffer->Add(Steps);

	buffer->Add(PopulationSize);

	buffer->Add(CrossoverProbability);

	buffer->Add(MutationProbability);

	buffer->Add(InversionProbability);

	buffer->Add(GrayRangeA);

	buffer->Add((GrayRangeB - GrayRangeA) / UINT_MAX);

}

};

#endif

[/codebox]
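The Space enum works as a pair of bit flags: Globals is bit 0, Constants is bit 1, and Mixed is simply both bits set, which is why checks such as `ObjectSpace & Globals` are enough to route an object to one space, both, or neither. Here is a tiny sketch of that test; the helper functions `UsesGlobals` and `UsesConstants` are hypothetical and exist only for this illustration.

```cpp
// Same values as the Space enum in HostTypes.h, repeated so the sketch is self-contained.
enum Space
{
    None = 0,
    Globals = 1,
    Constants = 2,
    Mixed = 3 // Globals | Constants
};

// Hypothetical helpers: true when the corresponding bit is set.
inline bool UsesGlobals(Space s) { return (s & Globals) != 0; }
inline bool UsesConstants(Space s) { return (s & Constants) != 0; }
```

An object created with None maps nothing at all, which is how the genotype layer in the unfinished program above skips its output matrix.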

For device side:

“DeviceTypes.h”

[codebox]

#ifndef DEVICETYPES

#define DEVICETYPES

#include "CommonTypes.h"

typedef struct __align__(8)

{

UInt32 Length;

UInt32 Size;

} Array;

typedef struct __align__(8)

{

UInt32 Length;

UInt32 Size;

} Collection;

typedef struct __align__(16)

{

UInt32 Length;

UInt32 Size;

UInt32 Rows;

UInt32 Cols;

} Matrix;

typedef struct __align__(16)

{

UInt32 InputRows;

UInt32 InputCols;

UInt32 OutputRows;

UInt32 OutputCols;

} NeuralNetworkExLayer;

typedef struct __align__(16)

{

UInt32 Steps;

UInt32 PopulationSize;

UInt32 CrossoverProbability;

UInt32 MutationProbability;

UInt32 InversionProbability;

Single RangeA;

Single RangeDelta;

} GeneticAlgorithmConfiguration;

#endif

[/codebox]

After changing the way information is written into a buffer, don’t forget to change the corresponding device types.
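One way to keep the two sides in sync is to check the struct layout against the order of the Add calls. The sketch below does this on the host for the genetic algorithm configuration. It is an illustration under assumptions: `GeneticAlgorithmConfigurationLayout`, `Append` and `PackConfiguration` are hypothetical names, and the standard `alignas(16)` stands in for the CUDA alignment specifier.

```cpp
#include <cstddef>
#include <cstring>

typedef unsigned int UInt32;
typedef float Single;

// Host-side mirror of the device struct (assumption: alignas replaces the CUDA specifier).
struct alignas(16) GeneticAlgorithmConfigurationLayout
{
    UInt32 Steps;
    UInt32 PopulationSize;
    UInt32 CrossoverProbability;
    UInt32 MutationProbability;
    UInt32 InversionProbability;
    Single RangeA;
    Single RangeDelta;
};

// The offsets must follow the order of the Add calls in MapConstants.
static_assert(offsetof(GeneticAlgorithmConfigurationLayout, Steps) == 0, "Steps offset");
static_assert(offsetof(GeneticAlgorithmConfigurationLayout, PopulationSize) == 4, "PopulationSize offset");
static_assert(offsetof(GeneticAlgorithmConfigurationLayout, InversionProbability) == 16, "InversionProbability offset");
static_assert(offsetof(GeneticAlgorithmConfigurationLayout, RangeA) == 20, "RangeA offset");
static_assert(offsetof(GeneticAlgorithmConfigurationLayout, RangeDelta) == 24, "RangeDelta offset");

// Writes one value at the given offset and returns the new offset, like Buffer::Add.
template <typename T> inline std::size_t Append(unsigned char *out, std::size_t offset, T value)
{
    std::memcpy(out + offset, &value, sizeof(T));
    return offset + sizeof(T);
}

// Packs the fields in the same order as MapConstants; returns the number of bytes written.
inline std::size_t PackConfiguration(unsigned char *out, UInt32 steps, UInt32 populationSize,
                                     UInt32 crossover, UInt32 mutation, UInt32 inversion,
                                     Single rangeA, Single rangeDelta)
{
    std::size_t offset = 0;
    offset = Append(out, offset, steps);
    offset = Append(out, offset, populationSize);
    offset = Append(out, offset, crossover);
    offset = Append(out, offset, mutation);
    offset = Append(out, offset, inversion);
    offset = Append(out, offset, rangeA);
    offset = Append(out, offset, rangeDelta);
    return offset;
}
```

If a field is added to MapConstants but not to the struct, either a static_assert fails or the packed size no longer fits the struct, so the mismatch shows up before the kernel reads garbage.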

P.S.

I really hope that you people will find something to improve. All of your comments are appreciated. A bit later I will try to add some documentation and a compilable project archive.