Data structure layout in memory different for host and device

I have run into a problem with data structures being laid out differently in device and host code.

I’ve isolated some code to test for this.

#include <cstdio>
#include <iostream>
#include <cuda_runtime.h>

class Managed {
public :
	void * operator new( size_t size ) {
		void * ptr ;
		::cudaMallocManaged( &ptr, size, cudaMemAttachGlobal ) ;
		return ptr ;
	}
	void operator delete( void * ptr ) {
		cudaFree( ptr ) ;
	}
} ;

class MySymbol : public Managed {
	unsigned int value ;
} ;

class MySymbol2 {
	unsigned int value ;
} ;

template<class K,class V> class MyPair : public Managed {
public :
	K k ;
	V v ;

    __host__ __device__ void printMemory() {
		// Cast explicitly: sizeof yields size_t and pointer subtraction yields
		// ptrdiff_t, neither of which matches %d / %lld on every platform.
		printf( "Pair() sizeof=%d\n", ( int ) sizeof( MyPair<K,V> ) ) ;
		printf( "Offset k=%lld v=%lld\n", ( long long )( ( char * ) &k - ( char * ) this ), ( long long )( ( char * ) &v - ( char * ) this ) ) ;
	}
} ;


template <class T> __global__ void print_memory( T * t ) {
	t->printMemory() ;
}

void TestDataStructureOffset() {
	MyPair<MySymbol,unsigned int> * pair = new MyPair<MySymbol,unsigned int>() ;
	MyPair<unsigned int,unsigned int> * pair2 = new MyPair<unsigned int,unsigned int>() ;
	MyPair<MySymbol2,unsigned int> * pair3 = new MyPair<MySymbol2,unsigned int>() ;

	std::cout << "HOST\n:" ;
	pair->printMemory() ;
	pair2->printMemory() ;
	pair3->printMemory() ;

	std::cout << "\n\nDEVICE\n:" ;
	print_memory<<<1,1>>>( pair ) ;
	print_memory<<<1,1>>>( pair2 ) ;
	print_memory<<<1,1>>>( pair3 ) ;

	cudaDeviceSynchronize() ;
}

On my platform (x64 Windows), I get the following result:

HOST
:Pair() sizeof=8
Offset k=0 v=4
Pair() sizeof=8
Offset k=0 v=4
Pair() sizeof=8
Offset k=0 v=4


DEVICE
:Pair() sizeof=12
Offset k=4 v=8
Pair() sizeof=8
Offset k=0 v=4
Pair() sizeof=8
Offset k=0 v=4

It appears that MyPair<MySymbol,unsigned int> is padded differently by the device compiler than by the host compiler. Is this a bug, or intended behavior? This is causing significant problems when porting C++ code from the CPU to the GPU.

Quoting a section from the programming guide:

The alignment requirement is automatically fulfilled for the built-in types of char, short,
int, long, long long, float, double like float2 or float4.
For structures, the size and alignment requirements can be enforced by the compiler
using the alignment specifiers __align__(8) or __align__(16)

The programming guide discusses alignment requirements at length; simply search the document for 'alignment' or 'alignment requirement'.
Key pages to note would be 75, 90, and 197 (document version 5.5).
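As a minimal illustration of what that specifier does (a hypothetical sketch; Pixel is not from the code above):

struct __align__(8) Pixel {
	unsigned short x ;
	unsigned short y ;
	float weight ;
} ;

__align__(8) expands to the host compiler's own alignment attribute (__declspec(align(8)) under MSVC, __attribute__((aligned(8))) under gcc), so the host and device compilers should agree on an 8-byte size and alignment for a plain struct like this. Note that it governs alignment and padding of ordinary structs; it does not by itself address the empty-base-class layout difference shown above.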

And good luck; by now I have abandoned structures, particularly mixed-type ones, as much as possible, in favour of simpler one-dimensional arrays that align without hassle or worry, roughly as sketched below.
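That flattened approach looks roughly like this (a hypothetical sketch with invented names, not code from above): instead of an array of mixed-type pair objects, keep one plain array per field, so only built-in types cross the host/device boundary.

// One managed array per field; built-in types are laid out identically
// on host and device, so there are no padding surprises.
void AllocateFlattenedPairs( size_t n, unsigned int ** keys, unsigned int ** values ) {
	cudaMallocManaged( ( void ** ) keys, n * sizeof( unsigned int ), cudaMemAttachGlobal ) ;
	cudaMallocManaged( ( void ** ) values, n * sizeof( unsigned int ), cudaMemAttachGlobal ) ;
}

__global__ void use_pairs( unsigned int * keys, unsigned int * values, size_t n ) {
	size_t i = blockIdx.x * blockDim.x + threadIdx.x ;
	if ( i < n ) values[ i ] += keys[ i ] ;	// logical pair i is ( keys[i], values[i] )
}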

I’ve filed a bug report for this, as the inconsistency between structs with base classes and structs without them seems incorrect.

Indeed, it seems there is an inconsistency for this case on Windows only. I tried your code on Linux (RHEL 6.2), and there the host compiler and device compiler agree on class arrangement in all cases; my printout for both HOST and DEVICE matches your DEVICE case. The Windows host compiler is known to behave a little differently on class arrangement in some cases. A few of those cases are documented here:

Programming Guide :: CUDA Toolkit Documentation

None of the 3 cases listed appears to be an exact fit for your case. In discussion with compiler engineers at NVIDIA, this case may need to be documented as well. A doc update to reflect this is too late to be incorporated in the CUDA 6.5 docs, since the CUDA 6.5 release is imminent, but there may be an update in the CUDA 7.0 time frame to shed light on this.

It was suggested that a possible workaround would be to make the base class (Managed) non-empty; a sketch of what I mean is below. Is this something you could try in your case? I don’t know if there would be any objections to that. Even if there were, it would still be interesting to know your results for test purposes.
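In case it helps, the workaround as I understand it is just an unused member in the base class (untested sketch):

class Managed {
public :
	void * operator new( size_t size ) {
		void * ptr ;
		::cudaMallocManaged( &ptr, size, cudaMemAttachGlobal ) ;
		return ptr ;
	}
	void operator delete( void * ptr ) {
		cudaFree( ptr ) ;
	}
private :
	char dummy ;	// makes the base class non-empty
} ;

The likely objection is that every class deriving from Managed then grows by at least one byte plus padding (MySymbol, for example, would probably become 8 bytes instead of 4), but it would still show whether the host and device layouts then agree.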

As indicated in the doc link above, there are sometimes discrepancies between host and device class arrangement in a few cases that have not been addressed in the compiler, particularly with respect to the Windows host compiler. For the foreseeable future, these cases need to be avoided in code that intends to share such classes between host and device.
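In the meantime, one defensive measure is a compile-time size check next to any class that is shared between host and device; nvcc runs file-scope code through both the host and the device front end, so a layout disagreement should surface as a build error rather than as silently corrupted data. This is only a sketch and needs a toolchain that accepts static_assert in .cu files (C++11 support):

// Shared header: assert the size the rest of the code actually relies on.
// Both the host pass and the device pass of nvcc evaluate this, so a
// host/device layout mismatch fails the build.
static_assert( sizeof( MyPair<MySymbol, unsigned int> ) == 2 * sizeof( unsigned int ),
	"MyPair<MySymbol,unsigned int> layout differs from what the code assumes" ) ;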