Problems with Global Variables?

Root Question: Is there a problem with using global variables with CUDA?

I am trying to write a static library that implements an API to be used by another program. A large amount of data needs to be transferred to the CUDA device, and then a large number of iterative calls are made to perform a computation on that data. To that end, I set up several functions of the form:

float *g_LotsOfData = NULL;
float *g_Results = NULL;
int g_nNumberOfElements = 0;

//This function is called once
void Initialize( float *pLotsOfData,
                 const int &nNumberOfElements )
{
	//Make the appropriate calls to initialize CUDA
	InitCuda();

	//Set variables we will use later
	int nSize = nNumberOfElements*sizeof( float );
	cudaMalloc( (void **) &g_LotsOfData, nSize );
	cudaMalloc( (void **) &g_Results, nSize );
	g_nNumberOfElements = nNumberOfElements;
}

//This function is called once
void Cleanup()
{
	cudaFree( g_LotsOfData );
	cudaFree( g_Results );
}

//This function is called thousands (possibly millions) of times
void CalculateDataPoint( const float &fFactor1,
                         const float &fFactor2,
                         float &fResult )
{
	/*
		Do some work to initialize the result array,
		block sizes, etc.
	*/
	CUDAProcessingFunction<<< nBlocks, nBlockSize >>>( fFactor1, fFactor2, g_LotsOfData, g_Results );
	/*
		Do some more work to copy the results back
		and sum them into fResult
	*/
}

The outside code makes all of the calls in the appropriate order, but the results are wrong. I tried the following instead:

float *g_LotsOfData = NULL;
float *g_Results = NULL;
int g_nNumberOfElements = 0;

//This function is called once
void Initialize()
{
	//Make the appropriate calls to initialize CUDA
	InitCuda();
}

//This function is called thousands (possibly millions) of times
void CalculateDataPoint( float *pLotsOfData,
                         const int &nNumberOfElements,
                         const float &fFactor1,
                         const float &fFactor2,
                         float &fResult )
{
	//Set variables we will use later
	int nSize = nNumberOfElements*sizeof( float );
	cudaMalloc( (void **) &g_LotsOfData, nSize );
	cudaMalloc( (void **) &g_Results, nSize );
	g_nNumberOfElements = nNumberOfElements;

	/*
		Do some work to initialize the result array,
		block sizes, etc.
	*/
	CUDAProcessingFunction<<< nBlocks, nBlockSize >>>( fFactor1, fFactor2, g_LotsOfData, g_Results );
	/*
		Do some more work to copy the results back
		and sum them into fResult
	*/
	cudaFree( g_LotsOfData );
	cudaFree( g_Results );
}

This code works…

Has anyone had any experience with this? After some work the rest of my program is working beautifully, and I will be in great shape if I can avoid these excess per-call allocations and memory transfers.
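
For what it's worth, here is a sketch of one way to keep the device allocations alive between calls without relying on file-scope globals: bundle the pointers into a small context struct and hand the caller an opaque handle. This is not from the original post, just an illustration; the CudaContext, CreateContext and DestroyContext names are made up.

#include <cuda_runtime.h>

//A context struct that owns the device allocations for one data set
struct CudaContext
{
	float *d_LotsOfData;
	float *d_Results;
	int    nNumberOfElements;
};

//Called once: allocate on the device, copy the input data up, return a handle
CudaContext *CreateContext( const float *pLotsOfData, int nNumberOfElements )
{
	CudaContext *pCtx = new CudaContext;
	int nSize = nNumberOfElements*sizeof( float );

	cudaMalloc( (void **) &pCtx->d_LotsOfData, nSize );
	cudaMalloc( (void **) &pCtx->d_Results, nSize );
	cudaMemcpy( pCtx->d_LotsOfData, pLotsOfData, nSize, cudaMemcpyHostToDevice );
	pCtx->nNumberOfElements = nNumberOfElements;

	return pCtx;
}

//Called once: release the device memory and the handle
void DestroyContext( CudaContext *pCtx )
{
	cudaFree( pCtx->d_LotsOfData );
	cudaFree( pCtx->d_Results );
	delete pCtx;
}

CalculateDataPoint would then take the CudaContext pointer as an extra parameter instead of touching globals, so the allocations still happen only once but nothing is stored at file scope.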

OK, I have a little more information. This isn't just an issue with CUDA memory for me. I set up a C++ class in my library that simply contains a series of integers, which are initialized when the class is constructed. An immediately subsequent call into the class shows that the integer values have changed.

This kind of behavior usually indicates that you have out-of-bounds memory accesses somewhere in your code. If you are running on Linux, valgrind can help you find them.
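
Along the same lines, it may also be worth checking the return status of every CUDA runtime call, so a failing cudaMalloc or kernel launch shows up immediately rather than as wrong results. A rough sketch (the CUDA_CHECK macro is just an illustration; cudaGetErrorString, cudaGetLastError and cudaDeviceSynchronize are standard runtime API calls):

#include <cstdio>
#include <cuda_runtime.h>

//Print the location of any CUDA runtime call that returns an error
#define CUDA_CHECK( call )                                                \
	do {                                                                  \
		cudaError_t err = ( call );                                       \
		if( err != cudaSuccess )                                          \
			fprintf( stderr, "CUDA error '%s' at %s:%d\n",                \
			         cudaGetErrorString( err ), __FILE__, __LINE__ );     \
	} while( 0 )

Wrapping each cudaMalloc and cudaMemcpy in CUDA_CHECK(...), and following each kernel launch with CUDA_CHECK( cudaGetLastError() ) and CUDA_CHECK( cudaDeviceSynchronize() ), usually narrows down where things first go wrong.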

I found the cause of the second issue (with the class). It was a stupid mistake on my part: I was conditionally compiling some class members into the library, so the same members were not declared in the linking application. Removing the conditional compilation fixed that issue.

Unfortunately, that is not the cause of the issue with the CUDA variables. I have also checked and re-checked to ensure that no arrays are running out of bounds…

Hi, this may be a very basic question, but how do you compile a static library in CUDA? Do you have a sample makefile? I am in a similar situation, where I need to replace a few C++ functions with ones written using CUDA.

Any help will be greatly appreciated.

TIA,

arjun