Root Question: Is there a problem with CUDA using global variables?
I am trying to write a static library that implements an API to be used by another program. A large amount of data needs to be transferred to the GPU once. After that, a large number of iterative calls will be made to perform a computation on that data. To that end, I set up several functions of the form:
float *g_LotsOfData = NULL;
float *g_Results = NULL;
int g_nNumberOfElements = 0;

//This function is called once
void Initialize( float *pLotsOfData,
                 const int &nNumberOfElements )
{
    //Make the appropriate calls to initialize CUDA
    InitCuda();

    //Set variables we will use later
    int nSize = nNumberOfElements*sizeof( float );
    cudaMalloc( ( void **) &g_LotsOfData, nSize );
    cudaMalloc( ( void **) &g_Results, nSize );
    g_nNumberOfElements = nNumberOfElements;
}

//This function is called once
void Cleanup()
{
    cudaFree( g_LotsOfData );
    cudaFree( g_Results );
}

//This function is called thousands (possibly millions) of times
void CalculateDataPoint( const float &fFactor1,
                         const float &fFactor2,
                         float &fResult )
{
    /*
    Do some work to initialize the result array,
    block sizes, etc.
    */
    CUDAProcessingFunction<<< nBlocks, nBlockSize >>>( fFactor1, fFactor2, g_LotsOfData, g_Results );
    /*
    Do some more work to copy the results back
    and sum them into fResult
    */
}
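The listing above allocates the device buffers in Initialize but does not show pLotsOfData being copied into g_LotsOfData. If that copy is not hidden in one of the elided comments, the kernel would read uninitialized device memory. The following is a minimal sketch of the "upload once, launch many times" pattern, not the original code: ExampleKernel and its arithmetic are hypothetical stand-ins for CUDAProcessingFunction, the functions return bool instead of void, and the block size of 256 is an arbitrary assumption.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Device buffers kept alive between calls, as in the listing above.
static float *g_LotsOfData = NULL;
static float *g_Results = NULL;
static int g_nNumberOfElements = 0;

// Hypothetical stand-in for CUDAProcessingFunction: one result per element.
__global__ void ExampleKernel( float fFactor1, float fFactor2,
                               const float *pData, float *pResults, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < n )
        pResults[i] = fFactor1 * pData[i] + fFactor2;
}

// Called once at shutdown (and on a failed Initialize).
void Cleanup()
{
    cudaFree( g_LotsOfData );   // cudaFree( NULL ) is a harmless no-op
    cudaFree( g_Results );
    g_LotsOfData = NULL;
    g_Results = NULL;
    g_nNumberOfElements = 0;
}

// Called once: allocate the device buffers AND upload the host data.
bool Initialize( const float *pLotsOfData, int nNumberOfElements )
{
    size_t nSize = nNumberOfElements * sizeof( float );
    bool bOk =
        cudaMalloc( (void **) &g_LotsOfData, nSize ) == cudaSuccess &&
        cudaMalloc( (void **) &g_Results, nSize ) == cudaSuccess &&
        // Without this copy, g_LotsOfData holds uninitialized device memory.
        cudaMemcpy( g_LotsOfData, pLotsOfData, nSize,
                    cudaMemcpyHostToDevice ) == cudaSuccess;
    if ( !bOk )
    {
        Cleanup();
        return false;
    }
    g_nNumberOfElements = nNumberOfElements;
    return true;
}

// Called many times: reuses the buffers set up by Initialize.
bool CalculateDataPoint( float fFactor1, float fFactor2, float &fResult )
{
    const int nBlockSize = 256;  // assumed block size
    const int nBlocks = ( g_nNumberOfElements + nBlockSize - 1 ) / nBlockSize;
    ExampleKernel<<< nBlocks, nBlockSize >>>( fFactor1, fFactor2,
                                              g_LotsOfData, g_Results,
                                              g_nNumberOfElements );
    if ( cudaGetLastError() != cudaSuccess )
        return false;

    // A blocking cudaMemcpy on the default stream waits for the kernel.
    std::vector<float> results( g_nNumberOfElements );
    if ( cudaMemcpy( results.data(), g_Results,
                     results.size() * sizeof( float ),
                     cudaMemcpyDeviceToHost ) != cudaSuccess )
        return false;

    // Sum on the host, as the elided comment in the listing describes.
    fResult = 0.0f;
    for ( size_t i = 0; i < results.size(); ++i )
        fResult += results[i];
    return true;
}
```

In this sketch the factors are passed to the kernel by value, and the element count is passed explicitly so the kernel can guard against out-of-range threads.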
The outside code makes all of the calls in the appropriate order, but the results are wrong. I tried the following instead:
float *g_LotsOfData = NULL;
float *g_Results = NULL;
int g_nNumberOfElements = 0;

//This function is called once
void Initialize( )
{
    //Make the appropriate calls to initialize CUDA
    InitCuda();
}

//This function is called thousands (possibly millions) of times
void CalculateDataPoint( float *pLotsOfData,
                         const int &nNumberOfElements,
                         const float &fFactor1,
                         const float &fFactor2,
                         float &fResult )
{
    //Set variables we will use later
    int nSize = nNumberOfElements*sizeof( float );
    cudaMalloc( ( void **) &g_LotsOfData, nSize );
    cudaMalloc( ( void **) &g_Results, nSize );
    g_nNumberOfElements = nNumberOfElements;
    /*
    Do some work to initialize the result array,
    block sizes, etc.
    */
    CUDAProcessingFunction<<< nBlocks, nBlockSize >>>( fFactor1, fFactor2, g_LotsOfData, g_Results );
    /*
    Do some more work to copy the results back
    and sum them into fResult
    */
    cudaFree( g_LotsOfData );
    cudaFree( g_Results );
}
This code works…
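Neither variant checks the status of the runtime calls or of the kernel launch, so a failure in the variant that keeps the buffers in globals would go unnoticed and simply produce wrong numbers. One possible cause worth ruling out: in CUDA releases before 4.0 each host thread had its own context, so a device pointer allocated in one thread was not valid in another. The sketch below shows one way to surface such errors. CudaOk, TouchKernel and LaunchAndCheck are hypothetical names introduced for illustration, and cudaDeviceSynchronize assumes CUDA 4.0 or later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed runtime call and what it was doing.
static bool CudaOk( cudaError_t err, const char *pWhat )
{
    if ( err == cudaSuccess )
        return true;
    fprintf( stderr, "%s failed: %s\n", pWhat, cudaGetErrorString( err ) );
    return false;
}

// Hypothetical kernel that writes through the pointer it is given.
__global__ void TouchKernel( float *pData, int n )
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if ( i < n )
        pData[i] = 1.0f;
}

// Launches the kernel and reports launch-time and execution-time errors.
bool LaunchAndCheck( float *pDeviceData, int n )
{
    const int nBlockSize = 256;  // assumed block size
    const int nBlocks = ( n + nBlockSize - 1 ) / nBlockSize;
    TouchKernel<<< nBlocks, nBlockSize >>>( pDeviceData, n );

    // Launch errors (for example an invalid grid size) are reported here.
    if ( !CudaOk( cudaGetLastError(), "kernel launch" ) )
        return false;

    // Errors raised while the kernel runs surface at the next synchronization.
    return CudaOk( cudaDeviceSynchronize(), "kernel execution" );
}
```

Wrapping the cudaMalloc, cudaMemcpy and cudaFree calls with the same helper would show whether the global pointers are still valid at the time CalculateDataPoint runs.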