I am new to CUDA and am relearning C++, so please forgive any noob jargon.
I am working on a model that simulates plant (biomass) growth using simple differential equations. One of the steps diffuses the biomass density and the nutrient density, which is what I've shown below. I am having a problem getting the graphics (OpenGL) and cudaMalloc to get along. If I declare all of the constants used by a kernel inside that kernel, everything works, but it is inefficient and slow. If I instead declare the constants on the host and use cudaMalloc and cudaMemcpy to get them onto the device, the math still works (I have verified that the kernel receives what I expect), but the output display gets corrupted. I've posted code snippets and .png images to illustrate what I'm doing and what the outcome is.
This code works fine (but slow):
__global__ void
d_diffuse_tex(float *grid, float *ftemp, int w, int h, float dt, float hh, unsigned int TextureFlag, unsigned int nSpecies)
{
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return; // Guard threads that fall outside the grid

    if (TextureFlag == 1) /*---------- Diffuse biomass ----------*/
    {
        float alpha[10];      // Diffusion coefficients for biomass
        alpha[0] = 0.005479f; // Diffusion coefficient for species 1
        alpha[1] = 0.005479f; // Diffusion coefficient for species 2
        alpha[2] = 0.005479f; // Diffusion coefficient for species 3
        alpha[3] = 0.005479f; // Diffusion coefficient for species 4
        alpha[4] = 0.005479f; // Diffusion coefficient for species 5
        alpha[5] = 0.005479f; // Diffusion coefficient for species 6
        alpha[6] = 0.005479f; // Diffusion coefficient for species 7
        alpha[7] = 0.005479f; // Diffusion coefficient for species 8
        alpha[8] = 0.005479f; // Diffusion coefficient for species 9
        alpha[9] = 0.005479f; // Diffusion coefficient for species 10
        if (nSpecies == 1)
        {
            grid[y*w + x] = DiffuseBio(grid[y*w + x], dt, hh, alpha[0], 0, x, y); // Diffuse species 1
        }
        // Other conditions for nSpecies == 2, 3, etc.
        // A similar set of conditions and calls diffuses the nutrient (TextureFlag == 2).
        // DiffuseBio is a separate helper that lets me scale to more species without
        // repeating code; its internals are not important here.
    }
}
In this version I define the diffusion coefficients for each species inside the kernel. It works, but it is slow and inefficient, since every thread rebuilds the alpha array on every launch. It produces correct results like the one shown in "Correct_Graphics.png" (attached).
The second version is nearly identical, but I declare the arrays on the host, populate them, and pass a device pointer into the kernel. I have tested it, and mathematically the model works; the kernel gets all of the constants it needs. I even got it to produce the correct graphics once or twice, with a roughly 20% speedup over the first method. Now, however, the graphics are messed up and I cannot figure out why. By commenting out parts of the code I narrowed it down: merely executing the line
CUDA_SAFE_CALL( cudaMalloc( (void**) &d_Db, size ) );
is enough to corrupt the graphics.
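In case it helps, here is the kind of extra error checking I can wrap around the kernel launch to try to localize the failure (a minimal sketch; dimGrid, dimBlock, d_grid, and d_ftemp are placeholders for my real launch configuration and buffers):
d_diffuse_tex<<<dimGrid, dimBlock>>>(d_grid, d_ftemp, w, h, dt, hh, 1, nSpecies, d_Db);
cudaError_t err = cudaGetLastError();       // Catches launch-configuration errors
if (err != cudaSuccess)
    printf("Kernel launch failed: %s\n", cudaGetErrorString(err));
CUDA_SAFE_CALL( cudaThreadSynchronize() );  // Catches errors raised during kernel execution
So far that has not reported anything.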
Main .cpp file:
float *h_Db = NULL; // Diffusion coefficient array for biomass (host)
float *d_Db = NULL; // Diffusion coefficient array for biomass (device)
float *h_Dn = NULL; // Diffusion coefficient array for nutrient (host)
float *d_Dn = NULL; // Diffusion coefficient array for nutrient (device)

int
main( int argc, char** argv)
{
    int size = 10 * sizeof(float);

    h_Db = new float[10];
    h_Db[0] = 0.005479f;
    h_Db[1] = 0.005479f;
    h_Db[2] = 0.005479f;
    h_Db[3] = 0.005479f;
    h_Db[4] = 0.005479f;
    h_Db[5] = 0.005479f;
    h_Db[6] = 0.005479f;
    h_Db[7] = 0.005479f;
    h_Db[8] = 0.005479f;
    h_Db[9] = 0.005479f;
    CUDA_SAFE_CALL( cudaMalloc( (void**) &d_Db, size ) );
    CUDA_SAFE_CALL( cudaMemcpy( d_Db, h_Db, size, cudaMemcpyHostToDevice ) );

    h_Dn = new float[10];
    h_Dn[0] = 0.027397f;
    h_Dn[1] = 0.027397f;
    h_Dn[2] = 0.027397f;
    h_Dn[3] = 0.027397f;
    h_Dn[4] = 0.027397f;
    h_Dn[5] = 0.027397f;
    h_Dn[6] = 0.027397f;
    h_Dn[7] = 0.027397f;
    h_Dn[8] = 0.027397f;
    h_Dn[9] = 0.027397f;
    CUDA_SAFE_CALL( cudaMalloc( (void**) &d_Dn, size ) );
    CUDA_SAFE_CALL( cudaMemcpy( d_Dn, h_Dn, size, cudaMemcpyHostToDevice ) );

    ... // Other stuff
}
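For completeness, the matching cleanup would look roughly like this (cleanup() is just a placeholder name for wherever my teardown lives):
void cleanup()
{
    CUDA_SAFE_CALL( cudaFree(d_Db) ); // Release device arrays
    CUDA_SAFE_CALL( cudaFree(d_Dn) );
    delete[] h_Db;                    // Release host arrays
    delete[] h_Dn;
}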
Kernel:
__global__ void
d_diffuse_tex(float *grid, float *ftemp, int w, int h, float dt, float hh, unsigned int TextureFlag, unsigned int nSpecies, float *alpha)
{
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return; // Guard threads that fall outside the grid

    if (TextureFlag == 1) /*---------- Diffuse biomass ----------*/
    {
        if (nSpecies == 1)
        {
            grid[y*w + x] = DiffuseBio(grid[y*w + x], dt, hh, alpha[0], 0, x, y); // Diffuse species 1
        }
        // Other conditions for nSpecies == 2, 3, etc.
        // A similar set of conditions and calls diffuses the nutrient (TextureFlag == 2), using d_Dn.
        // DiffuseBio is the same helper as above.
    }
}
As I mentioned, I have checked that what gets passed into the kernel is correct, so the tangled web of function calls that eventually leads to this kernel is not worth posting here. The graphics for this code snippet are shown in "Messed_Up_Graphics.png" (attached).
Has anybody else seen this? Does anybody have a solution, or a suggestion to point me in the direction of one?
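One alternative I have been wondering about, since these coefficients never change during a run, is __constant__ memory. A minimal sketch of what I mean, assuming the arrays stay at 10 entries (c_Db is a name I made up):
__constant__ float c_Db[10]; // Coefficients live in device constant memory

// Host side: copy once at startup, no cudaMalloc needed
CUDA_SAFE_CALL( cudaMemcpyToSymbol(c_Db, h_Db, 10 * sizeof(float)) );

// Kernel side: read c_Db[0] etc. directly, no pointer parameter needed
Would that sidestep the problem, or just hide it?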
Thank you!