Hello, I will start this post as I always do. I am a noob at CUDA and relearning C++, so I may explain things in strange ways, but I hope you can get the idea I’m trying to get across. You have been warned :-)
I am working on a model similar to a common predator-prey model for plants. There are simple differential equations for each plant species, water, and nutrient that relate to one another and enforce competition and coexistence. So far I have been hardcoding the equations according to how many species I want to include, but I would like to avoid this and allow a user to input the number of species and have the equations and arrays take this into account “automatically”.
The problems I’m running into include allocating/populating arrays of proper size and efficiently modifying the dynamics. I have a parameter, say ‘g’ for growth coefficient, that each species will have. I pass into the kernel an unsigned int nSpecies and am trying to do something like
float g[nSpecies];
g[0] = 123;
g[1] = 456;
...
but it complains that nSpecies is not constant. I then tried
float * g = new float [nSpecies];
g[0] = 123;
g[1] = 456;
...
which made the compiler complain that new is a host function that cannot be called from a device kernel. I also tried a #define, thinking the compiler might treat the macro as a constant,
#define ArraySize nSpecies
__device__ FUNCTION(inputs)
{
float g[ArraySize];
g[0] = 123;
g[1] = 456;
...
}
but it complained that ArraySize is not constant either. So I settled for allocating with a number larger than I anticipate actually using, such as 10 species, then just populating the first nSpecies elements of the arrays. This is ok, but definitely not the best way to do this.
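One alternative to a fixed upper bound is to size the array on the host, where nSpecies is known at run time, and hand the kernel a device pointer. The following is only a minimal sketch under that assumption: uploadGrowth and scaleByGrowth are hypothetical names, the multiply is a stand-in for the real dynamics, and the species layers are assumed to be stacked with a stride of w*h cells.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: scales every cell of each species layer by that
// species' coefficient g[sp]. Stands in for the real model equations.
__global__ void scaleByGrowth(float* species, const float* g,
                              unsigned int nSpecies, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;  // guard threads outside the grid

    for (unsigned int sp = 0; sp < nSpecies; sp++) {
        // Assumed layout: layer sp starts at offset sp * w * h.
        size_t idx = (size_t)sp * w * h + (size_t)y * w + x;
        species[idx] *= g[sp];
    }
}

// Copies one coefficient per species into a newly allocated device array.
// Returns nullptr on failure; the caller releases the array with cudaFree.
float* uploadGrowth(const std::vector<float>& hostG)
{
    float* devG = nullptr;
    size_t bytes = hostG.size() * sizeof(float);
    if (cudaMalloc((void**)&devG, bytes) != cudaSuccess) return nullptr;
    if (cudaMemcpy(devG, hostG.data(), bytes,
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(devG);
        return nullptr;
    }
    return devG;
}
```

On the host this would be used roughly as `float* devG = uploadGrowth({123.0f, 456.0f});` followed by `scaleByGrowth<<<grid, block>>>(devSpecies, devG, nSpecies, w, h);`, where grid, block and devSpecies are whatever the program already uses. Devices of compute capability 2.0 and later also accept new and malloc inside kernels, but a per-thread allocation is often slower than one host-side allocation like this.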
My next hurdle was adjusting the dynamics. Each species, as well as nutrient and water, has its own WxH grid, and the equations operate on the cells of these grids. For the species I stack the grids on top of each other so I can pass in a single variable that contains all of the information, then I loop over the equations like this:
int x = blockIdx.x*blockDim.x + threadIdx.x;
int y = blockIdx.y*blockDim.y + threadIdx.y;
for(unsigned int sp = 0; sp < nSpecies; sp++)
{
Species[sp*w*h + y*w + x] += [Species-specific equation];
Water[y*w + x] += dt * [Species-dependent parts of the equation];
Nutrient[y*w + x] += dt * [Species-dependent parts of the equation];
}
Water[y*w + x] += dt * [Species-independent parts of the equation];
Nutrient[y*w + x] += dt * [Species-independent parts of the equation];
This seems to work (I haven’t tested it much), but performance takes a huge hit. For a single species (nSpecies = 1) the code used to run at ~70 fps, but with the loop it runs at ~45 fps, even though the loop executes only once. (EDIT: this decrease is the combined result of several instances of this loop in different functions. Each one costs about 4-10 fps on its own.)
Does anybody know if it is the for-loop that costs so much overhead? If not, what else could it be? Is allocating arrays the way I do inefficient? Is there a better way to accomplish what I need?
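One thing that may be worth testing: each += on Water and Nutrient inside the loop is a read and a write to global memory, and without __restrict__ the compiler has to assume those arrays could overlap Species, so it cannot keep the values in registers. The sketch below accumulates the species-dependent terms in local variables and writes each grid cell once. The growth term is a placeholder rather than the model's equation, stepModel is a hypothetical name, and the sketch assumes every species reads the water and nutrient values from the start of the step.

```cuda
#include <cuda_runtime.h>

// Hypothetical update step; "growth" below is placeholder dynamics.
// __restrict__ promises the compiler that the arrays do not overlap.
__global__ void stepModel(float* __restrict__ species,
                          float* __restrict__ water,
                          float* __restrict__ nutrient,
                          const float* __restrict__ g,
                          unsigned int nSpecies, int w, int h, float dt)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;  // guard threads outside the grid

    const int cell  = y * w + x;   // index within one WxH layer
    const int layer = w * h;       // stride between stacked species layers

    // Read the shared fields once and accumulate changes in registers.
    const float wOld = water[cell];
    const float nOld = nutrient[cell];
    float dWater = 0.0f;
    float dNutrient = 0.0f;

    for (unsigned int sp = 0; sp < nSpecies; sp++) {
        const int idx = (int)sp * layer + cell;
        const float s = species[idx];
        const float growth = g[sp] * s * wOld * nOld;  // placeholder term
        species[idx] = s + dt * growth;
        dWater    -= growth;       // species-dependent consumption
        dNutrient -= growth;
    }

    // Species-independent terms would be added to dWater/dNutrient here.
    water[cell]    = wOld + dt * dWater;
    nutrient[cell] = nOld + dt * dNutrient;
}
```

Timing this against the original loop with a profiler or CUDA events would show whether the global memory traffic or something else accounts for the drop; the sketch does not settle that on its own.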
For extra credit, you can read my other posts:
The goal is to eventually allocate all variables in a single place to help the user quickly and accurately adjust the model. The end goal is to allocate default values then read a text file that will update these variables at runtime to allow the user to iterate through several model setups without remaking the source each time. I see this as an extension of the first part, but perhaps it is completely independent.
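The "defaults first, then override from a text file" idea could look roughly like the host-side sketch below, which compiles as ordinary C++ or inside a .cu file. Everything here is an assumption made for illustration: the ModelParams fields, their default values, and the "name value ..." line format are hypothetical, not part of the existing program.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parameter set; names and defaults are placeholders.
struct ModelParams {
    unsigned int nSpecies = 1;
    float dt = 0.01f;
    std::vector<float> g = {1.0f};  // one growth coefficient per species
};

// Reads lines of the form "name value [value ...]" and overrides the
// defaults for recognized names. Lines starting with '#', unknown names,
// and lines without a numeric value are skipped.
ModelParams readParams(std::istream& in)
{
    ModelParams p;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name;
        if (!(fields >> name) || name[0] == '#') continue;

        std::vector<float> values;
        float v;
        while (fields >> v) values.push_back(v);
        if (values.empty()) continue;

        if (name == "nSpecies" && values[0] >= 1.0f)
            p.nSpecies = static_cast<unsigned int>(values[0]);
        else if (name == "dt")
            p.dt = values[0];
        else if (name == "g")
            p.g = values;
    }
    return p;
}

// Loads parameters from a file; falls back to the defaults if the file
// cannot be opened.
ModelParams loadParams(const std::string& path)
{
    std::ifstream file(path);
    if (!file) return ModelParams{};
    return readParams(file);
}
```

With a file containing the lines "nSpecies 2", "dt 0.5" and "g 123 456", loadParams would return a struct whose g vector can then be copied to the device before the kernels run, so a new setup only needs a new text file rather than a rebuild.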
Thank you for reading my post. I appreciate any and all help!
~Josh