My name is Saeed Mohammadi. I’m a C++ programmer and a beginner in CUDA. I’m working on a program that simulates the solidification process in metallurgy. We already have a complete program that works correctly. When we found CUDA on the NVIDIA web site, we started discussing how we could use it to increase performance.
I have a 3D grid indexed by x, y, and z, each of arbitrary size. The total number of points (x*y*z) is usually large. I have arrays for the “Temperature of the point (x, y, z)”, the “Material ID”, and parameters such as “NewTemperature”, “K”, “Ro”, “Cp”, “Fs”, etc. Some of these arrays are created at the start and only read afterwards, while others are created at the start but must be updated many times. I also have some other parameters that must be generated during execution. In our CPU-based program, we have a while loop with nested for loops like this:
[codebox]
while (maxTemp < 540)
{
    for (z; z < limit.z; z += direction.z)
    {
        for (x; x < limit.x; x += direction.x)
        {
            for (y; y < limit.y; y += direction.y)
            {
                // start the process to find maxTemp
            }
            // change the direction of y
        }
        // change the direction of x
    }
    // change the direction of z
}
[/codebox]
What is the goal?
In this simulation we have a mold which is divided into a mesh along x, y, and z. We want to find out when the cast’s temperature drops to 540 °C. Only non-boundary positions take part in the process, so x=0 and x=limit are not involved. Any internal position (x, y, z) has 6 neighbors: x+1, x-1, y+1, y-1, z+1, z-1.
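To make the neighbor lookup concrete, here is a minimal C++ sketch. It assumes the 3D data is stored in one flat array with y varying fastest, then x, then z, which is only a guess that mirrors the loop nesting above. The names Dim3, flatIndex, isInterior, and neighborIndices are hypothetical and are not taken from our program.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical grid-size type; not a type from the real program.
struct Dim3 { int x, y, z; };

// Flat index of (x, y, z), ASSUMING y varies fastest, then x, then z.
inline std::size_t flatIndex(const Dim3& n, int x, int y, int z) {
    assert(x >= 0 && x < n.x && y >= 0 && y < n.y && z >= 0 && z < n.z);
    return (static_cast<std::size_t>(z) * n.x + x) * n.y + y;
}

// True only for non-boundary points, which are the ones that get updated.
inline bool isInterior(const Dim3& n, int x, int y, int z) {
    return x > 0 && x < n.x - 1 &&
           y > 0 && y < n.y - 1 &&
           z > 0 && z < n.z - 1;
}

// Flat indices of the 6 face neighbors of an interior point, in the order
// x-1, x+1, y-1, y+1, z-1, z+1.
inline std::array<std::size_t, 6> neighborIndices(const Dim3& n, int x, int y, int z) {
    assert(isInterior(n, x, y, z));
    return {flatIndex(n, x - 1, y, z), flatIndex(n, x + 1, y, z),
            flatIndex(n, x, y - 1, z), flatIndex(n, x, y + 1, z),
            flatIndex(n, x, y, z - 1), flatIndex(n, x, y, z + 1)};
}
```

Because isInterior rejects every point on a face of the mesh, the six neighbor indices always stay inside the arrays.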
The while loop keeps running until maxTemp reaches the limit, so there are at least hundreds of iterations. At the end of each for loop we switch between stepping forward and backward through the ‘direction’ parameter, which has the value “1” or “-1”. Inside the loop over ‘y’ we find the 6 neighbors of the base position and, for each of them, its “Material ID”, its temperature, and its value of “K”, by calling a function that takes these parameters as arguments. We also have to check each “Material ID” for some of the calculations, and there are some other local parameters.
These are just local variables with the lifetime of a thread. I then have to compute the new temperature of the base position and store it at the corresponding index of the “NewTemperature” array, which I think lives in global memory and has to be copied back to the host.
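The read-from-“Temperature”, write-to-“NewTemperature” pattern can be sketched in plain C++ as below. This is only an illustration under stated assumptions: the update formula (a plain average of the 6 neighbors) is a placeholder and not our real physics, the flat layout with y fastest is assumed, and sweepOnce is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One sweep over all interior points. Reads only from `temperature` and writes
// only to `newTemperature`, so the order in which points are visited does not
// change the result. Returns the maximum new temperature of the interior points.
// PLACEHOLDER physics: the new value is the plain average of the 6 neighbors.
inline double sweepOnce(const std::vector<double>& temperature,
                        std::vector<double>& newTemperature,
                        int nx, int ny, int nz) {
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    assert(temperature.size() == static_cast<std::size_t>(nx) * ny * nz);
    assert(newTemperature.size() == temperature.size());

    // ASSUMED layout: y varies fastest, then x, then z.
    auto idx = [=](int x, int y, int z) {
        return (static_cast<std::size_t>(z) * nx + x) * ny + y;
    };

    double maxTemp = std::numeric_limits<double>::lowest();
    for (int z = 1; z < nz - 1; ++z) {
        for (int x = 1; x < nx - 1; ++x) {
            for (int y = 1; y < ny - 1; ++y) {
                const double t = (temperature[idx(x - 1, y, z)] + temperature[idx(x + 1, y, z)] +
                                  temperature[idx(x, y - 1, z)] + temperature[idx(x, y + 1, z)] +
                                  temperature[idx(x, y, z - 1)] + temperature[idx(x, y, z + 1)]) / 6.0;
                newTemperature[idx(x, y, z)] = t;   // boundary entries are never written
                maxTemp = std::max(maxTemp, t);
            }
        }
    }
    return maxTemp;
}
```

In this sketch newTemperature would start as a copy of temperature so that the boundary values carry over. After each sweep the two buffers would be swapped and the returned maximum compared with 540.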
Solutions
I tested the old program and started porting it to CUDA. While doing so I found several ways of calling the kernel:
- Performance = disaster: loop on the host and call a single-thread kernel.
- Medium performance: call a __global__ kernel where the “z” values are the threads. For each threadIdx.x (that is, each “z”), call a __device__ function that loops over “x” and “y”.
- Best performance but incomplete: define nBlocks and BlockSize, then call a __global__ kernel where each thread handles one arbitrary position and does the calculation for it.
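For the third approach, the index arithmetic can be checked on the host before it goes into a kernel. The sketch below emulates the per-thread index in plain C++. It assumes a 1D launch and a flat array with y varying fastest, and blocksNeeded, threadToPoint, and Point3 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical holder for one mesh position.
struct Point3 { int x, y, z; };

// Number of blocks so that nBlocks * blockSize >= total (ceiling division).
inline std::size_t blocksNeeded(std::size_t total, std::size_t blockSize) {
    assert(blockSize > 0);
    return (total + blockSize - 1) / blockSize;
}

// Host-side emulation of index = blockIdx.x * blockDim.x + threadIdx.x,
// followed by the split of that index into (x, y, z).
// Returns false for surplus threads in the last block (index >= total);
// such threads must not read or write any array.
inline bool threadToPoint(std::size_t blockIdx, std::size_t blockDim, std::size_t threadIdx,
                          int nx, int ny, int nz, Point3& out) {
    assert(nx > 0 && ny > 0 && nz > 0);
    const std::size_t total = static_cast<std::size_t>(nx) * ny * nz;
    const std::size_t index = blockIdx * blockDim + threadIdx;
    if (index >= total) {
        return false;
    }
    // ASSUMED layout: y varies fastest, then x, then z.
    out.y = static_cast<int>(index % ny);
    out.x = static_cast<int>((index / ny) % nx);
    out.z = static_cast<int>(index / (static_cast<std::size_t>(ny) * nx));
    return true;
}
```

Since nBlocks * BlockSize is rarely an exact multiple of the mesh size, the bounds check matters: without it the last block would index past the end of the arrays.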
Errors & Problems
When I use the second approach above, my program works and gives the correct result, but it is much slower than the old CPU program. So I started coding the third approach. I know that loops and if-else branches hurt performance on the GPU. I tried to reduce redundancy and removed some loops and if-else conditions, but a few remain. The code looks all right to me, but it produces 2 errors:
- Kernel invocation error: “too many resources requested for launch”
- Writing to my NewTemperature[index] array fails.
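One possible reading of the first error, which I have not confirmed, is that BlockSize multiplied by the registers each thread needs exceeds what one block may use. The sketch below is only rough arithmetic for that case. All limits are inputs that would have to be read from the device (for example the regsPerBlock and maxThreadsPerBlock fields reported by cudaGetDeviceProperties) and from the compiler's per-kernel register report. maxBlockSizeForRegisters is a hypothetical helper, and real hardware allocates registers in coarser units, so the true limit can be lower.

```cpp
#include <algorithm>
#include <cassert>

// Rough upper bound on threads per block for a given register budget,
// rounded down to a whole number of warps. Returns 0 if not even one
// warp fits. The real limit may be lower than this estimate.
inline int maxBlockSizeForRegisters(int regsPerThread, int regsPerBlock,
                                    int maxThreadsPerBlock, int warpSize = 32) {
    assert(regsPerThread > 0 && regsPerBlock > 0);
    assert(maxThreadsPerBlock > 0 && warpSize > 0);
    const int byRegisters = regsPerBlock / regsPerThread;        // register-limited thread count
    const int limit = std::min(byRegisters, maxThreadsPerBlock); // also respect the hard cap
    return (limit / warpSize) * warpSize;                        // round down to whole warps
}
```

With made-up numbers, a kernel needing 40 registers per thread on a device with 8192 registers per block would be limited to 192 threads per block by this estimate, so a BlockSize of 256 would not launch.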
My Question
This solidification program has less data and fewer parameters than the other programs in my metallurgical simulation series. If I change the structure of the input data, maybe I can get past the first error. In the middle of my if-else branches and loops, when one particular condition occurs, I have to change the value of NewTemperature[index], where index is computed as blockIdx.x*blockDim.x+threadIdx.x. The value is not set. So:
Question: Are there any limitations on calling device functions from a kernel? What about the input arrays?
I know that I cannot describe my whole process in one page, and that you don’t have much time to read this, but I’m counting on your experience. Thank you for your time and consideration. I look forward to a kind response from you.
Yours Sincerely,
Saeed Mohammadi