Race condition causing cudaThreadSynchronize() to hang

I have a kernel whose threads span multiple blocks.
The kernel does a whole bunch of calculations, and there is a while loop in there testing whether a generated value is -1.

The problem is that when the kernel ends and execution reaches my cudaThreadSynchronize() call (or in fact any other CUDA function), the whole application freezes (I can't even Ctrl-C out).

My guess is that the kernel has race conditions in it, where some of the threads get locked in an infinite loop (or something) within the kernel. But this is just a guess.

I wish I could put some __syncthreads() statements in the kernel, but because I'm using threads across different blocks, __syncthreads() wouldn't work (I think!).

Is there some way of terminating threads stuck in an infinite loop inside a kernel? Could something else be happening that I haven't thought of? I'm only guessing that an infinite loop is happening via race conditions; I may be way off.

Summary: cudaThreadSynchronize() hangs after I execute a kernel. Can anyone guess why?

Would it help if I posted my kernel code?

That could definitely be the case.


That would help a lot.

By the way, I had no idea Sydney University is using GPGPU. I’m currently visiting UNSW, and there is no one here with experience in this. Could you tell me something about your project? Perhaps I can help?

I am the only one (that I know of). It's for my thesis.

The aim of the project is to optimise the simplex algorithm as much as possible using a GPGPU.

The algorithm is basically this: get the pivot column -> get the pivot row -> perform base transformations -> repeat until all the values in the bottom row are non-negative (signalled by the pivot column being set to -1).

I initially had the loop in host code and called the base transformation as a kernel, but I changed this because it was too slow (I had to keep copying the pivot-column variable back to host memory so that I could evaluate whether another iteration was needed).

So instead I made one kernel that did everything (finding the pivot column, finding the pivot row, and performing the base transformation).

But now I have the problem of race conditions, where one thread races ahead and changes the value of the pivot column/pivot, making another thread work on the incorrect set of data.

The functions findPivotColumn and findPivot are device functions which store their results in the pointers passed in through the arguments.

This is the code for the algorithm; it works as long as you are only using one block. I know, I know, __syncthreads() everywhere (but it races if they are not put exactly where they are at the moment).

__global__ void d_p(float *matrix, int *g_neq, int *g_ncons, int *g_pivotColumn, int *g_pivot, size_t *g_pitch)
{
        /* Thread indices */
        int x = threadIdx.x;
        int y = threadIdx.y;
        int z = threadIdx.z;

        /* Block dimensions (how many threads per block) */
        int bdx = blockDim.x;
        int bdy = blockDim.y;
        int bdz = blockDim.z;

        /* Block indices */
        int bx = blockIdx.x;
        int by = blockIdx.y;

        /* Grid dimension (how many blocks) */
        int gdx = gridDim.x;

        /* Declare variables */
        int tid;
        int eq;
        float pivotFactor;
        float rowFactor;
        float *pivotRow;
        float *eqRow;

        /* Sequential index for threads spanning multiple blocks */
        tid = by*gdx*(bdx*bdy*bdz) + bx*(bdx*bdy*bdz) + z*(bdx*bdy) + bdx*y + x;

        /* Find first pivot column (device function, stores the result
           in *g_pivotColumn; argument list assumed) */
        findPivotColumn(matrix, g_neq, g_ncons, g_pivotColumn, g_pitch);
        __syncthreads();

        /* While the bottom row of the function to be optimised holds negative values */
        while (*g_pivotColumn != -1) {

                /* Find pivot row (device function, stores the result
                   in *g_pivot; argument list assumed) */
                findPivot(matrix, g_neq, g_ncons, g_pivotColumn, g_pivot, g_pitch);
                __syncthreads();

                /* Pivot row (actual) */
                pivotRow = (float*)((char*)matrix + *g_pivot * *g_pitch);

                /* Value for pivot factor */
                pivotFactor = pivotRow[*g_pivotColumn];

                /* Loop through rows */
                for (eq = 0; eq <= *g_ncons; eq++) {

                        /* Row we are working with */
                        eqRow = (float*)((char*)matrix + eq * *g_pitch);

                        /* Don't process the pivot row */
                        if (eq != *g_pivot) {

                                rowFactor = eqRow[*g_pivotColumn];

                                /* NEED to syncthreads here or it doesn't work! */
                                __syncthreads();

                                /* Only use threads within the dimensions of the matrix (bound assumed) */
                                if (tid <= *g_neq)
                                        eqRow[tid] = eqRow[tid] - (rowFactor/pivotFactor)*pivotRow[tid];
                        }
                }
                __syncthreads();

                /* Find the next pivot column for the next iteration */
                findPivotColumn(matrix, g_neq, g_ncons, g_pivotColumn, g_pitch);
                __syncthreads();
        }
}
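Two pieces of the kernel's index arithmetic can be sanity-checked on the CPU (a sketch; the sizes and function names below are arbitrary, and the pitch value just stands in for whatever cudaMallocPitch returned): the tid flattening formula should assign every thread a unique sequential index, and the pitched row addressing should land on the right element.

```c
#include <string.h>

/* Enumerate every (block, thread) index combination and check that
   tid = by*gdx*(bdx*bdy*bdz) + bx*(bdx*bdy*bdz) + z*(bdx*bdy) + bdx*y + x
   hits each value in [0, total) exactly once. */
int tid_is_bijective(int gdx, int gdy, int bdx, int bdy, int bdz) {
    int total = gdx * gdy * bdx * bdy * bdz;
    int seen[1024];                       /* assumes total <= 1024 */
    memset(seen, 0, sizeof seen);
    for (int by = 0; by < gdy; by++)
        for (int bx = 0; bx < gdx; bx++)
            for (int z = 0; z < bdz; z++)
                for (int y = 0; y < bdy; y++)
                    for (int x = 0; x < bdx; x++) {
                        int tid = by*gdx*(bdx*bdy*bdz) + bx*(bdx*bdy*bdz)
                                + z*(bdx*bdy) + bdx*y + x;
                        if (tid < 0 || tid >= total || seen[tid])
                            return 0;     /* out of range or duplicate */
                        seen[tid] = 1;
                    }
    return 1;
}

/* Pitched row addressing as in the kernel: pitch is in BYTES, which
   is why the row pointer goes through a char* cast. */
float pitched_read(const void *base, size_t pitch, int row, int col) {
    const float *r = (const float *)((const char *)base + row * pitch);
    return r[col];
}
```

If the flattening formula were wrong (e.g. mixing up bdx and bdy), two threads would collide on one tid and the check would fail.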


Perhaps if I implemented my own syncthreads function it would work over multiple blocks (as long as the thread index (tid) was sequential).

Can anyone see anything wrong with this attempt at a syncthreads function?

int count;
int ready;

ready = 0;

while (!ready && tid != 0);

if (tid == 0) {
        /* ... */
        while (count != no_threads)
                ready = 1;
}

The only thing you know is that a warp contains the threads with consecutive numbers. However, there is no way of knowing which warp will be executed when, let alone which block.

It is best to avoid these busy-waiting constructs and leave synchronisation to the hardware (using __syncthreads() and the like).

For the rest, you should perhaps reconsider parts of your algorithm. The other code has similar constructions. It is, for example, very likely that *g_pivotColumn will never be set to -1 as far as the threads running first can see. The same goes for a __syncthreads() inside an if/while: if not all threads of the block reach it, the behaviour is undefined.

Forum thread of the month, edition #11,000. In other words, this has been discussed to death. You cannot sync among multiple blocks in a general purpose, scalable, efficient, or usable way. Proof by example: Imagine you have 17 blocks: blocks 2-17 depend on a value generated by block 1. Assume that only 16 blocks can run concurrently on the device. Sometimes when you run your code, blocks 2-17 will launch first and will deadlock while waiting on a value from block 1 which cannot possibly be set because block 1 isn’t running!
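That proof by example can be written down as a toy scheduler model (purely illustrative; `deadlocks` and its parameters are made up for this sketch). Residents busy-wait and never yield, so progress is possible only if the producer block is among the blocks that fit on the device:

```c
/* Toy model of the 17-blocks argument: only k blocks are resident at
   once, residents spin forever on a value that only `producer` writes,
   and a spinning block never gives up its slot. The launch deadlocks
   exactly when the producer is not among the first k resident blocks. */
int deadlocks(const int *residency_order, int n, int k, int producer) {
    int limit = (k < n) ? k : n;
    for (int i = 0; i < limit; i++)
        if (residency_order[i] == producer)
            return 0;   /* producer runs, writes its value, others unblock */
    return 1;           /* every slot is occupied by a spinning consumer */
}
```

With 17 blocks, a 16-block device, and block 1 as the producer, any residency order that schedules block 1 last deadlocks, while scheduling it first does not; since the order is not under your control, the launch only sometimes hangs.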

Despite the impossibility, the limitations, and the complete loss of their sanity, many have still tried. Search the forums and you may find some hints from those who have taken this road (here is one hint: the compiler is smart enough to notice many reads from the same global memory location and cache the value in a register; volatile can help).