I am the only one (that i know of). Its for my thesis.

The aim of the project is to make the maximum optimisation of the simplex algorithm using a GPGPU.

The algorithm is basically this - get the pivot column -> get the pivot row -> perform base transformations -> {repeat until all the values in the bottom row are non negative (represented by a negative pivot column)}

i initially had the loop on host code, and called the base transformation as a kernel, but i changed this because it was too slow (had to keep moving the value of the pivot column variable back to host memory so that i could evaluate if another loop was needed).

So instead i made one kernel that did everything (reading the pivotcolumn, pivot and performing the base transformation).

But now, i have the problem of race conditions where one thread will race ahead and change the value of the pivotcolumn/pivot, making another thread work on the incorrect set of data.

The functions findPivotColumn and findPivot are **device** functions which store the results in the pointer that is fed in through arguments.

This is the code for the algorithm **that works** so long as you are only using only one block. I know i know __syncthreads everywhere (but it races if they are not put in in the places they are put atm)

```
__global__ void d_p(float *matrix,int *g_neq,int *g_ncons,int*g_pivotColumn,int*g_pivot,size_t *g_pitch){
/* Init threads */
int x = threadIdx.x;
int y = threadIdx.y;
int z = threadIdx.z;
/* Init dimension of block (how many threads */
int bdx = blockDim.x;
int bdy = blockDim.y;
int bdz = blockDim.z;
/* Init block indexs */
int bx = blockIdx.x;
int by = blockIdx.y;
/* init dimension of grid (how many blocks */
int gdx = gridDim.x;
/* Declare variables */
int tid;
int eq;
float pivotFactor;
float rowFactor;
float *pivotRow;
float *eqRow;
/* Get sequential index for threads spanning multiple blocks */
tid= by*gdx*(bdx*bdy*bdz) + bx*(bdx*bdy*bdz) + z*(bdx*bdy) + bdx*y +x;
/* Find first pivot column */
d_findPivotColumn(matrix,g_neq,g_ncons,g_pivotColumn,g_pitch);
__syncthreads();
/* While columns of the function to be optimised hold negative values */
while(*g_pivotColumn!=-1){
/* find pivot row */
d_findPivot(matrix,g_neq,g_ncons,g_pivotColumn,g_pivot,g_pitch);
/* Init pivot row (actual) */
pivotRow = (float*)((char*)matrix + *g_pivot * *g_pitch);
/* Value for pivot factor */
pivotFactor = pivotRow[*g_pivotColumn];
/* Loop through rows */
for(eq=0;eq<*g_neq;eq++){
/* Row we are working with */
eqRow = (float*)((char*)matrix + eq * *g_pitch);
/* Dont process pivot row */
if(*g_pivot!=eq){
rowFactor = eqRow[*g_pivotColumn];
/* NEED to syncthreads here or it doesnt work! */
__syncthreads();
/* only use threads within the dimension of the matrix */
if(tid<*g_neq+*g_ncons+1)
eqRow[tid] = eqRow[tid] - (rowFactor/pivotFactor)*pivotRow[tid];
}
}
/* find the next pivot column for next iteration */
__syncthreads();
d_findPivotColumn(matrix,g_neq,g_ncons,g_pivotColumn,g_pitch);
__syncthreads();
}
}
```