Hey Cuda Community,
I have been running this code to generate a massive number of primes up to a certain number in a brute force manner. My goal is to get this code running on every machine in a computer lab (all machines are identical). The code I currently have runs correctly on a few machines, but I am encountering a number of errors on the rest of them:

The program prints all primes up to a certain number which is lower than the desired maximum

The program prints out many numbers, most of which not prime, up to the maximum (I find this one most odd, as when the program does this the values in PrimeList_h are all 0, however they still print even though printf is within if(PrimeList_h[i] != 0)

Combination of the last two
The code is as follows:
[codebox]
#include <stdio.h>
#include <math.h>
#include <cuda.h>
#include <cstdio>
#include <ctime>
__global__ void primalrage(float *PrimeList, int N)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
int n = 1;
for(int j = 2; j < i; j++)
{
if (i % j == 0)
{ n = 0; }
}
if (n == 1)
{
PrimeList[i] = i;
}
else
PrimeList[i] = 0;
}
int main(void)
{
double temp;
clock_t start;
double diff;
start = clock();
float *PrimeList_h, *PrimeList_d; // Pointer to host & device arrays
const int N = 100000; // Number of elements in arrays
size_t size = N * sizeof(float);
PrimeList_h = (float *)malloc(size); // Allocate array on host
cudaMalloc((void **) &PrimeList_d, size); // Allocate array on device
int block_size = 256;
int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
dim3 dimBlock(block_size, block_size);
dim3 dimGrid(n_blocks, n_blocks);
primalrage <<< dimGrid, dimBlock >>> (PrimeList_d, N);
cudaMemcpy(PrimeList_h, PrimeList_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
for (int i=0; i<N; i++)
if (PrimeList_h[i] !=0){
printf("%d %f\n", i, PrimeList_h[i]);
}
free(PrimeList_h); cudaFree(PrimeList_d);
diff = (clock()  start) / (double)CLOCKS_PER_SEC;
printf("Time: %f /n",diff);
}[/codebox]
This is simply an example to show a speed up over a serialized version of the code.
Any input would be greatly appreciated.
EDIT: I have updated this post to reflect a few of the changes I have made based on your suggestions