Greetings,
I'm currently trying to implement a genetic algorithm using CUDA. I use the kernel below to evaluate each individual.
__global__ void evaluate(int * population,
                         int * distance,
                         int * cost,
                         int nTowns,
                         int * d_index)
{
    int sum = 0;
    int t0, t1, idx;
    idx = threadIdx.x + blockIdx.x * blockDim.x;
    for (size_t i = 1; i < nTowns; i++) {
        t0 = idx * nTowns + (i - 1);
        t1 = idx * nTowns + i;
        sum = sum + distance[population[t0] * nTowns + population[t1]];
    }
    t0 = idx * nTowns + nTowns - 1;
    t1 = idx * nTowns;
    cost[idx] = sum + distance[population[t0] * nTowns + population[t1]];
    d_index[idx] = threadIdx.x;
}
This code fails occasionally, roughly 2-3 times out of 100 runs. I ran it under cuda-memcheck and got the following output:
GPUassert: an illegal memory access was encountered ga_tes_3a.cu 469
========= CUDA-MEMCHECK
========= Program hit cudaErrorIllegalAddress (error 77) due to "an illegal memory access was encountered" on CUDA API call to cudaDeviceSynchronize.
...
GPUassert: unspecified launch failure ga_tes_3a.cu 469
========= CUDA-MEMCHECK
========= Invalid __global__ read of size 4
========= at 0x000000e0 in evaluate(int*, int*, int*, int, int*)
========= by thread (327,0,0) in block (7,0,0)
========= Address 0x3c45c467c is out of bounds
...
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.
How can I track down this error? Any idea why this happens?
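The kernel only reads from population and distance, so the invalid read reported above should come from one of two things: idx addressing past the end of population, or a value stored in population that is not a valid town id. A minimal host-side sketch for checking both before the launch might look like the following. The helper names firstInvalidGene and launchFitsPopulation, and the gridSize, blockSize and nIndividuals parameters, are assumptions for illustration and do not appear in my code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the original program).
// `population` is a host copy of the device array, laid out as
// nIndividuals * nTowns ints. Returns the index of the first entry that is
// not a valid town id in [0, nTowns), or -1 if every entry is valid.
long long firstInvalidGene(const std::vector<int>& population, int nTowns) {
    for (std::size_t i = 0; i < population.size(); ++i) {
        if (population[i] < 0 || population[i] >= nTowns) {
            return static_cast<long long>(i);
        }
    }
    return -1;
}

// Hypothetical helper: true when the launch creates no more threads than
// there are individuals. The kernel has no bounds guard on idx, so any
// extra thread would index past the end of population and cost.
bool launchFitsPopulation(int gridSize, int blockSize, int nIndividuals) {
    return static_cast<long long>(gridSize) * blockSize <= nIndividuals;
}
```

To use it, the population would be copied back to the host with cudaMemcpy right before evaluate is launched and passed to firstInvalidGene. An index other than -1 would point at the operator (crossover, mutation, initialisation) that produced the bad value. If launchFitsPopulation returns false, the kernel would need the individual count as an extra parameter and an early return for threads with idx beyond it.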
I’m sorry if my English is bad.