Value changes while debugging but not on the host side

Hi,

I am trying to implement a neural network trained by a genetic algorithm on CUDA. When I debug the kernel in emulation mode, I can see the value change. But when kernel execution finishes and I copy the value to the host and display it, I see that it never changes. The code related to the problem is shown below.

In the kernel code the fitnessVector values change when I debug it, but when I write those values to a file on the host side, the same values are written every time. I could not figure out why. Thanks for reading and for any answers.

Kernel Code

__global__ void evolvePopulation(float individuals[CONNECTION_NUM * MAX_POPULATION], PGAConfig gaConfig, ANNDATA *training_data, int trainDataSize, Rand48 *random, float fitnessVector[MAX_POPULATION], Chrosomome res, float* debugFitness){

	int blockId		= blockIdx.x;

	int threadId	= threadIdx.x;

	int index = blockId * blockDim.x + threadId;

	

	//some code	

	

	if(err < fitnessVector[index]){

		fitnessVector[index] = err;

		for(int indx = 0; indx < CONNECTION_NUM; indx++){

			individuals[index * CONNECTION_NUM + indx] = crossVec[indx];

		}

	}

	__syncthreads();

//	findMinFitness(fitnessVector,debugFitness);

}

Host part

float* d_fitnessArry;

cudaMalloc ((void**)&d_fitnessArry,h_gaConfig->maxPopulation * sizeof(float));

calculateFirstFitness<<<MAX_POPULATION / BLOCK_SIZE, BLOCK_SIZE>>>(d_individuals, d_training_data, trainDataSize, d_fitnessArry);

float* hostFitness = new float[h_gaConfig->maxPopulation];

for(int cycle = 0; cycle < 100; cycle++){

		evolvePopulation<<<MAX_POPULATION / BLOCK_SIZE, BLOCK_SIZE>>>(d_individuals, d_ga_config, d_training_data, trainDataSize, d_random, d_fitnessArry, d_result, d_best);

		cudaMemcpy (hostFitness,d_fitnessArry,sizeof(float) * h_gaConfig->maxPopulation, cudaMemcpyDeviceToHost);

		for(int fit = 0; fit < h_gaConfig->maxPopulation; fit++){

			out<<hostFitness[fit]<<std::endl;

		}

}
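One thing worth checking with this symptom (a device buffer that apparently never gets written, while the copy succeeds) is whether the kernel launch itself failed silently, in which case the copy just returns stale data. Below is a minimal sketch of the launch-status-check pattern, using a stand-in error type so the snippet compiles without the CUDA runtime; real code would check `cudaGetLastError()` after each launch and the return value of `cudaMemcpy`.

```cpp
#include <cassert>
#include <cstdio>

// Stand-ins for cudaError_t / cudaGetErrorString, so this sketch is
// self-contained; real code would use the types from cuda_runtime.h.
enum Err { Success = 0, LaunchFailure = 1 };

const char* errString(Err e) {
    return e == Success ? "no error" : "launch failure";
}

// Returns true when the call succeeded; prints a diagnostic otherwise.
bool check(Err e, const char* where) {
    if (e != Success) {
        std::fprintf(stderr, "%s failed: %s\n", where, errString(e));
        return false;
    }
    return true;
}

// Usage in the host loop would look like:
//   evolvePopulation<<<grid, block>>>(...);
//   check(cudaGetLastError(), "evolvePopulation launch");
//   check(cudaMemcpy(...), "fitness copy");
```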

CUDA stores kernel parameters in shared memory. Maybe the compiler is being overly clever and treating the entire fitnessVector as shared memory, which is not global. Try using float *fitnessVector instead.

Thanks for the reply,

I tried float* fitnessVector instead of float fitnessVector[MAX_POPULATION] as a parameter, but the result is the same. I have been trying to solve this for maybe three days now, but I couldn't find any solution yet. Another interesting point: if I assign constant values (such as 1.0 or 0.0) to the fitnessVector array in the kernel, I can see those values in the output file after copying to the host. But with my actual code it does not change, and I am sure the assignment inside the if statement executes, because in debug mode I can see it happen.
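For what it's worth, no change here is expected at the language level: in C, C++, and CUDA C, an array-typed function parameter decays to a pointer, so the two declarations have exactly the same type. A small self-contained check (the function names f and g are hypothetical, standing in for the kernel signature):

```cpp
#include <type_traits>

const int MAX_POPULATION = 512;  // stand-in for the real constant

// An array parameter decays to a pointer, so these two declarations
// have exactly the same type -- switching from one form to the other
// cannot change the kernel's behavior.
void f(float fitnessVector[MAX_POPULATION]);
void g(float* fitnessVector);

static_assert(std::is_same<decltype(f), decltype(g)>::value,
              "array parameter decays to pointer");
```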

Any idea?

Thanks for your time.

This is a long shot, but if you are willing to indulge me, try allocating hostFitness using malloc() rather than the C++ new operator and see what happens.
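The suggestion above as a sketch; maxPopulation here is a stand-in for h_gaConfig->maxPopulation:

```cpp
#include <cassert>
#include <cstdlib>

const int maxPopulation = 512;  // stand-in for h_gaConfig->maxPopulation

// Allocate the host-side fitness buffer with malloc() instead of new[].
// A buffer from malloc() must be released with free(), not delete[].
float* allocHostFitness() {
    return static_cast<float*>(std::malloc(maxPopulation * sizeof(float)));
}
```

cudaMemcpy only needs a raw pointer, so either allocator works with the copy itself; the experiment just rules out some interaction with the new[] allocation.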

A general debugging technique is to check the state and, if a value is incorrect, recursively check the state at the point where the incorrect value is computed.

Since you probably don’t have access to the CUDA NEXUS hardware debugger to trace values (why does it require 2 machines?), I would comment out more and more code until you get the expected results and then isolate the bug.
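The comment-out-and-narrow approach can be made systematic. Sketched below on the host with hypothetical stand-in stages instead of real kernels: re-enable one stage of the kernel body at a time and verify the buffer after each one, so the first stage that fails the check is the one containing the bug.

```cpp
#include <cassert>
#include <vector>

// Stage 1: the write the asker already verified (constants reach the host).
void stageWriteConstant(std::vector<float>& fitness) {
    for (float& f : fitness) f = 1.0f;
}

// Stage 2: the suspect conditional update from the kernel.
void stageConditionalUpdate(std::vector<float>& fitness, float err) {
    for (float& f : fitness)
        if (err < f) f = err;
}

// Re-enable stages one at a time, checking the buffer after each:
void bisect() {
    std::vector<float> fitness(8, 1e9f);
    stageWriteConstant(fitness);
    assert(fitness[0] == 1.0f);        // stage 1 confirmed before moving on
    stageConditionalUpdate(fitness, 0.5f);
    assert(fitness[0] == 0.5f);        // stage 2 confirmed
}
```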