Unified Memory troubles

nick.zorander · May 6, 2020, 4:29pm

Hi everyone,
I was trying to use unified memory in my project and ran into a pretty strange problem.
Here is the code:

struct Participant
{
double* fitness;
float* genome;
};

int main()
{

   Participant *population_1;

   cudaMallocManaged(&population_1, populationSize * sizeof(Participant));
     
   float *temp_genome;
   double *temp_fitness;

   cudaMallocManaged(&temp_genome, genomeSize* sizeof(float));
   cudaMallocManaged(&temp_fitness, sizeof(double));
   population_1[i].genome = temp_genome; //<---------------here an error occurs
   population_1[i].fitness = temp_fitness;

   return 0;

};

When I try to debug it on my local machine (CUDA 10.2, Windows 7, GeForce GTX 1050 Ti), I get a runtime error:

“Unhandled exception at address 0x000000013FE3A7F6 in DE_parallel_unif_memory.exe: 0xC0000006: error on page while writing to address 0x0000000502A20008 (status code 0xC0000022)”

and the debugger highlights the line marked above.
When I do the same thing on a remote server (CUDA 9.1, Red Hat Linux), everything works without any issues.

What am I actually doing wrong on my local machine? (The nvcc command-line options are pretty much the same.)

nick.zorander · May 7, 2020, 8:42pm

P.S. populationSize and genomeSize are initialized earlier in the code.

hazelnutvt04 · June 5, 2020, 3:23pm

Should the code actually be like this? Otherwise where does ‘i’ come from?

  typedef struct {
    double* fitness;
    float* genome;
  } Participant;  

  Participant *population_1;

  cudaMallocManaged(&population_1, populationSize * sizeof(Participant));
   
  for ( int i = 0; i < populationSize; ++i) {
    float *temp_genome = nullptr; // I'm a 'fan' of initializing pointers
    double *temp_fitness = nullptr;
    cudaMallocManaged(&temp_genome, genomeSize* sizeof(float));
    cudaMallocManaged(&temp_fitness, sizeof(double));
    population_1[i].genome = temp_genome; //<---------------here an error occurs
    population_1[i].fitness = temp_fitness;
 }

If my for-loop assumption is correct, then maybe cudaMallocManaged() is failing for temp_genome?
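
One way to test that guess is to check the cudaError_t returned by every cudaMallocManaged() call before touching the pointer. Below is a minimal sketch, not a confirmed fix for the crash. The sizes are placeholder values (the original post sets them elsewhere), and cudaOk is a hypothetical helper written for this example, not part of the CUDA runtime:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): reports a failed CUDA
// runtime call and returns false so the caller can stop before using the pointer.
static bool cudaOk(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

struct Participant
{
    double *fitness;
    float *genome;
};

int main()
{
    // Assumed sizes for illustration only.
    const int populationSize = 4;
    const int genomeSize = 8;

    Participant *population_1 = nullptr;
    if (!cudaOk(cudaMallocManaged(&population_1, populationSize * sizeof(Participant)),
                "cudaMallocManaged(population_1)"))
        return 1;

    for (int i = 0; i < populationSize; ++i) {
        float *temp_genome = nullptr;
        double *temp_fitness = nullptr;
        if (!cudaOk(cudaMallocManaged(&temp_genome, genomeSize * sizeof(float)),
                    "cudaMallocManaged(temp_genome)") ||
            !cudaOk(cudaMallocManaged(&temp_fitness, sizeof(double)),
                    "cudaMallocManaged(temp_fitness)"))
            return 1;
        // Only reached when both allocations reported success.
        population_1[i].genome = temp_genome;
        population_1[i].fitness = temp_fitness;
    }

    std::printf("all allocations reported success\n");
    return 0;  // cleanup with cudaFree is omitted to keep the sketch short
}
```

If every call reports success and the write still faults, the allocation itself is probably not the culprit and the problem lies elsewhere.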

What if you passed the addresses of the population_1[i] members directly to the inner cudaMallocManaged() calls:

  typedef struct {
    double* fitness; 
    float* genome;
  } Participant; 

  Participant *population_1;

  cudaMallocManaged(&population_1, populationSize * sizeof(Participant));
   
  for ( int i = 0; i < populationSize; ++i) {
    cudaMallocManaged( (void**)&(population_1[i].genome), genomeSize* sizeof(float));
    cudaMallocManaged( (void**)&(population_1[i].fitness), sizeof(double));
 }

By the way, if fitness is a single double, why does the Participant struct need a pointer for it? Why not store the double directly in the struct? Then you wouldn’t need the second inner call to cudaMallocManaged:

  typedef struct {
    double fitness; 
    float* genome;
  } Participant; 

  Participant *population_1;

  cudaMallocManaged(&population_1, populationSize * sizeof(Participant));
   
  for ( int i = 0; i < populationSize; ++i) {
    cudaMallocManaged( (void**)&(population_1[i].genome), genomeSize* sizeof(float));
  }
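
None of the snippets above release the memory they allocate. Here is a rough sketch of the last variant with matching cleanup, assuming placeholder sizes. allocatePopulation and freePopulation are hypothetical helper names made up for this example. It does not try to explain the Windows crash, it only shows one way to pair each cudaMallocManaged() with a cudaFree():

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct Participant
{
    double fitness;  // stored by value, as suggested above
    float *genome;   // one managed allocation per participant
};

// Hypothetical helper: releases every genome, then the array itself.
// cudaFree(nullptr) performs no operation, so a partially built population is safe to pass.
static void freePopulation(Participant *population, int populationSize)
{
    if (population == nullptr) return;
    for (int i = 0; i < populationSize; ++i)
        cudaFree(population[i].genome);
    cudaFree(population);
}

// Hypothetical helper: builds the population and undoes everything on failure.
static cudaError_t allocatePopulation(Participant **out, int populationSize, int genomeSize)
{
    *out = nullptr;
    Participant *population = nullptr;
    cudaError_t err = cudaMallocManaged(&population, populationSize * sizeof(Participant));
    if (err != cudaSuccess) return err;

    // Null every genome pointer first so cleanup is well defined at any point.
    for (int i = 0; i < populationSize; ++i) {
        population[i].fitness = 0.0;
        population[i].genome = nullptr;
    }
    for (int i = 0; i < populationSize; ++i) {
        err = cudaMallocManaged(&population[i].genome, genomeSize * sizeof(float));
        if (err != cudaSuccess) {
            freePopulation(population, populationSize);
            return err;
        }
    }
    *out = population;
    return cudaSuccess;
}

int main()
{
    const int populationSize = 4;  // assumed value
    const int genomeSize = 8;      // assumed value

    Participant *population_1 = nullptr;
    cudaError_t err = allocatePopulation(&population_1, populationSize, genomeSize);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "allocation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    population_1[0].genome[0] = 1.0f;  // host write to managed memory
    std::printf("genome[0] = %f\n", population_1[0].genome[0]);

    freePopulation(population_1, populationSize);
    return 0;
}
```

Returning the cudaError_t instead of exiting inside the helper leaves it up to the caller to decide how to report a failure.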