Unified Memory troubles

Hi everyone,
I was trying to use unified memory in my project and ran into a pretty strange problem.
So, here is the code:


struct Participant
{
double* fitness;
float* genome;
};

int main()
{

   Participant *population_1;

   cudaMallocManaged(&population_1, populationSize * sizeof(Participant));
     
   float *temp_genome;
   double *temp_fitness;

   cudaMallocManaged(&temp_genome, genomeSize* sizeof(float));
   cudaMallocManaged(&temp_fitness, sizeof(double));
   population_1[i].genome = temp_genome; //<---------------here an error occurs
   population_1[i].fitness = temp_fitness;

   return 0;

};


When I try to debug it on my local machine (CUDA 10.2, Windows 7, GeForce GTX 1050 Ti), I get a runtime error:

“Unhandled exception at address 0x000000013FE3A7F6 in DE_parallel_unif_memory.exe: 0xC0000006: error on page while writing to address 0x0000000502A20008 (status code 0xC0000022)”

and the debugger highlights the line I've marked above.
When I run the same code on a remote server (CUDA 9.1, Red Hat Linux), everything works without any issues.

What am I actually doing wrong on my local machine? (The command-line options for nvcc are pretty much the same.)

P.S. populationSize and genomeSize are initialized earlier in the code.

Should the code actually be like this? Otherwise where does ‘i’ come from?

  typedef struct {
    double* fitness;
    float* genome;
  } Participant;  

  Participant *population_1;

  cudaMallocManaged(&population_1, populationSize * sizeof(Participant));
   
  for ( int i = 0; i < populationSize; ++i) {
    float *temp_genome = nullptr; // I'm a 'fan' of initializing pointers
    double *temp_fitness = nullptr;
    cudaMallocManaged(&temp_genome, genomeSize* sizeof(float));
    cudaMallocManaged(&temp_fitness, sizeof(double));
    population_1[i].genome = temp_genome; //<---------------here an error occurs
    population_1[i].fitness = temp_fitness;
 }

If my for-loop assumption is correct, then maybe cudaMallocManaged() is failing for temp_genome? The code never checks its return value, so a failed allocation would go unnoticed.
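One way to test that guess is to check the cudaError_t returned by every cudaMallocManaged() call. Below is a minimal sketch of that idea, not a diagnosis of the crash. CHECK_CUDA is a hypothetical helper macro that is not part of the original post, and the values of populationSize and genomeSize are placeholders, since the original sets them elsewhere.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): print a readable
// message and exit if a CUDA runtime call does not return cudaSuccess.
#define CHECK_CUDA(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                   __LINE__, #call, cudaGetErrorString(err_));      \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

struct Participant {
  double* fitness;
  float* genome;
};

int main() {
  // Placeholder sizes (assumption); the original post defines these elsewhere.
  const int populationSize = 4;
  const int genomeSize = 16;

  Participant* population_1 = nullptr;
  CHECK_CUDA(cudaMallocManaged(&population_1,
                               populationSize * sizeof(Participant)));

  for (int i = 0; i < populationSize; ++i) {
    float* temp_genome = nullptr;
    double* temp_fitness = nullptr;
    CHECK_CUDA(cudaMallocManaged(&temp_genome, genomeSize * sizeof(float)));
    CHECK_CUDA(cudaMallocManaged(&temp_fitness, sizeof(double)));
    // Host writes into managed memory; no kernel is running at this point.
    population_1[i].genome = temp_genome;
    population_1[i].fitness = temp_fitness;
  }

  // Release the per-participant buffers, then the array itself.
  for (int i = 0; i < populationSize; ++i) {
    CHECK_CUDA(cudaFree(population_1[i].genome));
    CHECK_CUDA(cudaFree(population_1[i].fitness));
  }
  CHECK_CUDA(cudaFree(population_1));

  std::printf("all allocations succeeded\n");
  return 0;
}
```

If one of the allocations fails, the macro reports the file, line, and error string, which should narrow down whether the crash comes from a failed allocation or from the write itself.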

What if you tried passing the address of each member of population_1[i] directly to the inner cudaMallocManaged() calls?

  typedef struct {
    double* fitness; 
    float* genome;
  } Participant; 

  Participant *population_1;

  cudaMallocManaged(&population_1, populationSize * sizeof(Participant));
   
  for ( int i = 0; i < populationSize; ++i) {
    cudaMallocManaged( (void**)&(population_1[i].genome), genomeSize* sizeof(float));
    cudaMallocManaged( (void**)&(population_1[i].fitness), sizeof(double));
 }

By the way, if fitness is a single double, why does the Participant struct need a pointer for it? Why not store the double directly in the struct? Then you wouldn't need the second cudaMallocManaged() call inside the loop:

  typedef struct {
    double fitness; 
    float* genome;
  } Participant; 

  Participant *population_1;

  cudaMallocManaged(&population_1, populationSize * sizeof(Participant));
   
  for ( int i = 0; i < populationSize; ++i) {
    cudaMallocManaged( (void**)&(population_1[i].genome), genomeSize* sizeof(float));
  }
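
To show how this last layout might be used end to end, here is a small self-contained sketch. The evaluate kernel, the launch configuration, and the placeholder sizes are all assumptions made up for illustration; they are not from the original post. The host calls cudaDeviceSynchronize() before it reads the results back from managed memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct Participant {
  double fitness;  // stored inline, so no separate allocation is needed
  float* genome;   // managed buffer of genomeSize floats
};

// Hypothetical kernel (assumption): one thread per participant sums its
// genome and stores the result in fitness.
__global__ void evaluate(Participant* population, int populationSize,
                         int genomeSize) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= populationSize) return;
  double sum = 0.0;
  for (int g = 0; g < genomeSize; ++g) sum += population[i].genome[g];
  population[i].fitness = sum;
}

int main() {
  // Placeholder sizes (assumption).
  const int populationSize = 8;
  const int genomeSize = 4;

  Participant* population_1 = nullptr;
  if (cudaMallocManaged(&population_1,
                        populationSize * sizeof(Participant)) != cudaSuccess)
    return 1;

  for (int i = 0; i < populationSize; ++i) {
    population_1[i].fitness = 0.0;
    population_1[i].genome = nullptr;
    if (cudaMallocManaged(&population_1[i].genome,
                          genomeSize * sizeof(float)) != cudaSuccess)
      return 1;
    // Fill the genome on the host before any kernel is launched.
    for (int g = 0; g < genomeSize; ++g) population_1[i].genome[g] = 1.0f;
  }

  evaluate<<<1, 32>>>(population_1, populationSize, genomeSize);
  // Wait for the kernel to finish before the host touches managed memory.
  if (cudaDeviceSynchronize() != cudaSuccess) return 1;

  for (int i = 0; i < populationSize; ++i)
    std::printf("participant %d fitness = %f\n", i, population_1[i].fitness);

  for (int i = 0; i < populationSize; ++i) cudaFree(population_1[i].genome);
  cudaFree(population_1);
  return 0;
}
```

Because every genome element is set to 1.0f here, each fitness should come out equal to genomeSize if the kernel ran, which makes a quick sanity check on both machines.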