Error: call to cuMemcpyDtoHAsync returned error 700 (illegal address)

Hello,
I have to parallelize the code below with OpenACC, but it fails at run time with these errors:
call to cuMemcpyDtoHAsync returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
Please help me resolve this problem.

#pragma acc kernels
for (i = 0; i < dim; ++i)
{
    for (j = 0; j < dim; ++j)
    {
        if (i >= j) continue;

        route_new = twoOptSwap(route, dim, i, j);
        cur_dist = getDist(route_new, dist, dim);

        if (cur_dist < best_dist)
        {
            best_dist = cur_dist;
            free(route_best);
            route_best = route_new;
        }
        else // Abandon the new route
        {
            free(route_new);
        }
    }
}

Hi pramodyelmewad,

An illegal address error occurs when your code accesses a bad address on the device. You could be accessing a host pointer, reading off the end of an array, dereferencing a null pointer, etc.

Having a reproducing example is always helpful since I can only guess as to the problem otherwise.

What do the compiler feedback messages tell you? (i.e. add “-Minfo=accel” to your compilation and post the results).

What data types are “route_new”, “route_best”, and “cur_dist”? Are they shared pointers? If so, this will cause you problems.

What data structure is “route” and how do you copy it to the device?

How are the two functions declared? Do you use the OpenACC “routine” directive for these or are you inlining them?

Does “twoOptSwap” try to allocate memory on the device?

  • Mat

Thank you, mkcolg.

Here is the information you asked for.
What do the compiler feedback messages tell you?
Ans:
Here is the compiler feedback:
main:
290, Accelerator kernel generated
Generating Tesla code
290, #pragma acc loop gang /* blockIdx.x */
292, #pragma acc loop vector(128) /* threadIdx.x */
290, Generating copyout(dist[:dim][:dim])
Generating copyin(city[:dim][:2])
292, Loop is parallelizable
Metaheuristic::getDist(int *, float **, int):
19, Generating acc routine seq
Metaheuristic::twoOptSwap(int *, int, int, int):
77, Generating acc routine seq
82, Loop is parallelizable
87, Loop is parallelizable
92, Loop is parallelizable
Metaheuristic::bestChild(int *, float *, int):
114, Scalar last value needed after loop for best_dist,route_best at line 141
Accelerator restriction: scalar variable live-out from loop: route_best,best_dist
Accelerator kernel generated
Generating Tesla code
114, #pragma acc loop gang /* blockIdx.x */
116, Complex loop carried dependence of route_new-> prevents parallelization
Loop carried scalar dependence for best_dist at line 123
Scalar last value needed after loop for best_dist at line 141
Loop carried scalar dependence for route_best at line 126
Scalar last value needed after loop for route_best at line 139
Complex loop carried dependence of route->,-> prevents parallelization
Accelerator restriction: scalar variable live-out from loop: route_best,best_dist
Parallelization would require privatization of array [:
]
82, Complex loop carried dependence of route->,-> prevents parallelization
87, Complex loop carried dependence of route->,-> prevents parallelization
Parallelization requires privatization of -> as well as last value
92, Complex loop carried dependence of route->,-> prevents parallelization
Parallelization requires privatization of -> as well as last value
26, Accelerator restriction: induction variable live-out from loop: …inline
27, Accelerator restriction: induction variable live-out from loop: …inline

What data types are “route_new”, “route_best”, and “cur_dist”? Are they shared pointers? If so, this will cause you problems.

Ans:
The data types of the variables you mentioned are:
int *route_new;
int *route_best;
float cur_dist;
These pointers should be shared.

What data structure is “route” and how do you copy it to the device?

Ans:
int *route;

It is a pointer that holds the current route of the traveling salesman problem. I do not copy it explicitly; I leave the copy to the kernels directive.

How are the two functions declared? Do you use the OpenACC “routine” directive for these or are you inlining them?
Ans:
The two functions are declared as:

#pragma acc routine seq
float getDist(int *seq_city, float **dist, int dim)
{
    int i;
    float total_dist = 0;

    //#pragma acc parallel loop pcopyin(dist[:dim][:dim],seq_city[dim])
    for (i = 0; i < dim-1; ++i)
    {
        total_dist += dist[seq_city[i]][seq_city[i+1]];
    }
    total_dist += dist[seq_city[i]][seq_city[0]];

    return total_dist;

}
#pragma acc routine seq
int* twoOptSwap(int *route, int dim, int m, int n)
{
    int i, j;
    int *route_new;

    route_new = (int*) malloc(sizeof(int)*dim);
    for (i = 0; i <= m-1; ++i)
    {
        route_new[i] = route[i];
    }

    for (i = m, j = n; i <= n; ++i, --j)
    {
        route_new[i] = route[j];
    }

    for (i = n+1; i < dim; ++i)
    {
        route_new[i] = route[i];
    }

    return route_new;

}

I have given all the information you asked for. If you need anything else for further clarification, please let me know.

Hi pramodyelmewad,

These pointers should be shared.

This is the problem. Sharing the pointers introduces a race condition in your code. One thread will be setting “route_new” while another could be trying to free it. Looking at the compiler feedback, it’s correctly detecting these dependencies.

Basically, this code is not parallelizable. You’ll need to rethink your algorithm so that each thread can be working independently. Each iteration of the loops should not depend upon any other iteration.
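As a rough illustration of what "independent iterations" can look like for the best-distance search, here is a minimal sketch. It assumes the candidate tour lengths have already been stored one per (i, j) pair in a hypothetical array "cand"; the names minCandidate and firstIndexOf are made up for this example and are not from the original code. The shared "best_dist" update becomes an OpenACC min reduction, so no iteration reads a value written by another.

```c
#include <float.h>

/* Smallest value in cand[0..n-1]. The min reduction gives each thread a
   private copy of best and combines the copies after the loop, so the
   iterations no longer race on one shared scalar. */
float minCandidate(const float *cand, int n)
{
    float best = FLT_MAX;
    #pragma acc parallel loop reduction(min:best) copyin(cand[0:n])
    for (int k = 0; k < n; ++k)
        best = (cand[k] < best) ? cand[k] : best;
    return best;
}

/* Host-side lookup of the first slot holding that value, or -1 if none. */
int firstIndexOf(const float *cand, int n, float value)
{
    for (int k = 0; k < n; ++k)
        if (cand[k] == value)
            return k;
    return -1;
}
```

If the slots are laid out as k = i * dim + j (again an assumption), the winning swap is i = k / dim, j = k % dim, and the best route can then be built once on the host instead of being passed between threads.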

route_new = (int*) malloc(sizeof(int)*dim);

Calling malloc from within device code will have each thread allocate its own private data. It’s fine to do, but your algorithm expects this memory to be global.
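One possible way to sidestep the device-side malloc entirely is to never build the swapped route: the 2-opt reversal of route[m..n] can be read through an index mapping, so each thread only reads "route" and writes its own result slot. The sketch below is untested on a GPU and rests on assumptions that are not in the original code: "dist" is flattened into a dim*dim array (the original uses float **), and swappedCity, swappedDist, evalSwaps and "cand" are hypothetical names.

```c
#include <float.h>

/* City at position k after reversing route[m..n]; reads route, writes nothing. */
#pragma acc routine seq
static int swappedCity(const int *route, int m, int n, int k)
{
    return (k >= m && k <= n) ? route[m + n - k] : route[k];
}

/* Tour length of the swapped route, computed without allocating it.
   dist is assumed flattened: dist[a * dim + b]. */
#pragma acc routine seq
static float swappedDist(const int *route, const float *dist,
                         int dim, int m, int n)
{
    float total = 0.0f;
    for (int k = 0; k < dim - 1; ++k)
        total += dist[swappedCity(route, m, n, k) * dim
                      + swappedCity(route, m, n, k + 1)];
    /* close the tour: last city back to the first */
    total += dist[swappedCity(route, m, n, dim - 1) * dim
                  + swappedCity(route, m, n, 0)];
    return total;
}

/* Fill cand[i * dim + j] with the tour length of each swap (i < j).
   Pairs with i >= j are skipped in the original loop; here they get FLT_MAX. */
void evalSwaps(const int *route, const float *dist, int dim, float *cand)
{
    #pragma acc parallel loop collapse(2) \
        copyin(route[0:dim], dist[0:dim*dim]) copyout(cand[0:dim*dim])
    for (int i = 0; i < dim; ++i)
        for (int j = 0; j < dim; ++j)
            cand[i * dim + j] =
                (i < j) ? swappedDist(route, dist, dim, i, j) : FLT_MAX;
}
```

This trades extra index arithmetic for the removal of every malloc/free pair inside the kernel; whether that is a net win depends on dim and the hardware.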

Most likely the “illegal address” error comes from one thread allocating data and a different thread freeing that memory (since the pointer is shared).

  • Mat