Hello guys, I need some help. Whenever I try to use OpenACC directives in my code, it compiles, but I can't get it to run at all. My code plots the Mandelbrot set using SFML graphics. I am trying to do all of the calculations on the GPU using OpenACC, store the results in an array of RGB colors, and then plot the image from that array. I think I have laid out the array so that each iteration of the loop that calculates a color writes only to its own elements (each pixel has 4 elements in the imageColors array: r, g, b, alpha). The code works as expected when I compile it with -ta=host. All of my code is located at: GitHub - uros97/mandelbrot: Mandelbrot project. The file with the code is mandelbrot.cpp. When I compile it with -ta=tesla:managed, the compiler gives the following output:
mandelbrot(double, double):
42, Generating implicit acc routine seq
Generating acc routine seq
Generating Tesla code
updateImageSlice(double, double, double, int, int):
94, Generating copy(imageColors[:2400000],colors[:384]) [if not already present]
Generating Tesla code
95, #pragma acc loop gang /* blockIdx.x */
99, #pragma acc loop vector(128) /* threadIdx.x */
105, Generating implicit reduction(+:imag)
107, Generating implicit reduction(+:real)
46, Loop carried scalar dependence for …inline at line 46,52,47
53, Accelerator restriction: induction variable live-out from loop: …inline
54, Accelerator restriction: induction variable live-out from loop: …inline
99, Loop is parallelizable
I realize that the scalar dependence and accelerator restriction messages appear because the loop in the mandelbrot function does not have independent iterations. When I put #pragma acc loop seq on that loop, the compiler gives the following output, without the messages about the dependence and the live-out induction variable:
mandelbrot(double, double):
42, Generating implicit acc routine seq
Generating acc routine seq
Generating Tesla code
updateImageSlice(double, double, double, int, int):
94, Generating copy(imageColors[:2400000],colors[:384]) [if not already present]
Generating Tesla code
95, #pragma acc loop gang /* blockIdx.x */
99, #pragma acc loop vector(128) /* threadIdx.x */
105, Generating implicit reduction(+:imag)
107, Generating implicit reduction(+:real)
99, Loop is parallelizable
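For reference, the loop seq change is roughly the following. This is a simplified sketch, not the exact code from mandelbrot.cpp: the iteration cap MAX_ITER and the variable names are assumptions made for illustration. The point is only that each iteration depends on the previous value of z, so the inner loop is marked sequential inside a routine seq function.

```cpp
// Hypothetical iteration cap; the real value in mandelbrot.cpp may differ.
constexpr int MAX_ITER = 256;

// Escape-time count for the point c = (cr, ci).
// Every iteration needs the z produced by the previous one, so the loop
// cannot be parallelized and is explicitly marked seq.
#pragma acc routine seq
int mandelbrot(double cr, double ci) {
    double zr = 0.0, zi = 0.0;
    int i = 0;
    #pragma acc loop seq
    for (i = 0; i < MAX_ITER; ++i) {
        const double zr2 = zr * zr;
        const double zi2 = zi * zi;
        if (zr2 + zi2 > 4.0) break;   // |z| > 2: the point has escaped
        zi = 2.0 * zr * zi + ci;      // imaginary part of z*z + c
        zr = zr2 - zi2 + cr;          // real part of z*z + c
    }
    return i;  // MAX_ITER means the point never escaped
}
```

A compiler without OpenACC support ignores the pragmas, so the function can also be checked on the host: calling mandelbrot(0.0, 0.0) should return MAX_ITER, since z stays at zero.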
However, in both cases, when I run the compiled program, I get the following error:
Failing in Thread:1
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
From the research I did, I assume the problem has something to do with pointers pointing to host memory instead of GPU memory. However, I have tried various approaches to copying data to the GPU, copying it back, and deleting it there, and every time the program compiles but does not run. I also left some commented-out pragmas in the code from the different copy clauses and loop arrangements I tried. Any help or advice is appreciated, and thank you in advance.
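To make the array layout concrete, here is a minimal sketch of the pattern I am aiming for. It is not the code from the repository. The image size (1000 x 600) and the palette shape (128 entries x 3 channels) are assumptions, chosen only because they match the 2,400,000 and 384 element counts in the compiler output. The function name fillImage and the (x + y) palette index are also hypothetical stand-ins for the real Mandelbrot calculation. The arrays are passed as raw pointers with explicit lengths in the data clauses.

```cpp
// Assumed dimensions: 1000 * 600 pixels * 4 channels = 2,400,000 elements.
constexpr int WIDTH = 1000;
constexpr int HEIGHT = 600;
constexpr int PIXEL_CHANNELS = 4;    // r, g, b, alpha
// Assumed palette shape: 128 colors * 3 channels = 384 elements.
constexpr int PALETTE_SIZE = 128;
constexpr int PALETTE_CHANNELS = 3;  // r, g, b

// Hypothetical stand-in for the real update function: each (x, y) iteration
// reads one palette entry and writes only its own 4 elements of imageColors.
void fillImage(unsigned char* imageColors, const unsigned char* colors) {
    const int imageLen = WIDTH * HEIGHT * PIXEL_CHANNELS;
    const int paletteLen = PALETTE_SIZE * PALETTE_CHANNELS;
    // Explicit array shapes so the runtime knows how much to transfer.
    #pragma acc parallel loop gang copyout(imageColors[0:imageLen]) copyin(colors[0:paletteLen])
    for (int y = 0; y < HEIGHT; ++y) {
        #pragma acc loop vector
        for (int x = 0; x < WIDTH; ++x) {
            // Placeholder for the iteration count returned by mandelbrot().
            const int c = ((x + y) % PALETTE_SIZE) * PALETTE_CHANNELS;
            const int base = (y * WIDTH + x) * PIXEL_CHANNELS;
            imageColors[base + 0] = colors[c + 0];
            imageColors[base + 1] = colors[c + 1];
            imageColors[base + 2] = colors[c + 2];
            imageColors[base + 3] = 255;  // fully opaque
        }
    }
}
```

With this layout no two iterations touch the same element, and nothing inside the loop is carried from one pixel to the next.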