OpenACC: call to cuStreamCreate returned error 1

Hello,
I am trying to parallelize my Fortran code with OpenACC. However, when I run it, it generates the error:
call to cuStreamCreate returned error 1: Invalid Value

Can anybody help me figure out what causes this problem? Is it hardware related?

Thanks in advance!

Hi rz5q2008,

I’ve seen this once before when the user had a very old CUDA driver installed.

What CUDA driver version do you have? (shown in the output from “pgaccelinfo”)

Can you try updating it? https://developer.nvidia.com/cuda-downloads
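
A minimal sketch of that version check, assuming a POSIX shell. It filters the pgaccelinfo output for its driver version line and falls back to nvidia-smi when pgaccelinfo is not on the PATH. The helper name driver_line is made up for illustration, and the exact label pgaccelinfo prints may differ between releases, so treat the grep pattern as an assumption. The two tools may also report differently formatted version numbers.

```shell
# Hypothetical helper: keep only lines that mention the driver version.
driver_line() {
  grep -i 'driver version'
}

if command -v pgaccelinfo >/dev/null 2>&1; then
  # PGI's device query tool, as suggested above.
  pgaccelinfo | driver_line
elif command -v nvidia-smi >/dev/null 2>&1; then
  # Fallback: ask the NVIDIA driver utility directly.
  nvidia-smi --query-gpu=driver_version --format=csv,noheader
else
  echo "neither pgaccelinfo nor nvidia-smi found on PATH"
fi
```

If neither tool is found, the driver itself may not be installed, which would also be worth knowing before updating.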

  • Mat

Thanks, Mat, for your quick response. I will try installing the new CUDA driver and let you know if it works.
BTW, my application is very large. I previously used the Intel Fortran compiler, and over the past few days I benchmarked it with PGI Fortran. The Intel version is 20% to 30% faster than the PGI-compiled version (I used /fast, and /fast /o3).
Do you have any suggestions for getting it to run faster using some options in PGI Fortran?

Thanks again!

Hi Mat,
I installed the new CUDA driver from the link you provided, and the program now runs, though using the GPU makes it run extremely slowly. I will check the code and might ask for your advice later.
Thanks!

Do you have any suggestions for getting it to run faster using some options in PGI Fortran?

Not knowing anything about your program, I’d recommend “-fast -Mfprelaxed -Mipa=fast,inline”. Intel uses relaxed precision by default at higher optimization levels, but you need to add it explicitly for PGI (we’re a bit more conservative regarding accuracy). IPA may or may not help, but it’s worth a try.
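
As a rough sketch of how that recommendation looks on a command line, assuming a POSIX shell and the pgfortran driver: the file names app.f90 and app are placeholders, not taken from this thread. The script prints the command and only runs it when the compiler and the source file are actually present.

```shell
# Placeholder names (assumptions): replace with your own source and executable.
SRC="app.f90"
EXE="app"

# Flags recommended above: -fast, relaxed floating point, and IPA with inlining.
FLAGS="-fast -Mfprelaxed -Mipa=fast,inline"

CMD="pgfortran $FLAGS -o $EXE $SRC"
echo "$CMD"

# Only build when the compiler and the source file exist.
if command -v pgfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
  $CMD
fi
```

Because -Mfprelaxed trades some accuracy for speed, it is worth comparing the results against a build without it before trusting the timings.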

Other options to try are (please refer to the PGI docs for more detail about each optimization):

Vectorization sub-options: Try partial vectorization (-Mvect=partial), 256-bit SIMD if you’re on a Haswell or Piledriver architecture (-Mvect=simd:256), and removing altcode generation (-Mvect=noaltcode).

Unrolling factors: use -Munroll=n:<factor> to control the loop unroll factor.

Inlining: review the compiler feedback messages from “-Minfo” and see if any routines are not getting inlined. You can try using the IPA inline sub-options to get more routines inlined, such as “-Mipa=inline:reshape” if you’re passing in sub-arrays or “-Mipa=inline:levels:10” to increase the number of call levels to inline (at the cost of code size).
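
One way to try the vectorization sub-options listed above is to build one executable per option and time each on the same input. The following is only a sketch, assuming a POSIX shell and the pgfortran driver. The source name app.f90 and the generated executable names are placeholders for illustration.

```shell
SRC="app.f90"              # placeholder source file (assumption)
BASE="-fast -Mfprelaxed"   # baseline flags from the recommendation above

# One build per candidate option, so each effect can be measured on its own.
for EXTRA in "-Mvect=partial" "-Mvect=simd:256" "-Mvect=noaltcode"; do
  # Turn the option into a file-name-safe tag, e.g. _Mvect_partial
  TAG=$(printf '%s' "$EXTRA" | tr -c 'A-Za-z0-9' '_')
  EXE="app$TAG"
  echo "pgfortran $BASE $EXTRA -Minfo -o $EXE $SRC"

  # Only build when the compiler and the source file exist.
  if command -v pgfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
    pgfortran $BASE $EXTRA -Minfo -o "$EXE" "$SRC"
  fi
done
```

Each build also prints the -Minfo feedback, which shows which loops were vectorized under that option. Run each resulting executable with the same input and compare wall-clock times before settling on a flag set.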

Beyond this, I’d profile your code, discover the hotspots, then determine what could be preventing optimization (the -Minfo option helps here).

  • Mat

Thanks, Mat. I will try these and let you know how it goes.
Best