OpenACC: call to cuStreamCreate returned error 1

rz5q2008 · January 8, 2016, 10:32pm

Hello,
I am trying to parallelize my Fortran code with OpenACC. However, when I run it, it generates this error:
call to cuStreamCreate returned error 1: Invalid Value

Can anybody help me figure out what causes this problem? Is it hardware related?

Thanks in advance!

MatColgrove · January 8, 2016, 10:39pm

Hi rz5q2008,

I’ve seen this once before when the user had a very old CUDA driver installed.

What CUDA driver version do you have? (shown in the output from “pgaccelinfo”)

Can you try updating it? CUDA Toolkit 11.7 Update 1 Downloads | NVIDIA Developer
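
For reference, a minimal sketch of pulling just the driver version out of the “pgaccelinfo” output. It assumes the relevant line contains the words "Driver Version"; the exact wording and layout may differ between PGI releases, and the helper name driver_version_lines is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that mention the driver version.
# Assumption: pgaccelinfo prints a line containing "Driver Version".
driver_version_lines() {
    grep -i "driver version"
}

# Only call pgaccelinfo if it is actually on PATH.
if command -v pgaccelinfo >/dev/null 2>&1; then
    pgaccelinfo | driver_version_lines
else
    echo "pgaccelinfo not found on PATH; check that the PGI compilers are installed" >&2
fi
```

If nothing is printed, running “pgaccelinfo” without the filter shows the full device report.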

Mat

rz5q2008 · January 9, 2016, 9:48pm

Thanks Matt for your quick response. I will try installing the new CUDA driver and then let you know if it works.
BTW, my application is very large, and I previously used the Intel Fortran compiler. In the past few days I benchmarked it with PGI Fortran and found that the Intel version is 20% to 30% faster than the PGI-compiled version (I used /fast, and /fast /o3).
Do you have any suggestions for making it faster through options in PGI Fortran?

Thanks again!

rz5q2008 · January 10, 2016, 2:55pm

Hi Matt,
I installed the new CUDA driver following the link you provided, and the program now runs, though using the GPU makes the program run extremely slowly. I will check the code and might ask for your advice later.
Thanks!

MatColgrove · January 11, 2016, 6:50pm

Do you have any suggestion if I can get it faster through using some options in PGI fortran?

Not knowing anything about your program, I’d recommend “-fast -Mfprelaxed -Mipa=fast,inline”. Intel uses relaxed precision by default at higher optimizations but you need to explicitly add it for PGI (we’re a bit more conservative regarding accuracy). IPA may or may not help, but worth a try.

Other options to try are (please refer to PGI docs for more detail about each optimization)

Vectorization sub-options: Try partial vectorization (-Mvect=partial), 256-bit SIMD if you’re on a Haswell or Piledriver architecture (-Mvect=simd:256), and removing altcode generation (-Mvect=noaltcode).

Unrolling factors: -Munroll=n:<n> to control the loop unroll factor.

Inlining: review the compiler feedback messages from “-Minfo” and see if any routines are not getting inlined. You can try using the IPA inline suboptions to get more routines inlined, such as “-Mipa=inline:reshape” if you’re passing in sub-arrays or “-Mipa=inline:levels:10” to increase the number of call levels to inline (at the cost of code size).

Beyond this, I’d profile your code, discover the hotspots, then determine what could be preventing optimization (the -Minfo option helps here).
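
One way to compare these options side by side is a small dry-run script that prints one compile command per flag set, so each build can be timed separately. This is only a sketch: the source name app.f90, the output names app_1, app_2, ... and the particular flag combinations are assumptions for illustration, not something taken from your build.

```shell
#!/bin/sh
# Hypothetical names: app.f90 (source file) and app_<n> (output binaries).
SRC="${SRC:-app.f90}"
FC="${FC:-pgfortran}"

# One flag set per line. The first is the -fast baseline, the second is the
# combination recommended above; the rest vary the vectorizer sub-options.
flag_sets() {
    cat <<'EOF'
-fast
-fast -Mfprelaxed -Mipa=fast,inline
-fast -Mfprelaxed -Mvect=partial
-fast -Mfprelaxed -Mvect=simd:256
-fast -Mfprelaxed -Mvect=noaltcode
EOF
}

# Print (do not run) one compile command per flag set.
# -Minfo is added so each build reports the compiler feedback messages.
build_commands() {
    n=0
    flag_sets | while IFS= read -r flags; do
        n=$((n + 1))
        echo "$FC $flags -Minfo -o app_$n $SRC"
    done
}

build_commands
```

The script only prints the commands. Once they look right, piping the output to sh runs the builds, and each resulting binary can then be timed on the same input.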

Mat

rz5q2008 · January 12, 2016, 3:52am

Thanks Matt. I will try these and let you know how it goes.
Best
