Error: call to cuStreamCreate returned error 1: Invalid value

Hi all,
I installed the PGI compiler (version 14.6-0, 64-bit) on my machine (CentOS, kernel version 2.6.32.12-0.7-default) with an NVIDIA Tesla C2050 and CUDA driver version 4010.
pgaccelinfo suggested the “-ta=tesla:cc20” flag for accelerator compilation.
When I compiled my program with this command (), compilation went fine. But when I run the executable, I get this runtime error: call to cuStreamCreate returned error 1: Invalid value
Has anyone had a similar experience?

Hi Jing Li,

This is a new one so I don’t know what would cause this. Are you using “async” or “acc_set_cuda_stream”? If so, what value are you giving it?

If not, can you post or send a reproducing example to PGI Customer Service (trs@pgroup.com)?
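
For reference, explicit use of an async queue would look roughly like the sketch below. The function name, array name, and queue number 1 are made up for illustration and are not taken from your program. If nothing like this appears in your code, the stream is being created by the runtime itself.

```c
/* Hypothetical example: doubles each element of a[0..n) on async queue 1.
 * With an OpenACC compiler the loop is launched asynchronously and the
 * host blocks at the wait directive. Without one, the pragmas are ignored
 * and the loop simply runs on the host. */
void scale_async(float *a, int n)
{
    int i;
#pragma acc parallel loop async(1) copy(a[0:n])
    for (i = 0; i < n; i++) {
        a[i] = 2.0f * a[i];
    }
    /* Block until all work queued on async queue 1 has finished. */
#pragma acc wait(1)
}
```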

  • Mat

Hi Mat,
I did not use “async” or “acc_set_cuda_stream”. I have also tested this code on other cards (NVIDIA GTX 690, GTX TITAN), and it works fine on them, so I doubt the problem is caused by the code itself.
Here is the code, a simple Gaussian blur function. It compiles fine, but the generated binary fails at run time.

     
#pragma acc data present(original_image[:cols][:rows],blur_image_gpu[:cols][:rows])
      {
         int iy, ix;
#pragma acc kernels
#pragma acc loop independent
         for (iy = 0; iy < rows; iy++)
         {   
            blur_image_gpu[iy * cols] = 0;
            blur_image_gpu[iy * cols + 1] = 0;
            blur_image_gpu[((iy+1) * cols) -1] = 0;
            blur_image_gpu[((iy+1) * cols) -2] = 0;
         }   

#pragma acc kernels
#pragma acc loop independent
         for (ix = 2; ix < cols-2; ix++)
         {   
            blur_image_gpu[ix] = 0;
            blur_image_gpu[ix + cols] = 0;
            blur_image_gpu[ix + ((rows-1) * cols)] = 0;
            blur_image_gpu[ix + ((rows - 2) * cols)] = 0;
         }   


#pragma acc kernels
#pragma acc loop independent
         for (iy = 2; iy < rows - 2; iy++)
         {   
#pragma acc loop independent
            for (ix = 2; ix < cols - 2; ix++)
            {   
               blur_image_gpu[ix + iy * cols] = f * (
                     s0 * original_image[iy * cols + ix] +
                     s1 * (original_image[(iy) * cols + (ix - 1)] + original_image[(iy) * cols + (ix + 1)] +
                           original_image[(iy - 1) * cols + (ix)] + original_image[(iy + 1) * cols + (ix)]) +
                     s2 * (original_image[(iy - 1) * cols + (ix - 1)] + original_image[(iy - 1) * cols + (ix + 1)] +
                           original_image[(iy + 1) * cols + (ix - 1)] + original_image[(iy + 1) * cols + (ix + 1)]) +
                     s4 * (original_image[(iy) * cols + (ix - 2)] + original_image[(iy) * cols + (ix + 2)] +
                           original_image[(iy - 2) * cols + (ix)] + original_image[(iy + 2) * cols + (ix)]) +
                     s5 * (original_image[(iy - 1) * cols + (ix - 2)] + original_image[(iy - 2) * cols + (ix - 1)] +
                           original_image[(iy - 2) * cols + (ix + 1)] + original_image[(iy - 1) * cols + (ix + 2)] +
                           original_image[(iy + 1) * cols + (ix - 2)] + original_image[(iy + 2) * cols + (ix - 1)] +
                           original_image[(iy + 2) * cols + (ix + 1)] + original_image[(iy + 1) * cols + (ix + 2)]) +
                     s8 * (original_image[(iy - 2) * cols + (ix - 2)] + original_image[(iy - 2) * cols + (ix + 2)] +
                           original_image[(iy + 2) * cols + (ix - 2)] + original_image[(iy + 2) * cols + (ix + 2)])
                     );
            }
         }
      }

Hi Jing Li,

I should have noticed this before, but your CUDA driver is very old. By default we use CUDA 5.5, and the CUDA 4.0 driver isn’t forward compatible, so it can’t run these binaries.

Can you try updating your driver?
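
To make the mismatch concrete, here is a small sketch of how the version numbers compare. CUDA encodes versions as 1000*major + 10*minor, so the 4010 reported by pgaccelinfo decodes to 4.1 and CUDA 5.5 is 5050. The helper names are hypothetical, and the rule "driver must be at least as new as the runtime" is a simplification of the actual compatibility policy. A real check would read the two numbers from cudaDriverGetVersion and cudaRuntimeGetVersion instead of hard-coding them.

```c
#include <stdio.h>

/* CUDA version integers are 1000*major + 10*minor (e.g. 5050 is 5.5). */
int cuda_major(int version) { return version / 1000; }
int cuda_minor(int version) { return (version % 1000) / 10; }

/* Simplified assumption: a driver can run binaries built against a
 * runtime no newer than the driver itself. */
int driver_supports_runtime(int driver_version, int runtime_version)
{
    return driver_version >= runtime_version;
}

/* Prints a one-line verdict for a driver/runtime pair. */
void report_compatibility(int driver_version, int runtime_version)
{
    printf("driver %d.%d, runtime %d.%d: %s\n",
           cuda_major(driver_version), cuda_minor(driver_version),
           cuda_major(runtime_version), cuda_minor(runtime_version),
           driver_supports_runtime(driver_version, runtime_version)
               ? "ok" : "driver too old");
}
```

Calling report_compatibility(4010, 5050) reports the driver as too old, which matches the failure at the first CUDA call.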

  • Mat