Strange behavior of CUBLAS and CUSPARSE with OpenACC

Hello.

Long time no see.

Recently I have run into a very strange problem, and I would like to ask you about the cause.

I’m using PVF 17.7.

The following is a simple program that reproduces my problem.

PROGRAM MULTIGPU
    
    USE OMP_LIB
    USE CUDAFOR
    USE OPENACC
    USE CUBLAS
    USE CUSPARSE
    IMPLICIT NONE
    
    TYPE(cublasHandle) :: myblasHandle
    TYPE(cusparseHandle) :: mysparseHandle
    INTEGER :: tid, gid, ierr
    INTEGER :: nDevice
    
    nDevice = acc_get_num_devices(acc_device_nvidia)
    PRINT *, 'OpenACC Available Devices : ', nDevice
    
    !$OMP PARALLEL PRIVATE(tid, gid, ierr) NUM_THREADS(nDevice)
    tid = omp_get_thread_num()
    CALL acc_set_device_num(tid, acc_device_nvidia)
    gid = acc_get_device_num(acc_device_nvidia)
    PRINT *, 'Thread ', tid, ' OpenACC Device : ', gid
    ierr = cublasCreate(myblasHandle)
    ierr = cusparseCreate(mysparseHandle)
    gid = acc_get_device_num(acc_device_nvidia)
    PRINT *, 'Thread ', tid, ' OpenACC Device : ', gid
    !$OMP END PARALLEL
    
END PROGRAM

It is just the initial phase of setting up multi-device control.

The output is:

OpenACC Available Devices : 2
Thread 0 OpenACC Device : 0
Thread 1 OpenACC Device : 1
Thread 0 OpenACC Device : 1
Thread 1 OpenACC Device : 1

The strange thing is that cublasCreate and cusparseCreate change the OpenACC device arbitrarily.

What could be the problem?

Hi CNJ,

My best guess is that you have a data collision on the handle variables since they are shared. Can you try privatizing them?

Note that I wasn’t able to reproduce the error on my Linux system, but that may just be that the OMP threads happen to get run in a different order.

-Mat

Privatizing the handles does not change anything.

Are there any OpenACC thread-safety issues on Windows?

Actually, it doesn’t seem to be an issue with CUBLAS and CUSPARSE only.

The following code results in the same behavior.

PROGRAM MULTIGPU
    
    USE OMP_LIB
    USE CUDAFOR
    USE OPENACC
    USE CUBLAS
    USE CUSPARSE
    IMPLICIT NONE
    
    TYPE(cublasHandle) :: myblasHandle
    TYPE(cusparseHandle) :: mysparseHandle
    INTEGER :: tid, gid, ierr
    INTEGER :: nDevice
    
    nDevice = acc_get_num_devices(acc_device_nvidia)
    PRINT *, 'OpenACC Available Devices : ', nDevice
    
    !$OMP PARALLEL PRIVATE(myblasHandle, mysparseHandle, tid, gid, ierr) NUM_THREADS(nDevice)
    tid = omp_get_thread_num()
    CALL acc_set_device_num(tid, acc_device_nvidia)
    gid = acc_get_device_num(acc_device_nvidia)
    PRINT *, 'Thread ', tid, ' OpenACC Device : ', gid
    !$ACC ENTER DATA COPYIN(myblasHandle)
    gid = acc_get_device_num(acc_device_nvidia)
    PRINT *, 'Thread ', tid, ' OpenACC Device : ', gid
    !$OMP END PARALLEL
END PROGRAM



OpenACC Available Devices : 2
Thread 0 OpenACC Device : 0
Thread 1 OpenACC Device : 1
Thread 0 OpenACC Device : 1
Thread 1 OpenACC Device : 1

My PC has 2 GeForce GTX 750 Ti cards.

FYI, here is the nvidia-smi output.

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe
Wed Mar 07 11:35:35 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.13                 Driver Version: 388.13                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  WDDM | 00000000:01:00.0  On |                  N/A |
| 33%   32C    P8     1W /  38W |    359MiB /  2048MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 750 Ti  WDDM | 00000000:02:00.0 Off |                  N/A |
| 30%   28C    P8     1W /  38W |     29MiB /  1024MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

I also found that my code has no problem on Linux with PGICE 17.10, as you mentioned.

Could you run some tests with the Windows version?

Hi CNJ,

I was able to replicate the error on Windows, so I have added a problem report (TPR#25328).

-Mat

I prefer debugging on Windows rather than Linux, because the Visual Studio GUI is convenient for me.

Is there any workaround?

Hi CNJ,

This issue should be resolved in release 18.7.
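
For anyone who has to stay on a release older than 18.7, one thing that might be worth trying is to check the device after each library call and select it again if it has changed. The sketch below is untested and nothing in this thread confirms that it works as a workaround; it only shows the idea. The program name REASSERT_DEVICE and the variable cid are made up for illustration. It also prints the device that the CUDA runtime reports (via cudaGetDevice), so the two views can be compared.

```fortran
PROGRAM REASSERT_DEVICE

    USE OMP_LIB
    USE CUDAFOR
    USE OPENACC
    USE CUBLAS
    IMPLICIT NONE

    TYPE(cublasHandle) :: myblasHandle
    INTEGER :: tid, gid, cid, ierr
    INTEGER :: nDevice

    nDevice = acc_get_num_devices(acc_device_nvidia)
    IF (nDevice < 1) STOP 'No NVIDIA device found'

    !$OMP PARALLEL PRIVATE(myblasHandle, tid, gid, cid, ierr) NUM_THREADS(nDevice)
    tid = omp_get_thread_num()
    CALL acc_set_device_num(tid, acc_device_nvidia)

    ierr = cublasCreate(myblasHandle)

    ! Device as seen by the OpenACC runtime and by the CUDA runtime.
    gid = acc_get_device_num(acc_device_nvidia)
    ierr = cudaGetDevice(cid)
    PRINT *, 'Thread ', tid, ' OpenACC Device : ', gid, ' CUDA Device : ', cid

    ! Assumption (not confirmed in this thread): selecting the device
    ! again restores the intended thread-to-device binding.
    IF (gid /= tid) THEN
        CALL acc_set_device_num(tid, acc_device_nvidia)
        gid = acc_get_device_num(acc_device_nvidia)
        PRINT *, 'Thread ', tid, ' device after re-selecting : ', gid
    END IF
    !$OMP END PARALLEL

END PROGRAM
```

Even if the device number is restored this way, a handle created while the wrong device was active may still belong to that device, so the result should be checked carefully before relying on it.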