Hello.
I’m trying to use multiple GPUs on a Linux machine that has two GTX Titan Z cards (four physical GPU chips).
I use four OpenMP threads so that each CPU thread is associated with one GPU chip.
However, I get the following errors, which I cannot understand:
Failing in Thread:4
Failing in Thread:2
Failing in Thread:3
call to cuDevicePrimaryCtxRetain returned error 709: Context is destroyed or not yet created
call to cuDevicePrimaryCtxRetain returned error 709: Context is destroyed or not yet created
call to cuDevicePrimaryCtxRetain returned error 709: Context is destroyed or not yet created
Failing in Thread:4
call to cuDevicePrimaryCtxRelease returned error 4: Deinitialized
Failing in Thread:1
call to cuModuleLoadData returned error 300: Invalid Source
Error: _mp_pcpu_reset: lost thread
The following is the code section where the error occurs:
! One OpenMP thread per GPU chip
CALL OMP_SET_NUM_THREADS(PE%nCUDADevice)
!$OMP PARALLEL PRIVATE(tid, ierr, cuProperty, PinCount, PinBeg, PinEnd, maxFsr)
  ! Select the same device for both the CUDA runtime and the OpenACC runtime
  tid = OMP_GET_THREAD_NUM()
  ierr = cudaSetDevice(tid)
  CALL ACC_SET_DEVICE_NUM(tid, acc_device_nvidia)
  ierr = cudaGetDeviceProperties(cuProperty, tid)
  cuDevice(tid)%cuSMXCount = cuProperty%multiProcessorCount
  cuDevice(tid)%cuArchitecture = cuProperty%major
  cuDevice(tid)%cuWarpSize = cuProperty%warpSize
  cuDevice(tid)%cuMaxThreadPerSMX = cuProperty%maxThreadsPerMultiprocessor
  cuDevice(tid)%cuMaxThreadPerBlock = cuProperty%maxThreadsPerBlock
  cuDevice(tid)%cuMaxWarpPerSMX = cuProperty%maxThreadsPerMultiprocessor / cuProperty%warpSize
  ! Maximum resident blocks per SMX, by compute capability major version
  SELECT CASE (cuDevice(tid)%cuArchitecture)
  CASE (2) !--- Fermi
    cuDevice(tid)%cuMaxBlockPerSMX = 8
  CASE (3) !--- Kepler
    cuDevice(tid)%cuMaxBlockPerSMX = 16
  CASE (5) !--- Maxwell
    cuDevice(tid)%cuMaxBlockPerSMX = 32
  CASE (6) !--- Pascal
    cuDevice(tid)%cuMaxBlockPerSMX = 32
  END SELECT
  ! Block size chosen so that the maximum number of resident blocks fills one SMX
  cuDevice(tid)%cuWarpPerBlock = cuDevice(tid)%cuMaxWarpPerSMX / cuDevice(tid)%cuMaxBlockPerSMX
  cuDevice(tid)%cuThreadPerBlock = cuDevice(tid)%cuWarpPerBlock * cuDevice(tid)%cuWarpSize
  IF (cuDevice(tid)%lFullWarp) THEN
    cuDevice(tid)%sharedMemoryDim = cuDevice(tid)%cuThreadPerBlock
  ELSE
    cuDevice(tid)%sharedMemoryDim = 2 * ng
  ENDIF
  ! Copy the per-device structure and its members to the selected GPU
  !$ACC ENTER DATA COPYIN(cuDevice(tid))
  !$ACC ENTER DATA COPYIN(cuDevice(tid)%FsrBeg, cuDevice(tid)%FsrEnd)
  !$ACC ENTER DATA COPYIN(cuDevice(tid)%PinBeg, cuDevice(tid)%PinEnd)
  !$ACC ENTER DATA COPYIN(cuDevice(tid)%DcmpRayList, cuDevice(tid)%DcmpRayCount)
!$OMP END PARALLEL
Can you point out what I’m doing wrong here?
Could calling cudaSetDevice and ACC_SET_DEVICE_NUM together on the same thread be causing a conflict?
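The block-size arithmetic in the parallel region above can be checked in isolation, without a GPU. The following is only a minimal standalone sketch: the module and function names (block_sizing, threads_per_block) are hypothetical and do not come from the original code, and the limits passed in are the documented Kepler values (2048 resident threads per SMX, warp size 32, 16 resident blocks per SMX), assumed here rather than queried with cudaGetDeviceProperties.

```fortran
! Standalone sketch (hypothetical names) of the block-size arithmetic
! used in the parallel region; no GPU or CUDA toolkit is required.
MODULE block_sizing
  IMPLICIT NONE
CONTAINS
  ! Threads per block chosen so that the maximum number of resident
  ! blocks can fill one SMX (integer division, as in the original code).
  PURE INTEGER FUNCTION threads_per_block(maxThreadPerSMX, warpSize, maxBlockPerSMX)
    INTEGER, INTENT(IN) :: maxThreadPerSMX, warpSize, maxBlockPerSMX
    INTEGER :: maxWarpPerSMX, warpPerBlock
    maxWarpPerSMX = maxThreadPerSMX / warpSize
    warpPerBlock  = maxWarpPerSMX / maxBlockPerSMX
    threads_per_block = warpPerBlock * warpSize
  END FUNCTION threads_per_block
END MODULE block_sizing

PROGRAM demo
  USE block_sizing
  IMPLICIT NONE
  ! Assumed Kepler limits: 2048 threads, warp size 32, 16 blocks per SMX.
  ! 2048 / 32 = 64 warps; 64 / 16 = 4 warps per block; 4 * 32 = 128 threads.
  PRINT *, threads_per_block(2048, 32, 16)
END PROGRAM demo
```

With these assumed Kepler values the sketch yields 128 threads per block, which is the value cuThreadPerBlock should hold on each Titan Z chip if the device properties are read correctly.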