Hi Programmers,
I am referring to the “Example, Multi-threaded BLAS Test” section of “Fortran CUDA Library Interfaces”.
I have a Tesla P100 in my workstation and would like to use managed memory instead of device memory.
I increased the size of the matrix from 10k to 100k.
I changed “, device” to “, managed” and recompiled with PGI 19.9.
When I ran with 44 threads, I received the following error messages:
CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=44 ./a.out
Running with 44 threads, each section = 2272
0: copyin Memcpy (dev=0x7e4914000000, host=0x604b60, size=40000000000) FAILED: 700(an illegal memory access was encountered)
0: copyin Memcpy (dev=0x7d8a6e000000, host=0x1748758b60, size=908800000) FAILED: 700(an illegal memory access was encountered)
0: copyin Memcpy (dev=0x7e650a000000, host=0x604b60, size=40000000000) FAILED: 700(an illegal memory access was encountered)
0: copyin Memcpy (dev=0x7d92be000000, host=0x10eea29b60, size=908800000) FAILED: 700(an illegal memory access was encountered)
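
As a sanity check on the sizes in the log, here is a small Python sketch (not part of the original example). It assumes 4-byte single-precision elements and a 16 GB P100; both are my assumptions, chosen because they reproduce the byte counts printed above.

```python
# Reproduce the sizes reported in the error log.
N = 100_000              # matrix dimension after the change (100k)
NUM_THREADS = 44         # OMP_NUM_THREADS from the command line
BYTES_PER_ELEMENT = 4    # assumption: single-precision elements
P100_BYTES = 16 * 1024**3  # assumption: 16 GB card (a 12 GB model also exists)

section = N // NUM_THREADS                        # columns handled per thread
full_matrix_bytes = N * N * BYTES_PER_ELEMENT     # one whole N x N matrix
section_bytes = section * N * BYTES_PER_ELEMENT   # one thread's slice

print("section =", section)
print("full matrix bytes =", full_matrix_bytes)
print("section bytes =", section_bytes)
print("full matrix exceeds assumed GPU memory:", full_matrix_bytes > P100_BYTES)
```

Under these assumptions, the 40000000000-byte copies correspond to one full 100k x 100k matrix, which is larger than the card's physical memory, and the 908800000-byte copies correspond to one thread's 2272-column section.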