Hi all,
I have a system with 2 sockets, and each socket contains 2 NUMA nodes. There is 1 GPU per socket, so in total the machine has 4 NUMA nodes and 2 GPUs (Tesla K40m).
My question: while using pinned memory, I cannot observe any slowdown when I also access the GPU on the other (remote) socket. Could this be related to the pinned memory?
I ran my application with the CPU pinned to the 2nd socket, which hosts K40m (1), but the application also accesses K40m (0), which sits on the first socket.
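I'm not posting the whole application, but the transfer pattern boils down to something like this minimal sketch (the buffer names, the error-check macro, and the fixed 2 MB size are placeholders, not my real code):

#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do { \
        cudaError_t err_ = (call); \
        if (err_ != cudaSuccess) { \
            fprintf(stderr, "CUDA error %s at %s:%d\n", \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            return 1; \
        } \
    } while (0)

int main() {
    const size_t bytes = 2097152;   // ~2.0972 MB, as in the nvprof lines below
    float *h_buf[2], *d_buf[2];
    cudaStream_t stream[2];

    for (int dev = 0; dev < 2; ++dev) {
        CHECK(cudaSetDevice(dev));
        // Page-locked host buffer; the Portable flag keeps it pinned for
        // both devices' contexts. The physical pages land on whichever
        // NUMA node the allocating thread runs on (first touch).
        CHECK(cudaHostAlloc((void **)&h_buf[dev], bytes, cudaHostAllocPortable));
        CHECK(cudaMalloc((void **)&d_buf[dev], bytes));
        CHECK(cudaStreamCreate(&stream[dev]));
    }

    // Concurrent DtoH copies from both GPUs, matching the profile below.
    // (The device buffers stay uninitialized; only bandwidth matters here.)
    for (int dev = 0; dev < 2; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaMemcpyAsync(h_buf[dev], d_buf[dev], bytes,
                              cudaMemcpyDeviceToHost, stream[dev]));
    }
    for (int dev = 0; dev < 2; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaStreamSynchronize(stream[dev]));
    }
    return 0;
}

I pin the process to the 2nd socket with numactl, e.g. numactl --cpunodebind=2,3 --membind=2,3 ./app (assuming nodes 2 and 3 make up the 2nd socket on this box; the node numbering may differ).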
Here are some lines from nvprof:
Start Duration Grid Size Block Size Regs* SSMem* DSMem* Size Throughput Device Context Stream Name
6.71694s 203.04us - - - - - 2.0972MB 10.329GB/s Tesla K40m (1) 1 57 [CUDA memcpy DtoH]
6.71697s 201.57us - - - - - 2.0972MB 10.404GB/s Tesla K40m (0) 2 21 [CUDA memcpy DtoH]
6.71715s 201.47us - - - - - 2.0972MB 10.409GB/s Tesla K40m (1) 1 57 [CUDA memcpy DtoH]
6.71717s 201.44us - - - - - 2.0972MB 10.411GB/s Tesla K40m (0) 2 21 [CUDA memcpy DtoH]
6.71735s 201.44us - - - - - 2.0972MB 10.411GB/s Tesla K40m (1) 1 57 [CUDA memcpy DtoH]
6.71756s 201.41us - - - - - 2.0972MB 10.413GB/s Tesla K40m (1) 1 57 [CUDA memcpy DtoH]
6.71776s 201.41us - - - - - 2.0972MB 10.413GB/s Tesla K40m (1) 1 57 [CUDA memcpy DtoH]
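For reference, the GPU-to-socket attachment can be double-checked with

nvidia-smi topo -m

whose CPU Affinity column lists the cores (and hence the socket) local to each K40m, which is how the K40m (1) / 2nd-socket mapping above can be verified.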
Thanks in advance.