[HPC-Benchmarks 21.4] libnuma error

What I did was replace the numactl command in the 21.4 hpl.sh with the one from 20.4.

[Before]

info "host=$(hostname) rank=${RANK} lrank=${LOCAL_RANK} cores=${CPU_CORES_PER_RANK} gpu=${GPU} cpu=${CPU} mem=${MEM} net=${UCX_NET_DEVICES} bin=$XHPL"

numactl --physcpubind=${CPU} ${MEMBIND} ${XHPL} ${DAT}

[After]

if [ -z "${MEM}" ]; then
  info "host=$(hostname) rank=${RANK} lrank=${LOCAL_RANK} cores=${CPU_CORES_PER_RANK} gpu=${GPU} cpu=${CPU} ucx=${UCX_NET_DEVICES} bin=$XHPL"

  numactl --cpunodebind=${CPU} ${XHPL} ${DAT}
else
  info "host=$(hostname) rank=${RANK} lrank=${LOCAL_RANK} cores=${CPU_CORES_PER_RANK} gpu=${GPU} cpu=${CPU} mem=${MEM} ucx=${UCX_NET_DEVICES} bin=$XHPL"

  numactl --physcpubind=${CPU} --membind=${MEM} ${XHPL} ${DAT}
fi
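
The branching in [After] can be checked without launching HPL. Below is a minimal sketch, not part of hpl.sh: `numactl_cmd` is a hypothetical helper that only prints the command line the [After] logic would run for a given CPU and MEM value, so you can confirm which binding flags are chosen.

```shell
# Hypothetical helper (not part of hpl.sh): print the numactl command line
# that the [After] logic would run, without executing anything.
# Usage: numactl_cmd CPU MEM XHPL DAT   (MEM may be an empty string)
numactl_cmd() {
  cpu="$1"; mem="$2"; xhpl="$3"; dat="$4"
  if [ -z "${mem}" ]; then
    # No memory node given: bind to the CPU's NUMA node only.
    echo "numactl --cpunodebind=${cpu} ${xhpl} ${dat}"
  else
    # Memory node given: pin to physical CPUs and bind memory explicitly.
    echo "numactl --physcpubind=${cpu} --membind=${mem} ${xhpl} ${dat}"
  fi
}
```

For example, `numactl_cmd 0-15 0 ./xhpl HPL.dat` prints `numactl --physcpubind=0-15 --membind=0 ./xhpl HPL.dat`, while passing an empty MEM falls back to `--cpunodebind`.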

To apply this, extract the contents of the 21.4 image to a folder, modify the hpl.sh script, and then rebuild the image.
My guess is that the issue is with memory binding, but I didn't look too deeply.
I hope it helps.
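
The extract, edit, and rebuild steps could look roughly like the sketch below, assuming the image is handled with Docker. The image name, the path to hpl.sh inside the image, the new tag, and the `run`, `extract_hpl_sh`, and `rebuild_image` helpers are all assumptions for illustration; check them against your own setup. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Sketch only. IMAGE, HPL_SH and NEW_TAG are assumptions; adjust to your setup.
IMAGE="${IMAGE:-nvcr.io/nvidia/hpc-benchmarks:21.4-hpl}"
HPL_SH="${HPL_SH:-/workspace/hpl-linux-x86_64/hpl.sh}"
NEW_TAG="${NEW_TAG:-hpc-benchmarks:21.4-hpl-fixed}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: copy hpl.sh out of the image into the folder given as $1.
extract_hpl_sh() {
  workdir="$1"
  run mkdir -p "${workdir}"
  run docker create --name hpl-extract "${IMAGE}"
  run docker cp "hpl-extract:${HPL_SH}" "${workdir}/hpl.sh"
  run docker rm hpl-extract
}

# Step 2 is manual: edit ${workdir}/hpl.sh and swap in the numactl logic.

# Step 3: build a new image that overwrites hpl.sh with the edited copy.
rebuild_image() {
  workdir="$1"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    printf 'FROM %s\nCOPY hpl.sh %s\n' "${IMAGE}" "${HPL_SH}" > "${workdir}/Dockerfile"
  fi
  run docker build -t "${NEW_TAG}" "${workdir}"
}
```

A typical sequence would be `extract_hpl_sh ./hpl-fix`, then editing `./hpl-fix/hpl.sh`, then `rebuild_image ./hpl-fix`.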
