Hello,
I have a small piece of code that calls an nvblas routine from inside a loop. I would like to offload the outer loop to the device because its body does real work: among other things it calls BLAS functions, but it also computes other quantities. So I cannot just run the loops on the host and call the nvblas routines from there.
So I have put together this tiny example:

    #pragma omp target data map( tofrom: C[0:M*N*K*L]) map( to: A[0:M*N*K*L], B[0:M*N*K*L] )
    #pragma omp target teams distribute parallel for
    for( int i = 0 ; i < M ; i++ ){
        for( int j = 0 ; j < N ; j++ ){
            for( int k = 0 ; k < K ; k++ ){
                for( int l = 0 ; l < L ; l++ ){
                    C[i*N+j*K+k*L+l] = A[i*N+j*K+k*L+l] + B[i*N+j*K+k*L+l];
                }
            }
            #pragma omp target data use_device_ptr( A, B, C )
            {
                dgemm( "N", "N", &N, &K, &L, &alpha, &A[i*N+j*K], &K, &B[i*N+j*K], &L, &beta, &C[i*N+j*K], &L );
            }
        }
    }

But when I try to compile this example, I get an error that seems to be triggered by the BLAS call:

    $ nvc++ -I$OPENBLAS/include/openblas -I$CUDAROOT/include -O3 \
    -o simplebench simplebench.cpp -lnvblas -L$CUDAROOT/lib64 \
    -cudalib=cublas -lcublas -mp=gpu -Minfo=mp \
    -L$OPENBLAS/lib64 -lopenblas
    main:
    43, #omp target teams distribute parallel for
    43, Generating Tesla and Multicore code
    Generating "nvkernel_main_F1L43_1" GPU kernel
    Generating map(to:B[:L*(K*(M*N))])
    Generating map(tofrom:C[:L*(K*(M*N))])
    Generating map(to:A[:L*(K*(M*N))])
    47, Loop parallelized across teams and threads(128), schedule(static)
    55, Accelerator restriction: unsupported statement type: opcode=OMPTARGETDATA
    [...]/Linux_x86_64/21.5/compilers/share/llvm/bin/opt: /tmp/nvc++HSFcJbDd-HwH.ll:1078:32: error: use of undefined value '%A.addr'
    %138 = load double*, double** %A.addr, align 8, !tbaa !26, !dbg !144
    ^

I am using nvc++ 21.5:

    $ nvc++ --version
    nvc++ 21.5-0 LLVM 64-bit target on x86-64 Linux -tp haswell
    NVIDIA Compilers and Tools
    Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.

I have seen other error messages reported on this forum that look similar to mine, such as this one, which also shows the `use of undefined value '%.F0063.addr'` error. But mine also contains `unsupported statement type: opcode=OMPTARGETDATA`, which I haven’t seen reported anywhere else.
If I comment out the call to dgemm, it compiles, but I suspect that in that case the compiler might simply be removing the now-empty `use_device_ptr` construct that comes before it.
Thanks a lot