Std::thread and OpenMP GPU Offloading

Hey all,

I’ve been refactoring some code that uses OpenMP to offload part of a larger function to an NVIDIA A100. The problem is that this larger function is itself run concurrently via std::thread in C++.

Specifically, each std::thread runs a function, and parts of that function are offloaded to the GPU via OpenMP. The OpenMP directive is a typical one, e.g. “#pragma omp target teams distribute parallel for”.
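To make the structure concrete, here is a minimal sketch of what I mean (function and variable names are illustrative, not my actual code):

```cpp
#include <thread>
#include <vector>
#include <cstddef>

// Each host thread runs this; the loop body is offloaded to the GPU.
void worker(std::vector<double>& data) {
    double* p = data.data();
    std::size_t n = data.size();
    #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
    for (std::size_t i = 0; i < n; ++i)
        p[i] *= 2.0;
}

// Two std::threads, each entering its own target region concurrently.
void run_concurrent_offload(std::vector<double>& a, std::vector<double>& b) {
    std::thread t1(worker, std::ref(a));
    std::thread t2(worker, std::ref(b));
    t1.join();
    t2.join();
}
```

With the GPU offload enabled (g++ with -fopenmp and an nvptx offload target), this is the pattern that triggers the cuLaunchKernel error for me.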

This seems to be causing the following runtime error:
> libgomp: cuLaunchKernel error: invalid resource handle

If I remove the concurrency (drop the std::thread-ing) and keep the OpenMP offloading, it runs fine.

Any ideas what might be causing this? I guess I’m unsure about the thread-safety of OpenMP GPU offloading.

Hi py.aero,

I don’t use std::thread myself, so I may not be the best person to answer this, but my understanding is that in general it’s not a good idea to mix std::thread with OpenMP. That said, this may only apply to OpenMP multicore CPU parallelism; I’m not sure about mixing it with target offload.

I’m also wondering why the error is coming from GNU’s libgomp. Are you using g++?

If you’re using nvc++, I’m not clear on why libgomp would be used.

What is your link line? Are you explicitly adding “-lgomp”?
What is the output from running the “ldd” command on your executable?

If you can, please provide a minimal reproducing example. If I can reproduce the error, then it will be much easier for me to investigate.

-Mat

Hey Mat,

Thanks for the reply! I actually solved this thanks to the following NVIDIA blog post:

I’m using g++. I was trying OpenMP’s omp_set_default_device(), but it doesn’t behave the same as the cudaSetDevice() call mentioned in the post.
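For anyone hitting the same error, here is a hedged sketch of the workaround, assuming the issue is that each host thread needs a CUDA context bound before its first target region: omp_set_default_device() only selects OpenMP’s device number, while cudaSetDevice() actually binds the calling thread to the device’s CUDA context. The #if guard is only there so the snippet compiles on machines without the CUDA runtime headers; in a real build you would link against the CUDA runtime (e.g. -lcudart).

```cpp
#include <cstddef>
#if __has_include(<cuda_runtime.h>)
  #include <cuda_runtime.h>
  #define HAVE_CUDART 1
#endif

// Per-thread worker: bind this host thread to the GPU before offloading.
void worker(double* p, std::size_t n) {
#ifdef HAVE_CUDART
    // cudaSetDevice() establishes the CUDA context in *this* host thread;
    // omp_set_default_device() alone does not do that.
    cudaSetDevice(0);
#endif
    #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
    for (std::size_t i = 0; i < n; ++i)
        p[i] += 1.0;
}
```

Calling this at the top of the function that each std::thread runs (before the first target region) is what resolved the “invalid resource handle” error for me.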