Std::thread and OpenMP GPU Offloading

Hey all,

I’ve been refactoring some code that uses OpenMP to offload part of a larger function to an NVIDIA A100. The problem is that this larger function is itself run concurrently via std::thread in C++.

Specifically, each std::thread runs a function, and parts of that function are offloaded to the GPU via OpenMP. The OpenMP directive is a typical one, e.g. “#pragma omp target teams distribute parallel for”.
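To make the structure concrete, here is a minimal sketch of what I mean (function and variable names are illustrative, not my actual code):

```cpp
#include <thread>
#include <vector>
#include <cstddef>

// Each host thread runs this; the loop body is offloaded to the GPU.
void worker(std::vector<double>& data) {
    double* p = data.data();
    std::size_t n = data.size();
    #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
    for (std::size_t i = 0; i < n; ++i)
        p[i] *= 2.0;
}

// Two std::threads, each entering its own target region concurrently.
void run_concurrent_offload(std::vector<double>& a, std::vector<double>& b) {
    std::thread t1(worker, std::ref(a));
    std::thread t2(worker, std::ref(b));
    t1.join();
    t2.join();
}
```

With the GPU offload enabled (g++ with -fopenmp and an nvptx offload target), this is the pattern that triggers the cuLaunchKernel error for me.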

This seems to be causing the following runtime error:
> libgomp: cuLaunchKernel error: invalid resource handle

If I remove the concurrency (drop the std::thread-ing) and keep the OpenMP offloading, it runs fine.

Any ideas what might be causing this? I guess I’m unsure about the thread-safety of OpenMP GPU offloading.

Hi py.aero,

I don’t use std::thread myself, so I may not be the best person to answer this, but my understanding is that in general it’s not a good idea to mix std::thread with OpenMP. That said, this may only apply to OpenMP multicore CPU parallelism; I’m not sure about mixing it with target offload.

I’m also wondering why the error is coming from GNU’s libgomp. Are you using g++?

If you’re using nvc++, I’m not clear on why libgomp would be used.

What is your link line? Are you explicitly adding “-lgomp”?
What is the output from running the “ldd” command on your executable?

If you can, please provide a minimal reproducing example. If I can reproduce the error, then it will be much easier for me to investigate.

-Mat

Hey Mat,

Thanks for the reply! I actually solved this thanks to the following NVIDIA blog post:

I’m using g++. I was trying OpenMP’s omp_set_default_device(), but it doesn’t behave the same as the cudaSetDevice() call mentioned in the post.
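For anyone hitting the same error, here is a hedged sketch of the workaround, assuming the issue is that each host thread needs a CUDA context bound before its first target region: omp_set_default_device() only selects OpenMP’s device number, while cudaSetDevice() actually binds the calling thread to the device’s CUDA context. The #if guard is only there so the snippet compiles on machines without the CUDA runtime headers; in a real build you would link against the CUDA runtime (e.g. -lcudart).

```cpp
#include <cstddef>
#if __has_include(<cuda_runtime.h>)
  #include <cuda_runtime.h>
  #define HAVE_CUDART 1
#endif

// Per-thread worker: bind this host thread to the GPU before offloading.
void worker(double* p, std::size_t n) {
#ifdef HAVE_CUDART
    // cudaSetDevice() establishes the CUDA context in *this* host thread;
    // omp_set_default_device() alone does not do that.
    cudaSetDevice(0);
#endif
    #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
    for (std::size_t i = 0; i < n; ++i)
        p[i] += 1.0;
}
```

Calling this at the top of the function that each std::thread runs (before the first target region) is what resolved the “invalid resource handle” error for me.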