Strange CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND error from cudaMallocManaged

JZM · January 16, 2016, 11:59am

Hi Folks,

I’m a CUDA newbie porting some existing math-heavy code to CUDA. I’ve ported a fair bit of code just fine, but have now hit a problem I can’t solve: cudaMallocManaged is failing with CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND. The strange thing is that whether this error appears depends entirely on what the device procedure contains.

If the device code contains a call to either f1() or f2() alone then everything is fine, but as soon as it contains calls to both I hit the CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND error.

I’m assuming that cudaMallocManaged is a bystander here. It’s the first CUDA runtime function called, so I assume this is the point at which the device meta-code gets translated for the actual hardware? Could I be hitting some kind of size limit here?

Many thanks!

Robert_Crovella · January 16, 2016, 4:01pm

I think you’re on the right track with your thought process.

A symbol not found error can occur when you have device code that incorporates a failure detectable at load time. Such failures might be a device binary that does not match the GPU you are trying to run on, for example.

One way these errors can creep in is if you are specifying a compilation that involves compile to PTX only (or compile to PTX+SASS, but not specifying the correct SASS architecture for your GPU). Either of these approaches can involve a JIT-compile at runtime/load-time. This JIT compile can fail (e.g. hitting a machine limit) and as a result you have no binary for your GPU, so things won’t work, and one of the side-effects is that device symbols are not loaded/visible. If your first evidence of this is a “bystander” operation that touches a device symbol, you’ll get a weird device-symbol-not-found error. A full walk-through of such a case is here:

“nvcc - CUDA invalid device symbol error” (Stack Overflow)

So I guess the first question I would have is: what is your exact compile command line, and what GPU are you actually running the code on when you witness the CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND error?

If your case involves a JIT-compile, and if the JIT-compile is actually failing in your case, the solution to make the problem more “visible” is to force the PTX-to-SASS compile step to occur at compile time. You can do this by accurately specifying a device architecture to compile for which matches your GPU.

For example, if you were compiling for a cc3.5 architecture but running on a cc5.0 device, you might be specifying -arch=sm_35. The “fix” would be to specify -arch=sm_50 when compiling. If you do that with the code configuration that calls both f1() and f2(), you may witness a compile-time error which will be instructive.

JZM · January 16, 2016, 6:45pm

Specifying -arch made no difference: I see no new compiler messages but the same runtime error. However, simply defining NDEBUG=1 so that asserts became no-ops fixed the issue for now.

njuffa · January 16, 2016, 6:58pm

Are you using assert() calls in device code, by any chance? Best I know, the standard function assert() is not supported in device code, so it makes sense that its use would trigger a JIT compilation error.

If the assert() instances in question are in host code, on the other hand, use of NDEBUG=1 will cause the assertions to be inactivated, which means the program is now potentially ignoring real errors.

Robert_Crovella · January 16, 2016, 7:37pm

Are you on MacOS?

assert is supported in device code but not on MacOS:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#assertion

“It is not supported on MacOS, regardless of the device, and loading a module that references the assert function on Mac OS will fail.”

So if you are on MacOS and you are using assert in device code, that could be an explanation for the module load failure leading to the symbol reference failure. (However, I can’t connect that with the f1()/f2() data point.)

njuffa · January 16, 2016, 7:44pm

@txbob: Thanks for the correction, I apparently missed the fact that support for device-side assert() had been added to CUDA, with the exception of the OS X platform.

JZM · January 17, 2016, 9:06am

No, not on a Mac: this is on Windows with the vc12 host compiler.

The code in question is existing generic code (Boost.Math library) which I’m investigating porting so it can be used on the device as well as the host. The asserts are macro-ised, so longer term I can search and replace them with something that always evaluates to a no-op on the device only. Strangely it’s only when the assert is mixed with “something else” (and I’m not sure what that is yet) that the problem manifests.
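The "no-op on the device only" idea could be sketched roughly as follows. This is a minimal illustration, not the Boost.Math code: the names SKETCH_ASSERT, SKETCH_HOST_DEVICE and safe_ratio are hypothetical, invented for the example. It relies on __CUDACC__ being defined when nvcc compiles the file and __CUDA_ARCH__ being defined only during the device compilation pass, so the same source also builds with a plain host C++ compiler.

```cpp
#include <cassert>

// Hypothetical helper: mark functions as callable from host and device
// when compiled by nvcc, and expand to nothing for a plain C++ compiler.
#if defined(__CUDACC__)
#  define SKETCH_HOST_DEVICE __host__ __device__
#else
#  define SKETCH_HOST_DEVICE
#endif

// Hypothetical assert wrapper: __CUDA_ARCH__ is only defined while nvcc
// compiles device code, so the check disappears there and the device
// module never references the device-side assert function.
#if defined(__CUDA_ARCH__)
#  define SKETCH_ASSERT(expr) ((void)0)
#else
#  define SKETCH_ASSERT(expr) assert(expr)
#endif

// Example generic function (assumed for illustration) that keeps its
// precondition check on the host but not on the device.
SKETCH_HOST_DEVICE inline double safe_ratio(double num, double den)
{
    SKETCH_ASSERT(den != 0.0);
    return num / den;
}
```

On the host, safe_ratio(6.0, 3.0) returns 2.0 and a zero denominator still trips the ordinary assert (unless NDEBUG is defined). In device code the check vanishes entirely. Whether removing the device-side asserts is what actually cures the module load failure in this thread is not established; the sketch only shows the macro mechanism.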
