I think you’re on the right track with your thought process.
A symbol-not-found error can occur when your device code has a failure that is detectable at load time. One such failure is a device binary that does not match the GPU you are trying to run on.
These errors can creep in if your compile command generates PTX only (or PTX plus SASS, but without specifying the correct SASS architecture for your GPU). Either approach can involve a JIT compile at runtime/load time. That JIT compile can fail (e.g. by hitting a machine limit), leaving you with no binary for your GPU, so things won’t work, and one of the side effects is that device symbols are not loaded/visible. If your first evidence of this is a “bystander” operation that touches a device symbol, you’ll get a weird device-symbol-not-found error. A full walk-through of such a case is here:
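One way to surface this kind of failure earlier is to check the return status of the very first runtime call that touches a device symbol, rather than letting a later "bystander" operation report it. A minimal sketch (the symbol name `d_val` and the kernel `k` are hypothetical, not from your code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int d_val;   // device symbol; not visible if the module failed to load

__global__ void k() { d_val = 1; }

int main() {
    int h = 42;
    // If no binary (SASS or JIT-able PTX) matches this GPU, the module load
    // fails and this first symbol access is where the error gets reported.
    cudaError_t err = cudaMemcpyToSymbol(d_val, &h, sizeof(h));
    if (err != cudaSuccess) {
        printf("symbol copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    k<<<1, 1>>>();
    err = cudaDeviceSynchronize();
    printf("status: %s\n", cudaGetErrorString(err));
    return 0;
}
```

On a machine where the embedded binary matches the GPU, this runs cleanly; on a mismatched GPU, the `cudaMemcpyToSymbol` call is typically the first place the load failure becomes visible.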
So the first question I would ask is: what is your exact compile command line, and what GPU are you actually running the code on when you witness the CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND error?
If your case involves a JIT compile, and if that JIT compile is actually failing, the way to make the problem more “visible” is to force the PTX-to-SASS compile step to occur at compile time. You can do this by specifying a device architecture to compile for that matches your GPU.
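For illustration, the difference between embedding PTX only (JIT at load time) and embedding SASS for your GPU looks roughly like this (`test.cu` is a placeholder file name):

```shell
# PTX only: the PTX gets JIT-compiled to SASS at load time,
# and that JIT step is where the failure can occur
nvcc -gencode arch=compute_35,code=compute_35 -o test test.cu

# SASS for a cc5.0 GPU generated at compile time: a failure in the
# PTX-to-SASS step now shows up as a compile-time error instead
nvcc -gencode arch=compute_50,code=sm_50 -o test test.cu

# shorthand that embeds both compute_50 PTX and sm_50 SASS
nvcc -arch=sm_50 -o test test.cu
```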
For example, if you were compiling for a cc3.5 architecture but running on a cc5.0 device, you might be specifying -arch=sm_35. The “fix” would be to specify -arch=sm_50 when compiling. If you do that with the code configuration that calls both f1() and f2(), you may witness a compile-time error, which will be instructive.
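If you're not sure which architecture value to pass, you can query the GPU's compute capability at runtime; a quick sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // device 0
    // prop.major / prop.minor give the compute capability,
    // e.g. 5.0 -> compile with -arch=sm_50
    printf("cc %d.%d -> use -arch=sm_%d%d\n",
           prop.major, prop.minor, prop.major, prop.minor);
    return 0;
}
```

The deviceQuery sample that ships with the CUDA toolkit reports the same information.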