Segfault when running application in profiler

I’m trying to profile my application with the Visual Profiler; however, the profiler returns:

Error: Application received signal 139

This corresponds to a segmentation fault (139 = 128 + 11, and signal 11 is SIGSEGV). When I run cuda-memcheck, I get internal error 20 and a segmentation fault in the host code. The application runs fine both on its own and under cuda-gdb. However, when I enable cuda-memcheck inside cuda-gdb, I get a segmentation fault in cuInit(0). Here is the backtrace:

#0  0x000eeaec in cuGetExportTable ()
#1  0xf6824534 in cudbgGetAPIVersion ()
   from /usr/lib/arm-linux-gnueabihf/libcuda.so
#2  0xf6721b08 in cuMemGetAttribute_v2 ()
   from /usr/lib/arm-linux-gnueabihf/libcuda.so
#3  0xf66476d8 in cuInit () from /usr/lib/arm-linux-gnueabihf/libcuda.so

This problem occurs on multiple devices.

This is apparently caused by using the CUDA Extension Wrangler (https://github.com/CudaWrangler/cuew). When I use CUDA directly through cuda.h instead, everything works as intended.

I think I am in a similar situation. I get the same error message when running in the Visual Profiler, but the application runs normally in both release and debug mode outside the Visual Profiler.

It seems that you have found a solution, since you wrote “When I import CUDA through cuda.h …”. I interpreted this as a suggestion to add

#include "cuda.h"

to my .cu file. This, however, did not change the behavior. Could you please elaborate on your solution in case I am misinterpreting it? Your help would be much appreciated.