Segmentation Fault on calling cudaMalloc - I can't figure out why

I’m using a Quadro 2000 and am attempting to develop a CUDA function. I get a segmentation fault in the first cudaMalloc call. Here are the details:

  • Setup: Setup is fine. I compiled the samples and ran a few - they ran fine.
  • Makefile: I cannibalized a Makefile out of the samples to create the makefile that I use. So I assume the right environment is discovered, the right flags are being set and the compilation is fine.
  • I have just two files at this point. main.cpp was originally a C file that I renamed; it is fairly simple C code that initializes a few structs, calls the CUDA setup functions and then calls the __global__ function.
  • Like I said, I get the segmentation fault right at the first cudaMalloc call.
  • X is not running. I'm on the console.

I’m completely unable to figure out what I’m doing wrong.

Let me paste some code & output here:
gpuErrChk is a macro around an inline function, as follows:
#define gpuErrChk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
	if (code != cudaSuccess)
	{
		fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
		if (abort) exit(code);
	}
}

Dbg dumps debug info…

OK. The CUDA call sequence is as follows:
gpuErrChk (cudaGetDeviceCount (&gc -> dev_count));
Dbg("Device count %d", gc -> dev_count);
DEBUG Device count 1

if (gc -> dev_count) { 	// set blocks & threads
	gpuErrChk (cudaGetDeviceProperties (gc -> dev_prop, i));
	Dbg ("Dev prop name: %s, tot_mem: %u sharedMemPerBlock %u\nwarpSize %d maxThreadsPerBlock %d\nmem clockrate %d, mem buswidth %d l2 cache size %d,\nmaxthreads per mprocessor %d", gc -> dev_prop -> name ,
		(unsigned) gc -> dev_prop -> totalGlobalMem, (unsigned) gc -> dev_prop -> sharedMemPerBlock,
		gc -> dev_prop -> warpSize, gc -> dev_prop -> maxThreadsPerBlock,
		gc -> dev_prop -> memoryClockRate, gc -> dev_prop -> memoryBusWidth,
		gc -> dev_prop -> l2CacheSize, gc -> dev_prop -> maxThreadsPerMultiProcessor);

DEBUG Dev prop name: Quadro 2000, tot_mem: 1073414144 sharedMemPerBlock 49152
warpSize 32 maxThreadsPerBlock 1024
mem clockrate 1304000, mem buswidth 128 l2 cache size 262144,
maxthreads per mprocessor 1536

The next line is:
Dbg ("Mem asked for %u", sizeof(record_node)*MAX_RECORDS);
DEBUG Mem asked for 19899288

The memory asked for is far lower than the available memory…
and then I call:
gpuErrChk (cudaMalloc ((void **) &(ig -> gpu_nodes), sizeof (record_node)*MAX_RECORDS));
and I get a core dump:
Segmentation fault (core dumped)

I ran this again in cuda-gdb and logged the session to gdb.txt. The output (including the backtrace) is as follows:
rinka@rinka-Desktop:~/Documents/dev/code/milib$ cat gdb.txt
Starting program: /home/rinka/Documents/dev/code/milib/milib
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/”.
[New Thread 0x7ffff47e1700 (LWP 4031)]
[New Thread 0x7ffff37e0700 (LWP 4032)]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4c57f26 in cuVDPAUCtxCreate () from /usr/lib/x86_64-linux-gnu/
#0 0x00007ffff4c57f26 in cuVDPAUCtxCreate () from /usr/lib/x86_64-linux-gnu/
#1 0x00007ffff4c25b8f in cuMemAlloc_v2 () from /usr/lib/x86_64-linux-gnu/
#2 0x0000000000430633 in cudart::driverHelper::mallocPtr(unsigned long, void**) ()
#3 0x000000000040d79e in cudart::cudaApiMalloc(void**, unsigned long) ()
#4 0x00000000004409cf in cudaMalloc ()
#5 0x000000000040362a in gpu_load_records(input_to_gpu_interface*) ()
#6 0x0000000000402bff in load_records(input_to_gpu_interface*) ()
#7 0x0000000000402783 in main ()
A debugging session is active.

    Inferior 1 [process 4020] will be killed.

Quit anyway? (y or n)

It would probably be best if you provided a short, complete code sample that demonstrates the problem.

I’m suspicious about your ig pointer:

cudaMalloc ((void **) &(ig -> gpu_nodes),

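If ig is NULL or otherwise not properly set up, then &(ig -> gpu_nodes) is not a valid address, and cudaMalloc will seg fault when it tries to write the device pointer into the gpu_nodes field. That would be consistent with your backtrace, where the crash happens inside the driver library underneath cudaMalloc.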