cudaMalloc segfaulting: possible cause?

Tigga · September 19, 2008, 10:20am

About 6 hours into my run I’m getting a problem with cudaMalloc segfaulting (under Linux) or causing very odd behavior (under Windows). It happens on the largest memory allocation that I have in my program.

I’m fairly sure I must be doing something wrong somewhere, but I can’t work out what. I’m just trying to rule out possibilities.

Am I right in thinking this can’t be caused by the GPU? I would expect it to return an error code rather than segfaulting if the malloc fails. No previous calls return error codes either.

If it’s not the GPU, it’s on the host side. I’ve noticed a few memory leaks reported by valgrind in libcudart.so (one on cudaMalloc, though not the one that segfaults, and one on cudaGetDeviceCount). However, upgrading my memory from 3GB to 4GB didn’t change the behavior of the program at all. I can only think that something on the host is overflowing and overwriting something which doesn’t like being overwritten.

Given that the problem takes 6 hours to show up, it’s painfully hard to work with. Any thoughts?

mfatica · September 19, 2008, 1:14pm

It is a bug in the driver; a fix will be available shortly.

Tigga · September 25, 2008, 3:22pm

Using the latest Windows driver (178.13) I am still seeing exactly the same behaviour. I’ve just set off a test to see if it’s giving any error codes this time (I accidentally ran it without my error checking).

Just a bit of background to my program with respect to memory allocation: at the start of each iteration it allocates ever-increasing amounts of memory in several allocations, most of which are ~1-8 megs, some of which are ~60 megs, and then one of ~140 megs and one of ~230 megs. At the end of each iteration it frees the memory, and on the next iteration it allocates slightly more. It dies on the 286th iteration.

cuMemGetInfo reports that I’m using ~83% of the overall memory (used: ~745 megs, free: ~150 megs; it’s an 892 meg card).
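As a quick sanity check on those figures, here is a minimal C++ sketch of the arithmetic. The function name `usedPercent` is made up for illustration, and the inputs are the rounded values quoted above (they sum to ~895 megs rather than exactly 892, which is presumably just rounding).

```cpp
#include <cassert>

// Percentage of device memory in use, given used and free amounts
// expressed in the same unit (megs here). Hypothetical helper, not a CUDA API.
double usedPercent(double usedMB, double freeMB) {
    assert(usedMB >= 0.0 && freeMB >= 0.0 && usedMB + freeMB > 0.0);
    return 100.0 * usedMB / (usedMB + freeMB);
}
```

Calling `usedPercent(745.0, 150.0)` gives a value a little over 83, which agrees with the ~83% reported.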

I’m still not completely ruling out an error on my part. However, cudaMalloc segfaulting on Linux certainly seemed bad, and with the new Windows driver I’m getting the same behaviour that previously corresponded with the segfault, at around the same point (both runs have one particular memory allocation just over ~150 megs… worryingly close to the amount of free memory, but I think unrelated, since it still dies on runs with slightly more free memory). This all leads me to think that it’s somehow related to memory fragmentation on the card (GTX 260) not leaving space for the big chunk of memory that needs to be allocated.
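To illustrate the fragmentation idea, here is a toy host-side model in plain C++. It is only a sketch under stated assumptions: `ToyHeap` and `makeCheckerboard` are hypothetical names, the allocator is a simple first-fit over a linear range, and nothing here reflects how the real driver manages device memory. It just shows how the total free memory can exceed a request while no single contiguous hole is big enough.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Toy first-fit allocator over a linear range [0, capacity), sizes in megs.
// Purely illustrative: it does not model the real CUDA driver allocator.
class ToyHeap {
public:
    explicit ToyHeap(long capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Returns the offset of the new block, or -1 if no single hole fits it.
    long alloc(long size) {
        assert(size > 0);
        long cursor = 0;  // start of the hole currently being examined
        for (const auto& block : used_) {             // blocks sorted by offset
            if (block.first - cursor >= size) break;  // hole before this block fits
            cursor = block.first + block.second;      // skip past this block
        }
        if (capacity_ - cursor < size) return -1;     // chosen hole (or tail) too small
        used_[cursor] = size;
        return cursor;
    }

    // Frees the block that starts at the given offset (no-op if none does).
    void release(long offset) { used_.erase(offset); }

    // Sum of all free space, regardless of how it is split up.
    long totalFree() const {
        long used = 0;
        for (const auto& block : used_) used += block.second;
        return capacity_ - used;
    }

    // Size of the largest contiguous free region.
    long largestHole() const {
        long best = 0, cursor = 0;
        for (const auto& block : used_) {
            best = std::max(best, block.first - cursor);
            cursor = block.first + block.second;
        }
        return std::max(best, capacity_ - cursor);
    }

private:
    long capacity_;
    std::map<long, long> used_;  // offset -> length of each live block
};

// Hypothetical scenario: fill a 100 meg heap with ten 10 meg blocks,
// then free every other block, leaving five separate 10 meg holes.
ToyHeap makeCheckerboard() {
    ToyHeap heap(100);
    for (int i = 0; i < 10; ++i) heap.alloc(10);
    for (long offset = 0; offset < 100; offset += 20) heap.release(offset);
    return heap;
}
```

In this scenario `totalFree()` reports 50 megs but `largestHole()` is only 10, so `alloc(30)` returns -1 even though the request is well under the total free amount. Whether anything like this happens on the card is exactly the open question in this post.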

Anyways - can somebody confirm that the bug mentioned is fixed in the latest driver please?

EDIT: Running a test overnight with full error checking and some fairly mean input checks. I’ll be back in about 16 hours!

alex_dubinsky · September 25, 2008, 6:25pm

I don’t think anyone said it was fixed already.

Tigga · September 25, 2008, 8:32pm

Well a fix ‘soon’ followed about a week later by a new driver kinda implied it.

mfatica · September 26, 2008, 12:22am

Sorry, but the fixes (cudaMalloc and watchdog) are not in 178.13.

Tigga · September 26, 2008, 8:46am

Ok. Thanks for the info!

kristleifur · September 26, 2008, 10:34am

Ouch! Godspeed getting the new driver out.
