cudaMallocPitch failure to allocate ... on C2070

Any idea of what I am doing wrong?

I tried to search for cudaMallocPitch problems in the forums, but could not find anything useful.

Is there any known limitation? … or obvious bug in my code?

memfill.cu (3.89 KB)

The attached code is quite simple: it just tries to allocate as many 2d-arrays as possible.

It seems to work just fine on my laptop (GeForce GT 220M, drivers 280.26, win7x86), but fails systematically on C2070s (“out of memory” with about half the memory still free!)

The attached small code produces the following output on C2070 (drivers 275.33, win7x64):

+--------------------------------+

 Version driver(4000), runtime(4000)

+--------------------------------+

+==============

  test memfill: gpu0 mem free 6080991232 bytes

+==============

 [1500x1500] 2d-arrays, 9216000 bytes (pad=512)

 => 659 2d-arrays will occupy 6073344000 bytes

    est. free mem remaining: 7647232

  gpu0 alloc array #0 (stride=6144, 9216000 bytes)

    mem free: 6071685120 bytes, 9306112 bytes used (90112 over used, 90112 total over)

  gpu0 alloc array #1 (stride=6144, 9216000 bytes)

    mem free: 6062379008 bytes, 9306112 bytes used (90112 over used, 180224 total over)

....

  gpu0 alloc array #317 (stride=6144, 9216000 bytes)

    mem free: 3121647616 bytes, 9306112 bytes used (90112 over used, 28655616 total over)

*****

 array #318 alloc failed: out of memory

*****

  gpu0 memory free: 3121647616 bytes

..allocated 318 arrays
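For anyone checking the numbers in the log above, here is a minimal host-only C++ sketch of the size estimate. It is not taken from memfill.cu: the function and struct names are made up for illustration, the 512-byte alignment is read off the "pad=512" line, and the 4-byte element size is inferred from stride=6144 for 1500 columns. The real pitch is whatever cudaMallocPitch returns on a given device.

```cpp
#include <cassert>
#include <cstdint>

// Round a row width in bytes up to the next multiple of `alignment`.
// alignment = 512 is an assumption taken from the "pad=512" log line.
std::uint64_t padded_pitch(std::uint64_t row_bytes, std::uint64_t alignment) {
    assert(alignment > 0);
    return (row_bytes + alignment - 1) / alignment * alignment;
}

// Hypothetical helper type: result of the "how many arrays fit" estimate.
struct FillEstimate {
    std::uint64_t pitch;        // bytes per padded row
    std::uint64_t array_bytes;  // pitch * rows
    std::uint64_t count;        // arrays that fit in free_bytes
    std::uint64_t leftover;     // free bytes remaining after `count` arrays
};

// Estimate how many padded 2D arrays fit into free_bytes.
// This ignores any per-allocation overhead added by the driver.
FillEstimate estimate_fill(std::uint64_t free_bytes,
                           std::uint64_t cols,
                           std::uint64_t rows,
                           std::uint64_t elem_size,
                           std::uint64_t alignment) {
    FillEstimate e;
    e.pitch = padded_pitch(cols * elem_size, alignment);
    e.array_bytes = e.pitch * rows;
    e.count = free_bytes / e.array_bytes;
    e.leftover = free_bytes - e.count * e.array_bytes;
    return e;
}
```

Calling `estimate_fill(6080991232, 1500, 1500, 4, 512)` gives a pitch of 6144, 9216000 bytes per array, 659 arrays and 7647232 bytes left over, which matches the gpu0 log. The log also shows each allocation actually costing 9306112 bytes (90112 more than the estimate), so the estimate is slightly optimistic, but nowhere near enough to explain a failure at array #318.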

What is really strange is that it fails after the same number of allocated arrays regardless of the available memory (gpu2 is connected to a display):

..using gpu2

+--------------------------------+

 Version driver(4000), runtime(4000)

+--------------------------------+

+==============

  test memfill: gpu2 mem free 5411921920 bytes

+==============

 [1500x1500] 2d-arrays, 9216000 bytes (pad=512)

 => 587 2d-arrays will occupy 5409792000 bytes

    est. free mem remaining: 2129920

  gpu2 alloc array #0 (stride=6144, 9216000 bytes)

    mem free: 5402615808 bytes, 9306112 bytes used (90112 over used, 90112 total over)

  gpu2 alloc array #1 (stride=6144, 9216000 bytes)

    mem free: 5393309696 bytes, 9306112 bytes used (90112 over used, 180224 total over)

...

  gpu2 alloc array #317 (stride=6144, 9216000 bytes)

    mem free: 2452578304 bytes, 9306112 bytes used (90112 over used, 28655616 total over)

*****

 array #318 alloc failed: out of memory

*****

  gpu2 memory free: 2452578304 bytes

..allocated 318 arrays
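The two runs can be compared directly from the logged free-memory figures. The host-only C++ sketch below (helper names are hypothetical, not from memfill.cu) computes how many bytes had been consumed when the allocation failed and the measured cost per array.

```cpp
#include <cassert>
#include <cstdint>

// Bytes consumed between the initial free-memory report and the failure.
std::uint64_t bytes_consumed(std::uint64_t free_at_start,
                             std::uint64_t free_at_failure) {
    assert(free_at_failure <= free_at_start);
    return free_at_start - free_at_failure;
}

// Measured cost of one array: consumed bytes divided by the number of
// allocations that succeeded before the failure.
std::uint64_t bytes_per_array(std::uint64_t consumed,
                              std::uint64_t arrays_allocated) {
    assert(arrays_allocated > 0);
    return consumed / arrays_allocated;
}
```

With the values from the logs, `bytes_consumed(6080991232, 3121647616)` for gpu0 and `bytes_consumed(5411921920, 2452578304)` for gpu2 both give 2959343616 bytes (about 2.76 GiB), and dividing by 318 gives 9306112 bytes per array on both devices. So the failure coincides with a fixed amount of allocated memory rather than with the amount of free memory remaining.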

Can anyone reproduce, help, work around, or solve this? (No more hair to pull here.)

Thanks

Are you compiling for 64 bit?

Yes (for execution on XP64, win7x64), and also for 32 bits (XP, win7x32).

The CUDA DLLs I am currently linking against are cudart64_40_17.dll and cudart32_40_17.dll.

According to the driver team the use of the TCC driver is strongly recommended for this card (the standard Windows driver being WDDM), so give that a try if you haven’t had the chance to do so yet.

Just tried (276.14), same result.
Any other ideas? A diagnosis?

Sorry, I don’t have any deeper insights into driver issues, that is outside my area of expertise. If you are already using the latest recommended TCC driver for this card, and you are still unable to allocate close to the total available memory, I would suggest filing a bug. There is a link off the registered developer website for this purpose.

I just realized that even with the 276.14 Quadro/Tesla drivers, the “tccDriver” status reported in cudaDeviceProp is still off.
Anyway, the system I am having trouble with has 4x C2070, and 2 of them have to be connected to a display … so running without graphics (which I think is what TCC mode requires (??)) is not really an option for me.

I just filed a bug report, will see what happens

Thanks for the suggestions

Got feedback from the bug report:

I am quite surprised that nobody ran into this before … or is there any workaround I did not think of, in the meantime (before it gets resolved)?