Memory on the NVIDIA device tends to retain state between kernel calls

Thanks. I need to be able to get the amount of free device memory, or estimate it somehow.

There are many device models with different memory sizes, running on different computer configurations. Knowing how much memory is free can help my application scale to the device/hardware configuration it is running on.

What do you recommend?

cuMemGetInfo will tell you how much memory is free, but not necessarily how much memory you can allocate in a maximum allocation due to memory fragmentation. It’s a driver API call but will work from the runtime API so long as a context has been created first.
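
For illustration, here is a minimal sketch of that approach: a runtime-API program that forces context creation and then calls the driver-API cuMemGetInfo. It assumes a reasonably recent toolkit where the parameters are size_t (older toolkits used unsigned int) and that the program is linked against the driver library (e.g. -lcuda in addition to the runtime); the cudaFree(0) call is just one common way to force the runtime to create a context.

    #include <cstdio>
    #include <cuda.h>          // driver API: cuMemGetInfo
    #include <cuda_runtime.h>  // runtime API: cudaFree

    int main()
    {
        // Touch the runtime so it creates a context on the current device;
        // cuMemGetInfo reports against the current context.
        cudaFree(0);

        size_t freeBytes = 0, totalBytes = 0;
        CUresult res = cuMemGetInfo(&freeBytes, &totalBytes);
        if (res != CUDA_SUCCESS) {
            fprintf(stderr, "cuMemGetInfo failed: %d\n", (int)res);
            return 1;
        }

        printf("free: %zu MB, total: %zu MB\n",
               freeBytes >> 20, totalBytes >> 20);
        return 0;
    }

Newer toolkits also expose the same information directly through the runtime API as cudaMemGetInfo, which avoids mixing the two APIs.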

Thanks,

could we assume that when starting a fresh CUDA application the memory is not fragmented?

Maybe on Vista, definitely not on other platforms.
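
Since a fragmentation-free start cannot be assumed everywhere, one crude but practical way to estimate the largest single allocation that currently fits is to probe with cudaMalloc, for example by binary search over candidate sizes. A rough sketch follows; the helper name largestAllocatableBlock is made up for illustration, the 1 MB stopping resolution is arbitrary, and note that the probe itself briefly allocates and frees memory.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Probe for (approximately) the largest single cudaMalloc that succeeds.
    static size_t largestAllocatableBlock()
    {
        size_t freeBytes = 0, totalBytes = 0;
        cudaMemGetInfo(&freeBytes, &totalBytes);

        size_t lo = 0, hi = freeBytes;          // answer lies in [lo, hi]
        while (hi - lo > (size_t)(1 << 20)) {   // stop at 1 MB resolution
            size_t mid = lo + (hi - lo) / 2;
            void* p = nullptr;
            if (cudaMalloc(&p, mid) == cudaSuccess) {
                cudaFree(p);
                lo = mid;                       // mid bytes fit, try bigger
            } else {
                cudaGetLastError();             // clear the allocation error
                hi = mid;                       // mid bytes do not fit
            }
        }
        return lo;
    }

    int main()
    {
        printf("largest single allocation: ~%zu MB\n",
               largestAllocatableBlock() >> 20);
        return 0;
    }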

Interesting. I assume this paging behavior is Vista/Win7 specific and not a CUDA driver feature on other platforms.

It’s a WDDM thing, yes.

As I see it, Windows 7 introduces the new WDDM 1.1 driver model, which is not available for Vista.

What is different:

  1. GDI concurrency

Vista:

GDI rendering is serialized by a global exclusive lock: if App X gets hold of the lock, it can render to the screen while App Y is unable to do so and has to wait for App X to finish.

Win7:

Adds an internal synchronization mechanism through which multiple applications can reliably render at the same time.

Without such synchronization, concurrent rendering could give rise to timing issues such as deadlocks and rendering corruption.

  2. GDI memory usage

Vista:

Every GDI application window accounts for two memory allocations which hold identical content – one in video memory and one in system memory.

The amount of memory required to run multiple windows scales linearly with the number of windows opened on the system.

It gets worse with more windows, higher resolutions, and multiple monitors, which means more paging activity.

Win7:

The memory consumed per GDI window is roughly cut in half by eliminating the system memory copy. This could lead to decreased paging activity.

To compensate for eliminating the system memory copy, Win7 accelerates the common GDI operations through the graphics hardware: the WDDM 1.1 drivers accelerate these operations to minimize the performance impact of CPU read-back of video memory.

Without this acceleration, the CPU would incur a heavy performance penalty.

Meaning: the GPU is now much more involved in GDI operations on Win7.

The Win7 changes basically fall into two areas:

  1. GPU scheduling

  2. GPU memory management

These changes possibly enable better performance in certain scenarios.

Related to CUDA applications:

  1. It is not clear how this change will impact our own CUDA applications, because it also involves the GPU.

My GPU application will compete with Win7's own GPU usage. It might be a good idea to raise the priority of our CUDA application temporarily, get the results, and then restore the old priority.

  2. It is good because there will be more GPU memory available.