amount of pinned memory

Can anybody tell me the maximum amount of pinned memory (cudaMallocHost) I can allocate? Under XP-32 I can allocate the 150 MB I need, but under Vista-64 I can't. Can this be?

This depends on your host machine, so I don’t think anybody can say anything useful about this.

See here: http://msdn.microsoft.com/en-us/library/aa366778.aspx

In Vista x64, this is apparently 40% of RAM by default. I don’t know why you can’t allocate even 150MB, but there are various registry and security settings that restrict this. (You can crash a system by allocating pinned memory!) Search around for the term Microsoft uses, “non-paged memory” or “non-paged pool.”

I think it depends heavily on your OS. On Linux I have successfully allocated more than 15 GB of pinned memory, leaving only around 300 MB of physical memory for the OS. You do have to be careful not to trigger the out-of-memory (OOM) killer once the system runs out of physical memory…

I went to 7.7 of 8GB on my Linux64 box. Then it got upset and started killing applications. :(

And you can do this as an unprivileged user?

Must this program be started with “Run as Administrator”, or is a normal run enough?

On Linux, Yes!

See… an unprivileged user can use cudaMallocHost() to crash programs on Linux.

Vista does what it does for a reason. Score Microsoft: 1

I don’t know. There are probably different security policies for administrators and limited users. Have you tried it as an administrator?

As far as I could tell it will only kill programs the current user is running.

I’d second that.

Uh, this doesn't have much to do with Vista or Microsoft. On Linux, you can specify how much memory a process may lock (pin) with ulimit -l.

But the NVIDIA graphics driver runs with the highest privileges and can, of course, ignore that limit and do things that make it unsuitable for a multiuser system. That applies to Vista just as well, except that Microsoft might not sign your driver (and thus not allow it to run at such a high privilege level) if they knew about it.

Feel free to disable module loading support in Linux if you want the equivalent of only being able to run “approved” drivers :P .

And about the Linux OOM killer: it might kill anything, no matter which user owns it, but it tries to find the “best” process to kill (which usually means some large non-root process which has run only a short time).

In my experience, it is usually the sshd process of the user who started the offending application. ;)

I made a simple test. I have 2 GB of RAM, of which 1.16 GB is currently free, running Vista-64 with Service Pack 1:

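#include "cutil.h"
#include <cuda_runtime.h>

int
main(int argc, char** argv)
{
    unsigned char *h_idata = NULL;
    /* Request 256 MB of page-locked (pinned) host memory */
    CUDA_SAFE_CALL( cudaMallocHost( (void**)&h_idata, 256 * (1<<20) ) );
    CUDA_SAFE_CALL( cudaFreeHost( h_idata ) );
    return 0;
}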

The test fails. With 128 MB it works.

Can anybody explain what is wrong ?

I’ve tried your code and see the same behavior. However, there is apparently more to it. The first call to cudaMallocHost can only allocate 128 MB. The second call can allocate up to 256 MB, and you can keep allocating 256 MB pieces until there’s no more free RAM. I didn’t try to crash any programs, but I’m sure a few would crash eventually. Fun fact: according to Task Manager, physical RAM doesn’t get allocated until you touch it.

[codebox]
#include "cutil.h"
#include <stdlib.h>
#include <stdio.h>
#include <cuda_runtime.h>

int
main(int argc, char** argv)
{
    unsigned *h_idata1, *h_idata2, *h_idata3;

    printf("Press return...");getchar();
    /* First call: 128 MB of pinned memory */
    CUDA_SAFE_CALL( cudaMallocHost( (void**)&h_idata1, 128 * (1<<20) ) );
    for(unsigned i = 0; i < 128*(1<<20)/sizeof(unsigned); i++)
        h_idata1[i] = i;
    for(unsigned i = 0; i < 128*(1<<20)/sizeof(unsigned); i++)
        if(h_idata1[i] != i) printf("DATA MISMATCH\n");

    printf("Press return...");getchar();
    /* Second call: 256 MB */
    CUDA_SAFE_CALL( cudaMallocHost( (void**)&h_idata2, 256 * (1<<20) ) );
    for(unsigned i = 0; i < 256*(1<<20)/sizeof(unsigned); i++)
        h_idata2[i] = i;
    for(unsigned i = 0; i < 256*(1<<20)/sizeof(unsigned); i++)
        if(h_idata2[i] != i) printf("DATA MISMATCH\n");

    /* No exit condition: keeps pinning 256 MB blocks (never freed)
       until a call fails */
    do{
        printf("Press return...");getchar();
        CUDA_SAFE_CALL( cudaMallocHost( (void**)&h_idata3, 256 * (1<<20) ) );
        for(unsigned i = 0; i < 256*(1<<20)/sizeof(unsigned); i++)
            h_idata3[i] = i;
        for(unsigned i = 0; i < 256*(1<<20)/sizeof(unsigned); i++)
            if(h_idata3[i] != i) printf("DATA MISMATCH\n");
    }while(1);

    printf("Press return...");getchar();   /* not reached */
}
[/codebox]

Unfortunately, your solution doesn’t help. I no longer get errors from cudaMallocHost, but now cudaMemcpy fails.

[codebox]
#include "cutil.h"
#include <stdlib.h>
#include <stdio.h>
#include <cuda_runtime.h>

#define SIZE  (128*(1<<20))   /* bytes in the first host/device buffer pair */
#define SIZE2 (128*(1<<20))   /* bytes in every later buffer pair */

int main(int argc, char** argv)
{
    unsigned *h_idata1, *h_idata2, *h_idata3;
    unsigned *d_idata1, *d_idata2, *d_idata3;

    printf("Press return...");getchar();
    CUDA_SAFE_CALL( cudaMallocHost( (void**)&h_idata1, SIZE ) );
    CUDA_SAFE_CALL( cudaMalloc( (void**)&d_idata1, SIZE ) );
    for(unsigned i = 0; i < SIZE/sizeof(unsigned); i++)
        h_idata1[i] = i;
    CUDA_SAFE_CALL( cudaMemcpy(d_idata1, h_idata1, SIZE, cudaMemcpyHostToDevice) );
    for(unsigned i = 0; i < SIZE/sizeof(unsigned); i++)
        if(h_idata1[i] != i) printf("DATA MISMATCH\n");

    printf("Press return...");
    getchar();
    CUDA_SAFE_CALL( cudaMallocHost( (void**)&h_idata2, SIZE2 ) );
    CUDA_SAFE_CALL( cudaMalloc( (void**)&d_idata2, SIZE2 ) );
    for(unsigned i = 0; i < SIZE2/sizeof(unsigned); i++)
        h_idata2[i] = i;
    CUDA_SAFE_CALL( cudaMemcpy(d_idata2, h_idata2, SIZE2, cudaMemcpyHostToDevice) );
    for(unsigned i = 0; i < SIZE2/sizeof(unsigned); i++)
        if(h_idata2[i] != i)
            printf("DATA MISMATCH\n");

    /* No exit condition: keeps pinning SIZE2 bytes of host memory and
       allocating SIZE2 bytes of device memory (never freed) until a call fails */
    do
    {
        printf("Press return...");getchar();
        CUDA_SAFE_CALL( cudaMallocHost( (void**)&h_idata3, SIZE2 ) );
        CUDA_SAFE_CALL( cudaMalloc( (void**)&d_idata3, SIZE2 ) );
        for(unsigned i = 0; i < SIZE2/sizeof(unsigned); i++)
            h_idata3[i] = i;
        CUDA_SAFE_CALL( cudaMemcpy(d_idata3, h_idata3, SIZE2, cudaMemcpyHostToDevice) );
        for(unsigned i = 0; i < SIZE2/sizeof(unsigned); i++)
            if(h_idata3[i] != i) printf("DATA MISMATCH\n");
    }
    while(1);

    printf("Press return...");   /* not reached */
    getchar();
}
[/codebox]

It’s true. But why would cudaMemcpy() return “out of memory”?

It’s about time NVIDIA took a look at it.