cudaHostAlloc --> invalid argument: works with Fermi, not with 9800 GTX

Hello,

I’m having a small but critical problem with a program I wrote.

It uses zero copy, and it works on Fermi.

However, when I plugged in the 9800 GTX, I got this error:

src/myCudaCalls.cu(302) : cudaSafeCall() Runtime API error : invalid argument.

and the line is this one:

cutilSafeCall(cudaSetDeviceFlags(cudaDeviceMapHost));

cutilSafeCall(cudaHostAlloc((void **)&h_listo, sizeof(int), cudaHostAllocMapped));    // <-- this is line 302

cutilSafeCall(cudaHostGetDevicePointer((void **)&d_listo, (void *)h_listo, 0));

What could it be?

I’m compiling with -arch sm_11, and from what I know this GPU supports zero copy.

Also, the SDK example simpleZeroCopy uses exactly the same instructions, but it works, even when I manually compiled it with the same options.

The 9800 GTX doesn’t support zero copy. The only compute 1.1 devices that support zero copy are the MCP79 family of integrated GPUs (9300M, 9400M, Ion).

Thanks for the information. For a moment I thought that too, but the fact that the SDK example “simpleZeroCopy” worked with the same lines of code made me rethink it. So I have a contradiction in my situation, and I really don’t have an answer for it.

Do you know what the trick behind it could be?

In this post you posted the output of deviceQuery for your card. Look at the output line “Support host page-locked memory mapping”. It says No. This corresponds to the device property canMapHostMemory. Quoting from the pinned memory API documentation (§3.1):

and further on, the FAQ section of the same document says (§4.1):

Like I said, your card doesn’t support zero copy memory.
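The check described above can also be made at runtime, before any mapped allocation is attempted. The following is only a minimal sketch: it assumes the GPU of interest is device 0 and reuses the single-int buffer from the original post. The `CHECK` macro is a hypothetical stand-in for cutilSafeCall, and the fallback to an explicit cudaMalloc/cudaMemcpy path is an illustration, not something proposed in this thread.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro (stands in for cutilSafeCall).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s(%d): %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main()
{
    const int dev = 0;  // assumption: the card in question is device 0

    // Query the same property that deviceQuery prints as
    // "Support host page-locked memory mapping".
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, dev));
    printf("%s: canMapHostMemory = %d, integrated = %d\n",
           prop.name, prop.canMapHostMemory, prop.integrated);

    int *h_listo = NULL;  // host pointer
    int *d_listo = NULL;  // pointer usable from kernels

    if (prop.canMapHostMemory) {
        // Zero-copy path: request host mapping, then allocate mapped memory.
        CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));
        CHECK(cudaSetDevice(dev));
        CHECK(cudaHostAlloc((void **)&h_listo, sizeof(int), cudaHostAllocMapped));
        CHECK(cudaHostGetDevicePointer((void **)&d_listo, (void *)h_listo, 0));
        // ... launch kernels that use d_listo here ...
        CHECK(cudaFreeHost(h_listo));
    } else {
        // Fallback path: pinned host buffer plus a separate device buffer,
        // kept in sync with explicit copies.
        CHECK(cudaSetDevice(dev));
        CHECK(cudaHostAlloc((void **)&h_listo, sizeof(int), cudaHostAllocDefault));
        CHECK(cudaMalloc((void **)&d_listo, sizeof(int)));
        *h_listo = 0;
        CHECK(cudaMemcpy(d_listo, h_listo, sizeof(int), cudaMemcpyHostToDevice));
        // ... launch kernels that use d_listo here ...
        CHECK(cudaMemcpy(h_listo, d_listo, sizeof(int), cudaMemcpyDeviceToHost));
        CHECK(cudaFree(d_listo));
        CHECK(cudaFreeHost(h_listo));
    }
    return 0;
}
```

With a check like this, a card that reports No for host memory mapping takes the explicit-copy branch instead of failing inside cudaHostAlloc.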

I am having much the same problem here, but the device properties output the following:
Device 0: “GeForce 8400M GS”
CUDA Driver Version: 4.10
CUDA Runtime Version: 4.10
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 133496832 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 0.80 GHz
Concurrent copy and execution: Yes
# of Asynchronous Copy Engines: 1
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No

The line that is causing the problem is the following:
CUDA_SAFE_CALL(cudaHostAlloc((void**)&(h_pGMMRet->h_pinnedIn), h_pGMMRet->nInputImgSize, cudaHostAllocMapped));

Any help?
Do you see anything wrong with the code?

Apostolis

I found what went wrong.
For some reason I had to call:
cutilSafeCall(cudaSetDeviceFlags(cudaDeviceMapHost));
before setting the device.

That fixed my problem.
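The ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions: device 0 is used, and the kernel `writeValue` and the helper `ok` are hypothetical names that do not appear in the original posts. As far as the runtime documentation of that era describes it, cudaSetDeviceFlags has to be called before the device is initialized, which is presumably why moving it ahead of cudaSetDevice helped here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes one value through the mapped pointer.
__global__ void writeValue(int *p)
{
    *p = 42;
}

// Hypothetical helper: reports a failed runtime call and returns false.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    int *h_val = NULL;  // host pointer to mapped pinned memory
    int *d_val = NULL;  // device-side alias of the same memory

    // 1. Request host-memory mapping BEFORE the device is selected/initialized.
    if (!ok(cudaSetDeviceFlags(cudaDeviceMapHost), "cudaSetDeviceFlags")) return 1;

    // 2. Only now select the device (assumption: device 0).
    if (!ok(cudaSetDevice(0), "cudaSetDevice")) return 1;

    // 3. Allocate mapped pinned memory and obtain the device pointer.
    if (!ok(cudaHostAlloc((void **)&h_val, sizeof(int), cudaHostAllocMapped),
            "cudaHostAlloc")) return 1;
    if (!ok(cudaHostGetDevicePointer((void **)&d_val, (void *)h_val, 0),
            "cudaHostGetDevicePointer")) return 1;

    // 4. Let a kernel write through the mapped pointer, then wait for it.
    *h_val = 0;
    writeValue<<<1, 1>>>(d_val);
    if (!ok(cudaGetLastError(), "kernel launch")) return 1;
    if (!ok(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) return 1;

    printf("host sees value %d\n", *h_val);

    cudaFreeHost(h_val);
    return 0;
}
```

If some earlier call in the program has already initialized the device (for example an allocation or a kernel launch), step 1 may come too late, so it is safest to make it the first CUDA call in the host thread.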