I'm having a small but critical problem with a program I wrote.
It uses zero copy, and it works on Fermi.
However, when I plugged in a 9800 GTX, I got this error:
src/myCudaCalls.cu(302) : cudaSafeCall() Runtime API error : invalid argument.
The code in question is:
cutilSafeCall(cudaSetDeviceFlags(cudaDeviceMapHost));
cutilSafeCall(cudaHostAlloc((void **)&h_listo, sizeof(int), cudaHostAllocMapped)); --> This is line 302
cutilSafeCall(cudaHostGetDevicePointer((void **)&d_listo, (void *)h_listo, 0));
What could it be?
I'm compiling with -arch sm_11, and as far as I know this GPU supports zero copy.
Also, the SDK example simpleZeroCopy uses exactly the same calls, yet it works, even when I compile it manually with the same options.
Thanks for the information. For a moment I thought that too, but the fact that the SDK example “simpleZeroCopy” works with the same lines of code made me reconsider. At the very least there is a contradiction in my situation, and I really don't have an answer to it.
In this post you posted the output of deviceQuery for your card. Look at the output line “Support host page-locked memory mapping”: it says No. That line corresponds to the device property canMapHostMemory. Quoting from the pinned memory API documentation (§3.1):
and further on, in the FAQ section of the same document, it says (§4.1):
Like I said, your card doesn’t support zero copy memory.
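For what it's worth, here is a minimal sketch of how the check described above could be done at runtime before asking for mapped memory. It queries canMapHostMemory first and only then repeats the three calls from the original post. The check() helper and the choice of device 0 are my own assumptions for illustration; they are not taken from the original program or from the SDK.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA or cutil): abort with a readable
// message if a runtime API call fails.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main()
{
    const int dev = 0;  // assumption: the card of interest is device 0

    check(cudaSetDevice(dev), "cudaSetDevice");

    cudaDeviceProp prop;
    check(cudaGetDeviceProperties(&prop, dev), "cudaGetDeviceProperties");

    // Same information as the deviceQuery line
    // "Support host page-locked memory mapping".
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "Device %d (%s) cannot map host memory, "
                        "so zero copy is not available.\n", dev, prop.name);
        return EXIT_FAILURE;
    }

    // Request host-memory mapping before any allocation is made.
    check(cudaSetDeviceFlags(cudaDeviceMapHost), "cudaSetDeviceFlags");

    int *h_listo = NULL;  // host pointer to the pinned, mapped allocation
    int *d_listo = NULL;  // device-side alias of the same memory

    check(cudaHostAlloc((void **)&h_listo, sizeof(int), cudaHostAllocMapped),
          "cudaHostAlloc");
    check(cudaHostGetDevicePointer((void **)&d_listo, (void *)h_listo, 0),
          "cudaHostGetDevicePointer");

    printf("host %p is mapped to device %p\n", (void *)h_listo, (void *)d_listo);

    check(cudaFreeHost(h_listo), "cudaFreeHost");
    return EXIT_SUCCESS;
}
```

On a card where canMapHostMemory is 0, this should stop at the property check with a clear message instead of failing later inside cudaHostAlloc.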
I'm having much the same problem here, but my device properties report the following:
Device 0: “GeForce 8400M GS”
CUDA Driver Version: 4.10
CUDA Runtime Version: 4.10
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 133496832 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 0.80 GHz
Concurrent copy and execution: Yes
# of Asynchronous Copy Engines: 1
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)
Concurrent kernel execution: No
Device has ECC support enabled: No
Device is using TCC driver mode: No
The line that is causing the problem is the following:
CUDA_SAFE_CALL(cudaHostAlloc((void**)&(h_pGMMRet->h_pinnedIn), h_pGMMRet->nInputImgSize, cudaHostAllocMapped));
Any help?
Do you see anything wrong with the code?