I ran into the same problems on a 64-bit system (AMD Opteron). I converted some object-oriented sample code (originally from T.B. in his [post=“491467”]post[/post]; thank you very much, T.B., for this great example!) from the runtime API to the driver API. Trying to launch the following global function resulted in CUDA_ERROR_LAUNCH_FAILED. This error occurred as soon as I tried to access an element of [font=“Courier New”]result[/font] (even without calling the method [font=“Courier New”]Get()[/font]):
[codebox]
/* Round offset up to the next multiple of alignment. */
int align(int offset, int alignment) {
    return ((offset + alignment - 1) / alignment) * alignment;
}
…
/* test <<< 1, N >>> (device_data, device_result); */
cuFuncSetBlockShape(cuFunction, N, 1, 1);
offset = 0;
cuParamSetv(cuFunction, offset, &device_data, sizeof(void*));
offset = align(offset + sizeof(void*), __alignof(void*));
//before: cuParamSetv(cuFunction, offset, &device_data, sizeof(device_data));
//before: offset = align(offset + sizeof(device_data), __alignof(device_result));
cuParamSetv(cuFunction, offset, &device_result, sizeof(void*));
offset += sizeof(void*);
//before: cuParamSetv(cuFunction, offset, &device_result, sizeof(device_result));
//before: offset += sizeof(device_result);
cuParamSetSize(cuFunction, offset);
cuLaunch(cuFunction);
…
[/codebox]
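To make the difference between the two variants easier to see, here is a small host-only sketch in plain C (no CUDA calls) that computes where each argument lands in the parameter block. [font=“Courier New”]ParamDesc[/font], [font=“Courier New”]align_up[/font] and [font=“Courier New”]layout_params[/font] are names I made up for illustration; they are not part of the driver API, and the sizes passed in are assumptions about the respective platform.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of alignment (same idea as align() above). */
static size_t align_up(size_t offset, size_t alignment) {
    assert(alignment > 0);
    return ((offset + alignment - 1) / alignment) * alignment;
}

/* Hypothetical descriptor for one kernel argument: its size and required alignment. */
typedef struct {
    size_t size;
    size_t alignment;
} ParamDesc;

/*
 * Compute the byte offset of every argument in the parameter block.
 * offsets[i] receives the offset of argument i; the return value is the
 * total size that would be passed to cuParamSetSize().
 */
static size_t layout_params(const ParamDesc *params, size_t count, size_t *offsets) {
    size_t offset = 0;
    for (size_t i = 0; i < count; ++i) {
        offset = align_up(offset, params[i].alignment);
        offsets[i] = offset;
        offset += params[i].size;
    }
    return offset;
}
```

With two arguments of size and alignment 8 (what sizeof(void*) gives on my 64-bit host), the offsets are 0 and 8 and the total is 16. With two arguments of size and alignment 4 (what the "//before: " lines produce if a CUdeviceptr is 4 bytes), the offsets are 0 and 4 and the total is 8, so the second pointer ends up in a different place than in the working version.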
I am posting this in response to MichaelChampigny’s last [post=“488889”]post[/post] and to get feedback on the quality of this code, especially regarding its expected future compatibility.
Additionally, I’d like to mention a few related issues and questions that I would be glad to see fixed or answered satisfactorily, at least with CUDA 2.1:
[list=1]
[*] Passing parameters in the driver API: Passing parameters (especially pointers) this way is not very intuitive. Given that a CUdeviceptr is 4 bytes long, one would expect pointers in __global__ and __device__ functions to have the same size. At least for pointer parameters of __global__ functions this does not seem to be the case. Is there a reason why it should not work as described by the lines commented out with "//before: " in the above code snippet?
[*] Probably related: sizeof(void*) used within a __global__ function returns 8 (on the Opteron system) and not, as one might expect, 4 (like sizeof(CUdeviceptr)). At least when calling nvcc with -cubin, an option to switch to the corresponding sizeof values for the GPU would be welcome…
[*] Are GPU pointers really 8 bytes long (as assumed by MichaelChampigny in his last [post=“488889”]post[/post]), or are they only 4 bytes long?
[*] I haven’t checked lately, but I’m not aware of any graphics card with more than 4 GB of memory per GPU. Thus, 4-byte pointers would be sufficient for now. On the other hand, if one wanted to encourage porting larger applications to CUDA, larger memories and thus larger pointers would be desirable…