Hi guys,
I have some code that manipulates 3D arrays (for example, array[i][j][k]).
I now want to allocate these arrays on the device and do the manipulations there, to parallelize the process and reduce the run time.
Does anyone know how I'm supposed to do this?
I'll interpret your question, especially the word "allocate", loosely: I think you mean how to distribute the workload. Let me know if I've misread you and you actually want to know how to "allocate". In that case the answer is probably to use malloc and such (cudaMalloc for device memory).
Anyway, the basic idea for distributing the workload is to map block and thread indices onto the array indices:
Array[blockIdx.x][blockIdx.y][threadIdx.x]
Or, if the width is larger than the number of threads per block, extra blocks need to be used to cover the width.
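A minimal sketch of that mapping, assuming the 3D array is stored as one contiguous block of floats (element [i][j][k] at index (i*nj + j)*nk + k) and using the third grid dimension (blockIdx.z) for the extra blocks along the width. The names scale3D, scale3D_check and CHECK are hypothetical, and doubling every element is just a placeholder manipulation.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and return false if a CUDA call fails.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            return false;                                           \
        }                                                           \
    } while (0)

// blockIdx.x -> i, blockIdx.y -> j, threads (plus extra blocks in z) -> k.
__global__ void scale3D(float *a, int ni, int nj, int nk, float factor)
{
    int i = blockIdx.x;
    int j = blockIdx.y;
    int k = blockIdx.z * blockDim.x + threadIdx.x;  // extra blocks when nk > blockDim.x
    if (i < ni && j < nj && k < nk)
        a[((size_t)i * nj + j) * nk + k] *= factor;
}

// Copies a flat ni*nj*nk array to the device, doubles it, copies it back, verifies.
bool scale3D_check(int ni, int nj, int nk)
{
    assert(ni > 0 && nj > 0 && nk > 0);
    size_t n = (size_t)ni * nj * nk;
    std::vector<float> h(n);
    for (size_t idx = 0; idx < n; ++idx) h[idx] = (float)idx;

    float *d = nullptr;
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    CHECK(cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice));

    const int threads = 256;
    dim3 grid(ni, nj, (nk + threads - 1) / threads);
    scale3D<<<grid, threads>>>(d, ni, nj, nk, 2.0f);
    CHECK(cudaGetLastError());

    // cudaMemcpy on the default stream waits for the kernel to finish.
    CHECK(cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d));

    for (size_t idx = 0; idx < n; ++idx)
        if (h[idx] != 2.0f * (float)idx) return false;
    return true;
}
```

Calling scale3D_check(4, 3, 300) from main should return true on a working GPU; with 256 threads per block, each (i, j) pair gets two blocks along z. Keep in mind that gridDim.y and gridDim.z are limited to 65535, so a very large second dimension would need a different mapping.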
Thank you very much, Skybuck. I'm not that far yet. My situation is this: I have a 3D array (example: array[z][y]) in host memory, and I want to copy its data to device memory in 3D form. I tried using cudaMalloc3DArray and cudaMemcpy3D, but things don't seem to be working. Do you have any thoughts?
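One possible direction, offered as a sketch rather than a diagnosis: cudaMalloc3DArray allocates a CUDA array, which is meant for texture/surface access and cannot be indexed directly in a kernel. For ordinary manipulation, pitched linear memory from cudaMalloc3D plus cudaMemcpy3D is often simpler. Two common pitfalls: the cudaMemcpy3D extent width is in elements when a CUDA array is involved but in bytes otherwise, and the host data must be one contiguous block (a float*** built from separate mallocs cannot be copied in one call). The code below assumes a contiguous host array laid out as h[z][y][x]; addOne3D, pitched3D_roundtrip and CHECK are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and return false if a CUDA call fails.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            return false;                                           \
        }                                                           \
    } while (0)

// Adds 1 to every element of a pitched nx*ny*nz allocation of floats.
__global__ void addOne3D(cudaPitchedPtr p, int nx, int ny, int nz)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y;
    int z = blockIdx.z;
    if (x >= nx || y >= ny || z >= nz) return;
    // Rows are p.pitch bytes apart; slices are p.pitch * ny bytes apart.
    char *slice = (char *)p.ptr + (size_t)z * p.pitch * ny;
    float *row = (float *)(slice + (size_t)y * p.pitch);
    row[x] += 1.0f;
}

// Host -> pitched device memory -> kernel -> host, then verify.
bool pitched3D_roundtrip(int nx, int ny, int nz)
{
    assert(nx > 0 && ny > 0 && nz > 0);
    std::vector<float> h((size_t)nx * ny * nz);  // contiguous h[z][y][x]
    for (size_t i = 0; i < h.size(); ++i) h[i] = (float)i;

    // No CUDA array involved, so the extent width is given in bytes.
    cudaExtent extent = make_cudaExtent(nx * sizeof(float), ny, nz);
    cudaPitchedPtr d;
    CHECK(cudaMalloc3D(&d, extent));

    cudaMemcpy3DParms up = {0};
    up.srcPtr = make_cudaPitchedPtr(h.data(), nx * sizeof(float), nx, ny);
    up.dstPtr = d;
    up.extent = extent;
    up.kind = cudaMemcpyHostToDevice;
    CHECK(cudaMemcpy3D(&up));

    const int threads = 128;
    dim3 grid((nx + threads - 1) / threads, ny, nz);
    addOne3D<<<grid, threads>>>(d, nx, ny, nz);
    CHECK(cudaGetLastError());

    cudaMemcpy3DParms down = {0};
    down.srcPtr = d;
    down.dstPtr = make_cudaPitchedPtr(h.data(), nx * sizeof(float), nx, ny);
    down.extent = extent;
    down.kind = cudaMemcpyDeviceToHost;
    CHECK(cudaMemcpy3D(&down));
    CHECK(cudaFree(d.ptr));

    for (size_t i = 0; i < h.size(); ++i)
        if (h[i] != (float)i + 1.0f) return false;
    return true;
}
```

If you do need a CUDA array (for texture sampling, say), the same cudaMemcpy3D call works with dstArray set instead of dstPtr, but then the extent passed to both cudaMalloc3DArray and cudaMemcpy3D is in elements, not bytes.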