I’m a newbie to CUDA. I went over the documentation and the examples, and I’m about to implement some code using multi-threading.
My multi-threaded function needs to receive 5 variables:
var1 - a write-only float** array that will contain the result of the computation, written sequentially. This variable can get really large: 300-400 MB.
var2 - a 2-3 MB float* vector, read-only and accessed sequentially.
var3 - a 1-2 MB float* vector, read-only and accessed sequentially.
var4 - a small float* vector of 1000-2000 bytes, read-only and accessed sequentially.
var5 - a small 8-byte scalar, read-only.
These variables need to be accessible to all the threads (I will need on the order of 100-150 threads, in one block). All the threads write results into var1, but each thread writes into its own section of var1, so there are no collisions or conflicts there.
I understand that shared memory is limited to 16 KB, so I will not be able to store var1-3 in it. Is that correct? Would it be possible to store var4-5 in shared memory and var1-3 in global memory? Would that be the appropriate implementation?
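To make the question concrete, here is roughly what I have in mind (all names and sizes are made up, and I flattened var1 to a single float* since that seemed easier to index on the device than float**; I'm assuming var4 fits in shared memory):

```cuda
// Sketch only -- names, sizes, and the computation itself are placeholders.
// var1-3 stay in global memory (too big for shared);
// var4 is staged into shared memory by the block; var5 is passed by value.

#define VAR4_LEN 512  // assumed number of floats in var4 (~2000 bytes)

__global__ void myKernel(float *var1,        // results, one section per thread
                         const float *var2,  // read-only input (global)
                         const float *var3,  // read-only input (global)
                         const float *var4,  // small read-only input
                         float var5,         // small scalar
                         int sectionLen)     // elements each thread writes
{
    __shared__ float s_var4[VAR4_LEN];

    // Cooperatively copy var4 from global into shared memory.
    for (int i = threadIdx.x; i < VAR4_LEN; i += blockDim.x)
        s_var4[i] = var4[i];
    __syncthreads();

    // Each thread owns its own slice of var1, so there are no write conflicts.
    int base = threadIdx.x * sectionLen;
    for (int i = 0; i < sectionLen; ++i)
        var1[base + i] = var5 * s_var4[i % VAR4_LEN];  // placeholder computation
}

// Launch with a single block of ~128 threads, as described above:
// myKernel<<<1, 128>>>(d_var1, d_var2, d_var3, d_var4, h_var5, sectionLen);
```

Does that look like the right general shape, or am I misusing shared memory here?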
That’s it for now… I’m sure I’ll have more questions once I understand the basics here… :mellow: