I have a 2D matrix in which each element is a complex float.
Each row is contiguous in memory.
Each row should be multiplied element by element by a constant vector (.* in MATLAB).
On the TX2 there is ~50 KB of shared memory per block.
Can I launch the kernel and pass a complex float vector as a parameter, so that it is copied into this shared memory?
Currently this vector is located in global memory.
You can. I'm not sure it would be a good idea, though: if you use all or most of the shared memory, your occupancy can drop too far.
If all threads use the same vector (the same items in the vector), copying it to constant memory would probably be better. Otherwise, try __ldg. Either way, test which works best for you.
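For reference, here is a minimal sketch of the __ldg variant, leaving the vector in global memory. The names rowwiseMulLdg and runLdgExample, the test data, and the launch configuration are assumptions made for illustration, not something from this thread, and most error checking is left out for brevity.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>
#include <cuComplex.h>

// Each thread owns one column and walks down the rows, so the vector
// element is loaded once and reused for every row the thread handles.
__global__ void rowwiseMulLdg(cuFloatComplex* mat,
                              const cuFloatComplex* __restrict__ vec,
                              int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= cols) return;

    // __ldg reads through the read-only data cache (compute capability 3.5+).
    const cuFloatComplex v = __ldg(&vec[col]);

    for (int row = blockIdx.y; row < rows; row += gridDim.y) {
        size_t idx = static_cast<size_t>(row) * cols + col;
        mat[idx] = cuCmulf(mat[idx], v);
    }
}

// Hypothetical host helper: fills test data, runs the kernel, and returns the
// largest absolute difference from a CPU reference (negative on a CUDA error).
float runLdgExample(int rows, int cols)
{
    std::vector<cuFloatComplex> hMat(static_cast<size_t>(rows) * cols), hVec(cols);
    for (size_t i = 0; i < hMat.size(); ++i)
        hMat[i] = make_cuFloatComplex(float(i % 7), float(i % 5) - 2.0f);
    for (int j = 0; j < cols; ++j)
        hVec[j] = make_cuFloatComplex(0.5f * float(j % 3), 1.0f);

    const size_t matBytes = hMat.size() * sizeof(cuFloatComplex);
    const size_t vecBytes = hVec.size() * sizeof(cuFloatComplex);
    cuFloatComplex *dMat = nullptr, *dVec = nullptr;
    if (cudaMalloc(&dMat, matBytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&dVec, vecBytes) != cudaSuccess) { cudaFree(dMat); return -1.0f; }
    cudaMemcpy(dMat, hMat.data(), matBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dVec, hVec.data(), vecBytes, cudaMemcpyHostToDevice);

    // Assumed launch shape: 256 threads per block along the columns,
    // at most 64 blocks along the rows (the kernel strides over the rest).
    dim3 block(256);
    dim3 grid((cols + block.x - 1) / block.x, rows < 64 ? rows : 64);
    rowwiseMulLdg<<<grid, block>>>(dMat, dVec, rows, cols);

    std::vector<cuFloatComplex> out(hMat.size());
    cudaError_t err = cudaMemcpy(out.data(), dMat, matBytes, cudaMemcpyDeviceToHost);
    cudaFree(dMat);
    cudaFree(dVec);
    if (err != cudaSuccess) return -1.0f;

    float maxErr = 0.0f;
    for (size_t i = 0; i < out.size(); ++i) {
        cuFloatComplex ref = cuCmulf(hMat[i], hVec[i % cols]);
        maxErr = fmaxf(maxErr, cuCabsf(cuCsubf(out[i], ref)));
    }
    return maxErr;
}
```

Calling runLdgExample(4, 1000) from a main function and checking that the result is close to zero is enough to try it out. Adjacent threads read adjacent vector elements here, so the loads are coalesced, and no shared memory is used at all.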
Thank you for your reply.
Is it possible to copy data to constant memory before running the kernel?
Yup, you should copy the vector to constant memory before running the kernel.
Use the cudaMemcpyToSymbol API.
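A minimal sketch of how that could look, assuming the vector fits in constant memory (64 KB in total, so at most 8192 complex floats). The names cVec, kMaxVecLen, rowwiseMulConst and runConstExample are made up for illustration, and the 4096-element cap is an arbitrary choice. Since the vector currently lives in global memory, the copy below uses cudaMemcpyDeviceToDevice; from a host buffer the default direction (host to device) applies.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>
#include <cuComplex.h>

// Hypothetical cap: 4096 elements * 8 bytes = 32 KB of the 64 KB constant space.
constexpr int kMaxVecLen = 4096;
__constant__ cuFloatComplex cVec[kMaxVecLen];

// The kernel takes no vector pointer; it reads the __constant__ symbol directly.
__global__ void rowwiseMulConst(cuFloatComplex* mat, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= cols) return;

    const cuFloatComplex v = cVec[col];
    for (int row = blockIdx.y; row < rows; row += gridDim.y) {
        size_t idx = static_cast<size_t>(row) * cols + col;
        mat[idx] = cuCmulf(mat[idx], v);
    }
}

// Hypothetical host helper: returns the largest absolute difference from a
// CPU reference, or a negative value if something went wrong.
float runConstExample(int rows, int cols)
{
    if (cols > kMaxVecLen) return -1.0f;

    std::vector<cuFloatComplex> hMat(static_cast<size_t>(rows) * cols), hVec(cols);
    for (size_t i = 0; i < hMat.size(); ++i)
        hMat[i] = make_cuFloatComplex(float(i % 7), float(i % 5) - 2.0f);
    for (int j = 0; j < cols; ++j)
        hVec[j] = make_cuFloatComplex(0.5f * float(j % 3), 1.0f);

    const size_t matBytes = hMat.size() * sizeof(cuFloatComplex);
    const size_t vecBytes = hVec.size() * sizeof(cuFloatComplex);
    cuFloatComplex *dMat = nullptr, *dVec = nullptr;
    if (cudaMalloc(&dMat, matBytes) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&dVec, vecBytes) != cudaSuccess) { cudaFree(dMat); return -1.0f; }
    cudaMemcpy(dMat, hMat.data(), matBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dVec, hVec.data(), vecBytes, cudaMemcpyHostToDevice);

    // Copy the vector from global memory into the constant symbol
    // before the kernel launch.
    cudaError_t err = cudaMemcpyToSymbol(cVec, dVec, vecBytes, 0,
                                         cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess) { cudaFree(dMat); cudaFree(dVec); return -1.0f; }

    dim3 block(256);
    dim3 grid((cols + block.x - 1) / block.x, rows < 64 ? rows : 64);
    rowwiseMulConst<<<grid, block>>>(dMat, rows, cols);

    std::vector<cuFloatComplex> out(hMat.size());
    err = cudaMemcpy(out.data(), dMat, matBytes, cudaMemcpyDeviceToHost);
    cudaFree(dMat);
    cudaFree(dVec);
    if (err != cudaSuccess) return -1.0f;

    float maxErr = 0.0f;
    for (size_t i = 0; i < out.size(); ++i) {
        cuFloatComplex ref = cuCmulf(hMat[i], hVec[i % cols]);
        maxErr = fmaxf(maxErr, cuCabsf(cuCsubf(out[i], ref)));
    }
    return maxErr;
}
```

One thing to keep in mind when benchmarking: the constant cache is fastest when all threads of a warp read the same address. In this sketch neighboring threads read different elements of cVec, so those reads may be serialized, which is one more reason to time it against the __ldg version.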